Minimal logical decoding on standbys
Hi,
Craig previously worked on $subject, see thread [1]. A bunch of the
prerequisite features from that and other related threads have been
integrated into PG. What's missing is actually allowing logical
decoding on a standby. The latest patch from that thread does that [2],
but unfortunately hasn't been updated after slipping v10.
The biggest remaining issue to allow it is that the catalog xmin on the
primary has to be above the catalog xmin horizon of all slots on the
standby. The patch in [2] does so by periodically logging a new record
that announces the current catalog xmin horizon. Additionally it
checks that hot_standby_feedback is enabled when doing logical decoding
from a standby.
I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.
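Concretely, that just means each conflict-generating record carries a flag
that is set on the primary and acted on during replay; a simplified excerpt
of what the attached draft does, using xl_heap_clean as the example:

/* each such WAL record gets the new flag */
typedef struct xl_heap_clean
{
    bool            onCatalogTable; /* heap relevant for logical decoding? */
    TransactionId   latestRemovedXid;
    uint16          nredirected;
    uint16          ndead;
} xl_heap_clean;

/* set on the primary when the record is assembled */
xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);

/* and during replay, after the existing query-conflict handling */
if (onCatalogTable)
    ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);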
I also don't think we should actually enforce hot_standby_feedback being
enabled - there's use-cases where that's not convenient, and it's not
bullet proof anyway (can be enabled/disabled without using logical
decoding inbetween). I think when there's a conflict we should have the
HINT mention that hs_feedback can be used to prevent such conflicts,
that ought to be enough.
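For instance, the conflict error could look roughly like this (a
hypothetical sketch only - message wording and errcode are made up, nothing
like this is in the draft below yet):

ereport(ERROR,
        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
         errmsg("canceling logical decoding slot due to conflict with recovery"),
         errdetail("Catalog rows needed by the slot's catalog_xmin have been removed."),
         errhint("Enable hot_standby_feedback on the standby, with a physical replication slot on the primary, to avoid such conflicts.")));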
Attached is a rough draft patch. If we were to go for this approach,
we'd obviously need to improve the actual conflict handling against
slots - right now it just logs a WARNING and retries shortly after.
I think there's currently one hole in this approach. Nbtree (and other
index types, which are pretty unlikely to matter here) have this logic
to handle snapshot conflicts for single-page deletions:
    /*
     * If we have any conflict processing to do, it must happen before we
     * update the page.
     *
     * Btree delete records can conflict with standby queries. You might
     * think that vacuum records would conflict as well, but we've handled
     * that already. XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
     * cleaned by the vacuum of the heap and so we can resolve any conflicts
     * just once when that arrives. After that we know that no conflicts
     * exist from individual btree vacuum records on that index.
     */
    if (InHotStandby)
    {
        TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
        RelFileNode rnode;

        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
        ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                            xlrec->onCatalogTable, rnode);
    }
I.e. we get the latest removed xid from the heap, which has the
following logic:
    /*
     * If there's nothing running on the standby we don't need to derive a
     * full latestRemovedXid value, so use a fast path out of here. This
     * returns InvalidTransactionId, and so will conflict with all HS
     * transactions; but since we just worked out that that's zero people,
     * it's OK.
     *
     * XXX There is a race condition here, which is that a new backend might
     * start just after we look. If so, it cannot need to conflict, but this
     * coding will result in throwing a conflict anyway.
     */
    if (CountDBBackends(InvalidOid) == 0)
        return latestRemovedXid;

    /*
     * In what follows, we have to examine the previous state of the index
     * page, as well as the heap page(s) it points to. This is only valid if
     * WAL replay has reached a consistent database state; which means that
     * the preceding check is not just an optimization, but is *necessary*. We
     * won't have let in any user sessions before we reach consistency.
     */
    if (!reachedConsistency)
        elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");
so we wouldn't get a correct xid if nobody is connected to a
database (and, by implication, when not yet consistent).
I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).
Another alternative would be to just prevent such index deletions for
catalog tables when wal_level = logical.
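That'd be a small guard in the opportunistic deletion paths, something like
the following (hypothetical sketch, not implemented in the attached patch;
placement in _bt_vacuum_one_page() and the hash equivalent is assumed):

/*
 * Hypothetical: skip single-page index deletions on catalog tables when
 * logical decoding is possible, so such records can never conflict with a
 * slot; the dead tuples are left for VACUUM instead.
 */
if (XLogLogicalInfoActive() &&
    RelationIsAccessibleInLogicalDecoding(heapRel))
    return;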
If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.
- Have nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriate here; after all, it's likely that the
replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
optimized (e.g. check wal_level before opening rel etc).
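A sketch of that optimization (returns the same result as the draft's
version, it just bails out before opening the relation when wal_level
doesn't allow logical decoding):

bool
get_rel_logical_catalog(Oid relid)
{
    Relation    rel;
    bool        res;

    /* below wal_level = logical no relation can be a logical catalog */
    if (!XLogLogicalInfoActive())
        return false;

    /* assume the caller already holds an appropriate lock */
    rel = heap_open(relid, NoLock);
    res = RelationIsAccessibleInLogicalDecoding(rel);
    heap_close(rel, NoLock);

    return res;
}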
Once we have this logic, it can be used to implement something like
failover slots on top, by having a mechanism that occasionally
forwards slots on standbys using pg_replication_slot_advance().
Greetings,
Andres Freund
[1]: /messages/by-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com
[2]: https://archives.postgresql.org/message-id/CAMsr%2BYEbS8ZZ%2Bw18j7OPM2MZEeDtGN9wDVF68%3DMzpeW%3DKRZZ9Q%40mail.gmail.com
Attachments:
logical-decoding-on-standby.diff (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index ab5aaff1566..cd068243d36 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1154,7 +1154,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 3eb722ce266..7f8604fbbe2 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -18,6 +18,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
#include "access/heapam.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -25,7 +26,7 @@
#include "storage/predicate.h"
static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode);
+ Relation heapRel);
/*
* _hash_doinsert() -- Handle insertion of a single index tuple.
@@ -138,7 +139,7 @@ restart_insert:
if (IsBufferCleanupOK(buf))
{
- _hash_vacuum_one_page(rel, metabuf, buf, heapRel->rd_node);
+ _hash_vacuum_one_page(rel, metabuf, buf, heapRel);
if (PageGetFreeSpace(page) >= itemsz)
break; /* OK, now we have enough space */
@@ -337,7 +338,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
static void
_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode)
+ Relation heapRel)
{
OffsetNumber deletable[MaxOffsetNumber];
int ndeletable = 0;
@@ -394,7 +395,8 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
- xlrec.hnode = hnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
+ xlrec.hnode = heapRel->rd_node;
xlrec.ntuples = ndeletable;
XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 96501456422..5de6311c2c8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7565,12 +7565,13 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7606,6 +7607,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7656,6 +7658,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7686,7 +7689,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7696,6 +7699,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -8116,7 +8120,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -8152,7 +8157,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -8248,7 +8254,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -8385,7 +8393,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 695567b4b0d..acdce7f43ad 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -312,7 +312,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 4082103fe2d..481d3640499 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *metad);
@@ -704,6 +705,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1065,6 +1067,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.hnode = heapRel->rd_node;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 67a94cb80a2..e3e21398065 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -698,7 +698,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -982,6 +983,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index a83a4b581ed..c7c9c002a29 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 9e2bd3f8119..089fe58283b 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -913,6 +913,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 8134c52253e..456f3323fee 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -450,7 +450,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 9f99e4f0499..f8e89661715 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1f2e7139a70..1b723039a4f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,77 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+
+ if (found_conflict)
+ goto restart;
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c9bb3e987d0..e14f5f132f4 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 7a263cc1fdc..fef7e13fe97 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/nbtree.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1860,6 +1861,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 527138440b3..ac40dd26e8c 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
RelFileNode hnode;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 914897f83db..a702b86f481 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 819373031cd..0710e3a45c9 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -123,6 +123,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
int nitems;
@@ -137,6 +138,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index b72ccb5cc48..93185a08143 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 7964ae254f4..7a1228de934 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 1fcd8cf1b59..4b123ea67cf 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index ff1705ad2b8..0d3d49df605 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -129,6 +129,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <andres@anarazel.de> wrote:
I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.
To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it. Could you be
a bit more explicit about that?
I also don't think we should actually enforce hot_standby_feedback being
enabled - there's use-cases where that's not convenient, and it's not
bullet proof anyway (can be enabled/disabled without using logical
decoding inbetween). I think when there's a conflict we should have the
HINT mention that hs_feedback can be used to prevent such conflicts,
that ought to be enough.
If we can make that work, +1 from me.
I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).
That doesn't look trivial. It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either. What do you have in mind?
Another alternative would be to just prevent such index deletions for
catalog tables when wal_level = logical.
That doesn't sound like a very nice idea.
If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
OK.
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
Why do we need this?
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.
OK.
- Have nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriate here; after all, it's likely that the
replication topology would be broken afterwards.
I guess the viable options are approximately -- (1) drop the slot, (2)
advance the slot, (3) mark the slot as "failed" but leave it in
existence as a tombstone, (4) wait until something changes. I like
(3) better than (1). (4) seems pretty unfortunate unless there's some
other system for having the slot advance automatically. Seems like a
way for replication to hang indefinitely without anybody understanding
why it's happened (or, maybe, noticing).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2018-12-13 19:32:19 -0500, Robert Haas wrote:
On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <andres@anarazel.de> wrote:
I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.
To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it. Could you be
a bit more explicit about that?
I think the conflict system introduced in Craig's patch is quite
complicated, relies on logging new wal records on a regular basis, and
needs to be more conservative about the xmin horizon, which is obviously
not great for performance.
If you look at Craig's patch, it currently relies on blocking out
concurrent checkpoints:
    /*
     * We must prevent a concurrent checkpoint, otherwise the catalog xmin
     * advance xlog record with the new value might be written before the
     * checkpoint but the checkpoint may still see the old
     * oldestCatalogXmin value.
     */
    if (!LWLockConditionalAcquire(CheckpointLock, LW_SHARED))
        /* Couldn't get checkpointer lock; will retry later */
        return;
which on its own seems unacceptable, given that CheckpointLock can be
held by checkpointer for a very long time. While that's ongoing the
catalog xmin horizon doesn't advance.
Looking at the code it seems hard, to me, to make that approach work
nicely. But I might just be tired.
I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).
That doesn't look trivial. It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either. What do you have in mind?
I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
Writing that message right now.
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
Why do we need this?
Currently the conflict routines are only called when hot standby is
on. There's also no way to use logical decoding (including just advancing
the slot) without hot-standby being enabled, so I think that'd be a pretty
harmless restriction.
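E.g. a check along these lines at slot creation and at startup would be
enough (hypothetical sketch, not part of the attached prototype):

if (RecoveryInProgress() && !EnableHotStandby)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("logical decoding on a standby requires hot_standby to be enabled")));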
- Have nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriate here; after all, it's likely that the
replication topology would be broken afterwards.
I guess the viable options are approximately --
(1) drop the slot
Doable.
(2) advance the slot
That's not realistically possible, I think. We'd need to be able to use
most of the logical decoding infrastructure in that context, and we
don't have that available. It's also possible to deadlock, where
advancing the slot's xmin horizon would need further WAL, but WAL replay
is blocked on advancing the slot.
(3) mark the slot as "failed" but leave it in existence as a tombstone
We currently don't have that, but it'd be doable, I think.
(4) wait until something changes.
(4) seems pretty unfortunate unless there's some other system for
having the slot advance automatically. Seems like a way for
replication to hang indefinitely without anybody understanding why
it's happened (or, maybe, noticing).
On the other hand, it would often allow whatever user of the slot to
continue using it, till the conflict is "resolved". To me it seems about
as easy to debug physical replication being blocked, as somehow the slot
being magically deleted or marked as invalid.
Thanks for looking,
Andres Freund
Attachments:
index-page-vacuum-xid-horizon-primary.diff (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index ab5aaff1566..2f13a0fd2ad 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -969,155 +969,6 @@ hash_xlog_update_meta_page(XLogReaderState *record)
UnlockReleaseBuffer(metabuf);
}
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. See also btree_xlog_delete_get_latestRemovedXid,
- * on which this function is based.
- */
-static TransactionId
-hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
-{
- xl_hash_vacuum_one_page *xlrec;
- OffsetNumber *unused;
- Buffer ibuffer,
- hbuffer;
- Page ipage,
- hpage;
- RelFileNode rnode;
- BlockNumber blkno;
- ItemId iitemid,
- hitemid;
- IndexTuple itup;
- HeapTupleHeader htuphdr;
- BlockNumber hblkno;
- OffsetNumber hoffnum;
- TransactionId latestRemovedXid = InvalidTransactionId;
- int i;
-
- xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
-
- /*
- * If there's nothing running on the standby we don't need to derive a
- * full latestRemovedXid value, so use a fast path out of here. This
- * returns InvalidTransactionId, and so will conflict with all HS
- * transactions; but since we just worked out that that's zero people,
- * it's OK.
- *
- * XXX There is a race condition here, which is that a new backend might
- * start just after we look. If so, it cannot need to conflict, but this
- * coding will result in throwing a conflict anyway.
- */
- if (CountDBBackends(InvalidOid) == 0)
- return latestRemovedXid;
-
- /*
- * Check if WAL replay has reached a consistent database state. If not, we
- * must PANIC. See the definition of
- * btree_xlog_delete_get_latestRemovedXid for more details.
- */
- if (!reachedConsistency)
- elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
-
- /*
- * Get index page. If the DB is consistent, this should not fail, nor
- * should any of the heap page fetches below. If one does, we return
- * InvalidTransactionId to cancel all HS transactions. That's probably
- * overkill, but it's safe, and certainly better than panicking here.
- */
- XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
- ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-
- if (!BufferIsValid(ibuffer))
- return InvalidTransactionId;
- LockBuffer(ibuffer, HASH_READ);
- ipage = (Page) BufferGetPage(ibuffer);
-
- /*
- * Loop through the deleted index items to obtain the TransactionId from
- * the heap items they point to.
- */
- unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
-
- for (i = 0; i < xlrec->ntuples; i++)
- {
- /*
- * Identify the index tuple about to be deleted.
- */
- iitemid = PageGetItemId(ipage, unused[i]);
- itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
- /*
- * Locate the heap page that the index tuple points at
- */
- hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
- hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM,
- hblkno, RBM_NORMAL);
-
- if (!BufferIsValid(hbuffer))
- {
- UnlockReleaseBuffer(ibuffer);
- return InvalidTransactionId;
- }
- LockBuffer(hbuffer, HASH_READ);
- hpage = (Page) BufferGetPage(hbuffer);
-
- /*
- * Look up the heap tuple header that the index tuple points at by
- * using the heap node supplied with the xlrec. We can't use
- * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
- * Note that we are not looking at tuple data here, just headers.
- */
- hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
- hitemid = PageGetItemId(hpage, hoffnum);
-
- /*
- * Follow any redirections until we find something useful.
- */
- while (ItemIdIsRedirected(hitemid))
- {
- hoffnum = ItemIdGetRedirect(hitemid);
- hitemid = PageGetItemId(hpage, hoffnum);
- CHECK_FOR_INTERRUPTS();
- }
-
- /*
- * If the heap item has storage, then read the header and use that to
- * set latestRemovedXid.
- *
- * Some LP_DEAD items may not be accessible, so we ignore them.
- */
- if (ItemIdHasStorage(hitemid))
- {
- htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
- HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
- }
- else if (ItemIdIsDead(hitemid))
- {
- /*
- * Conjecture: if hitemid is dead then it had xids before the xids
- * marked on LP_NORMAL items. So we just ignore this item and move
- * onto the next, for the purposes of calculating
- * latestRemovedxids.
- */
- }
- else
- Assert(!ItemIdIsUsed(hitemid));
-
- UnlockReleaseBuffer(hbuffer);
- }
-
- UnlockReleaseBuffer(ibuffer);
-
- /*
- * If all heap tuples were LP_DEAD then we will be returning
- * InvalidTransactionId here, which avoids conflicts. This matches
- * existing logic which assumes that LP_DEAD tuples must already be older
- * than the latestRemovedXid on the cleanup record that set them as
- * LP_DEAD, hence must already have generated a conflict.
- */
- return latestRemovedXid;
-}
-
/*
* replay delete operation in hash index to remove
* tuples marked as DEAD during index tuple insertion.
@@ -1149,12 +1000,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
*/
if (InHotStandby)
{
- TransactionId latestRemovedXid =
- hash_xlog_vacuum_get_latestRemovedXid(record);
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 3eb722ce266..f9a261a713f 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -24,8 +24,8 @@
#include "storage/buf_internals.h"
#include "storage/predicate.h"
-static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode);
+static void _hash_vacuum_one_page(Relation rel, Relation hrel,
+ Buffer metabuf, Buffer buf);
/*
* _hash_doinsert() -- Handle insertion of a single index tuple.
@@ -138,7 +138,7 @@ restart_insert:
if (IsBufferCleanupOK(buf))
{
- _hash_vacuum_one_page(rel, metabuf, buf, heapRel->rd_node);
+ _hash_vacuum_one_page(rel, heapRel, metabuf, buf);
if (PageGetFreeSpace(page) >= itemsz)
break; /* OK, now we have enough space */
@@ -336,8 +336,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
*/
static void
-_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode)
+_hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
{
OffsetNumber deletable[MaxOffsetNumber];
int ndeletable = 0;
@@ -361,6 +360,10 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
if (ndeletable > 0)
{
+ TransactionId latestRemovedXid;
+
+ latestRemovedXid = index_compute_xid_horizon_for_tuples(rel, hrel, buf, deletable, ndeletable);
+
/*
* Write-lock the meta page so that we can decrement tuple count.
*/
@@ -394,7 +397,8 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
- xlrec.hnode = hnode;
+ xlrec.latestRemovedXid = latestRemovedXid;
+ xlrec.hnode = hrel->rd_node;
xlrec.ntuples = ndeletable;
XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 96501456422..049a8498e8f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7558,6 +7558,135 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
/* *latestRemovedXid may still be invalid at end */
}
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ *
+ * This puts the work for calculating latestRemovedXid into the recovery path
+ * rather than the primary path.
+ *
+ * It's possible that this generates a fair amount of I/O, since an index
+ * block may have hundreds of tuples being deleted. Repeat accesses to the
+ * same heap blocks are common, though are not yet optimised.
+ *
+ * XXX optimise later with something like XLogPrefetchBuffer()
+ */
+TransactionId
+heap_compute_xid_horizon_for_tuples(Relation rel,
+ ItemPointerData *tids,
+ int nitems)
+{
+ TransactionId latestRemovedXid = InvalidTransactionId;
+ BlockNumber hblkno;
+ Buffer buf = InvalidBuffer;
+ Page hpage;
+
+ /*
+ * Sort to avoid repeated lookups for the same page, and to make it more
+ * likely to access items in an efficient order. In particular this
+ * ensures that if there are multiple pointers to the same page, they all
+ * get processed looking up and locking the page just once.
+ */
+ qsort((void *) tids, nitems, sizeof(ItemPointerData),
+ (int (*) (const void *, const void *)) ItemPointerCompare);
+
+ /* prefetch all pages */
+#ifdef USE_PREFETCH
+ hblkno = InvalidBlockNumber;
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemPointer htid = &tids[i];
+
+ if (hblkno == InvalidBlockNumber ||
+ ItemPointerGetBlockNumber(htid) != hblkno)
+ {
+ hblkno = ItemPointerGetBlockNumber(htid);
+
+ PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
+ }
+ }
+#endif
+
+ /* Iterate over all tids, and check their horizon */
+ hblkno = InvalidBlockNumber;
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemPointer htid = &tids[i];
+ ItemId hitemid;
+ OffsetNumber hoffnum;
+
+ /*
+ * Read heap buffer, but avoid refetching if it's the same block as
+ * required for the last tid.
+ */
+ if (hblkno == InvalidBlockNumber ||
+ ItemPointerGetBlockNumber(htid) != hblkno)
+ {
+ /* release old buffer */
+ if (BufferIsValid(buf))
+ {
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ ReleaseBuffer(buf);
+ }
+
+ hblkno = ItemPointerGetBlockNumber(htid);
+
+ buf = ReadBuffer(rel, hblkno);
+ hpage = BufferGetPage(buf);
+
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+ }
+
+ hoffnum = ItemPointerGetOffsetNumber(htid);
+ hitemid = PageGetItemId(hpage, hoffnum);
+
+ /*
+ * Follow any redirections until we find something useful.
+ */
+ while (ItemIdIsRedirected(hitemid))
+ {
+ hoffnum = ItemIdGetRedirect(hitemid);
+ hitemid = PageGetItemId(hpage, hoffnum);
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /*
+ * If the heap item has storage, then read the header and use that to
+ * set latestRemovedXid.
+ *
+ * Some LP_DEAD items may not be accessible, so we ignore them.
+ */
+ if (ItemIdHasStorage(hitemid))
+ {
+ HeapTupleHeader htuphdr;
+
+ htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
+
+ HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
+ }
+ else if (ItemIdIsDead(hitemid))
+ {
+ /*
+ * Conjecture: if hitemid is dead then it had xids before the xids
+ * marked on LP_NORMAL items. So we just ignore this item and move
+ * onto the next, for the purposes of calculating
+ * latestRemovedxids.
+ */
+ }
+ else
+ Assert(!ItemIdIsUsed(hitemid));
+
+ }
+
+ if (BufferIsValid(buf))
+ {
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ ReleaseBuffer(buf);
+ }
+
+ return latestRemovedXid;
+}
+
/*
* Perform XLogInsert to register a heap cleanup info message. These
* messages are sent once per VACUUM and are required because
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 9d087756879..c4064b7c02e 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -275,6 +275,42 @@ BuildIndexValueDescription(Relation indexRelation,
return buf.data;
}
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ */
+TransactionId
+index_compute_xid_horizon_for_tuples(Relation irel,
+ Relation hrel,
+ Buffer ibuf,
+ OffsetNumber *itemnos,
+ int nitems)
+{
+ ItemPointerData *htids = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitems);
+ TransactionId latestRemovedXid = InvalidTransactionId;
+ Page ipage = BufferGetPage(ibuf);
+ IndexTuple itup;
+
+ /* identify what the index tuples about to be deleted point to */
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemId iitemid;
+
+ iitemid = PageGetItemId(ipage, itemnos[i]);
+ itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+ ItemPointerCopy(&itup->t_tid, &htids[i]);
+ }
+
+ /* determine the actual xid horizon */
+ latestRemovedXid =
+ heap_compute_xid_horizon_for_tuples(hrel, htids, nitems);
+
+ pfree(htids);
+
+ return latestRemovedXid;
+}
+
/* ----------------------------------------------------------------
* heap-or-index-scan access to system catalogs
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 4082103fe2d..7228c012ad5 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1032,10 +1032,16 @@ _bt_delitems_delete(Relation rel, Buffer buf,
{
Page page = BufferGetPage(buf);
BTPageOpaque opaque;
+ TransactionId latestRemovedXid = InvalidTransactionId;
/* Shouldn't be called unless there's something to do */
Assert(nitems > 0);
+ if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+ latestRemovedXid =
+ index_compute_xid_horizon_for_tuples(rel, heapRel, buf,
+ itemnos, nitems);
+
/* No ereport(ERROR) until changes are logged */
START_CRIT_SECTION();
@@ -1065,6 +1071,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.hnode = heapRel->rd_node;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 67a94cb80a2..052de4b2f3d 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -518,159 +518,6 @@ btree_xlog_vacuum(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
}
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. This puts the work for calculating latestRemovedXid
- * into the recovery path rather than the primary path.
- *
- * It's possible that this generates a fair amount of I/O, since an index
- * block may have hundreds of tuples being deleted. Repeat accesses to the
- * same heap blocks are common, though are not yet optimised.
- *
- * XXX optimise later with something like XLogPrefetchBuffer()
- */
-static TransactionId
-btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
-{
- xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);
- OffsetNumber *unused;
- Buffer ibuffer,
- hbuffer;
- Page ipage,
- hpage;
- RelFileNode rnode;
- BlockNumber blkno;
- ItemId iitemid,
- hitemid;
- IndexTuple itup;
- HeapTupleHeader htuphdr;
- BlockNumber hblkno;
- OffsetNumber hoffnum;
- TransactionId latestRemovedXid = InvalidTransactionId;
- int i;
-
- /*
- * If there's nothing running on the standby we don't need to derive a
- * full latestRemovedXid value, so use a fast path out of here. This
- * returns InvalidTransactionId, and so will conflict with all HS
- * transactions; but since we just worked out that that's zero people,
- * it's OK.
- *
- * XXX There is a race condition here, which is that a new backend might
- * start just after we look. If so, it cannot need to conflict, but this
- * coding will result in throwing a conflict anyway.
- */
- if (CountDBBackends(InvalidOid) == 0)
- return latestRemovedXid;
-
- /*
- * In what follows, we have to examine the previous state of the index
- * page, as well as the heap page(s) it points to. This is only valid if
- * WAL replay has reached a consistent database state; which means that
- * the preceding check is not just an optimization, but is *necessary*. We
- * won't have let in any user sessions before we reach consistency.
- */
- if (!reachedConsistency)
- elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");
-
- /*
- * Get index page. If the DB is consistent, this should not fail, nor
- * should any of the heap page fetches below. If one does, we return
- * InvalidTransactionId to cancel all HS transactions. That's probably
- * overkill, but it's safe, and certainly better than panicking here.
- */
- XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
- ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
- if (!BufferIsValid(ibuffer))
- return InvalidTransactionId;
- LockBuffer(ibuffer, BT_READ);
- ipage = (Page) BufferGetPage(ibuffer);
-
- /*
- * Loop through the deleted index items to obtain the TransactionId from
- * the heap items they point to.
- */
- unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
-
- for (i = 0; i < xlrec->nitems; i++)
- {
- /*
- * Identify the index tuple about to be deleted
- */
- iitemid = PageGetItemId(ipage, unused[i]);
- itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
- /*
- * Locate the heap page that the index tuple points at
- */
- hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
- hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL);
- if (!BufferIsValid(hbuffer))
- {
- UnlockReleaseBuffer(ibuffer);
- return InvalidTransactionId;
- }
- LockBuffer(hbuffer, BT_READ);
- hpage = (Page) BufferGetPage(hbuffer);
-
- /*
- * Look up the heap tuple header that the index tuple points at by
- * using the heap node supplied with the xlrec. We can't use
- * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
- * Note that we are not looking at tuple data here, just headers.
- */
- hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
- hitemid = PageGetItemId(hpage, hoffnum);
-
- /*
- * Follow any redirections until we find something useful.
- */
- while (ItemIdIsRedirected(hitemid))
- {
- hoffnum = ItemIdGetRedirect(hitemid);
- hitemid = PageGetItemId(hpage, hoffnum);
- CHECK_FOR_INTERRUPTS();
- }
-
- /*
- * If the heap item has storage, then read the header and use that to
- * set latestRemovedXid.
- *
- * Some LP_DEAD items may not be accessible, so we ignore them.
- */
- if (ItemIdHasStorage(hitemid))
- {
- htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-
- HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
- }
- else if (ItemIdIsDead(hitemid))
- {
- /*
- * Conjecture: if hitemid is dead then it had xids before the xids
- * marked on LP_NORMAL items. So we just ignore this item and move
- * onto the next, for the purposes of calculating
- * latestRemovedxids.
- */
- }
- else
- Assert(!ItemIdIsUsed(hitemid));
-
- UnlockReleaseBuffer(hbuffer);
- }
-
- UnlockReleaseBuffer(ibuffer);
-
- /*
- * If all heap tuples were LP_DEAD then we will be returning
- * InvalidTransactionId here, which avoids conflicts. This matches
- * existing logic which assumes that LP_DEAD tuples must already be older
- * than the latestRemovedXid on the cleanup record that set them as
- * LP_DEAD, hence must already have generated a conflict.
- */
- return latestRemovedXid;
-}
-
static void
btree_xlog_delete(XLogReaderState *record)
{
@@ -693,12 +540,11 @@ btree_xlog_delete(XLogReaderState *record)
*/
if (InHotStandby)
{
- TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
}
/*
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 534fac7bf2f..0318da88bc2 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -186,6 +186,11 @@ extern IndexScanDesc RelationGetIndexScan(Relation indexRelation,
extern void IndexScanEnd(IndexScanDesc scan);
extern char *BuildIndexValueDescription(Relation indexRelation,
Datum *values, bool *isnull);
+extern TransactionId index_compute_xid_horizon_for_tuples(Relation irel,
+ Relation hrel,
+ Buffer ibuf,
+ OffsetNumber *itemnos,
+ int nitems);
/*
* heap-or-index access to system catalogs (in genam.c)
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 527138440b3..d46dc1a85b3 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ TransactionId latestRemovedXid;
RelFileNode hnode;
int ntuples;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 64cfdbd2f06..af8612e625b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -184,6 +184,10 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
extern void heap_sync(Relation relation);
extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot);
+extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
+ ItemPointerData *items,
+ int nitems);
+
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern int heap_page_prune(Relation relation, Buffer buffer,
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 819373031cd..ca2a729169a 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -123,6 +123,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ TransactionId latestRemovedXid;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
int nitems;
Hi,
On 12/12/2018 21:41, Andres Freund wrote:
I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.
IIRC I was originally advocating adding that xmin announcement to the
standby snapshot message, but this seems better.
If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.
- Have nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriate here; after all, it's likely that the
replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
optimized (e.g. check wal_level before opening rel etc).
Once we have this logic, it can be used to implement something like
failover slots on top, by having a mechanism that occasionally
forwards slots on standbys using pg_replication_slot_advance().
Looking at this from the failover slots perspective. Wouldn't blocking
on conflict mean that we stop physical replication on catalog xmin
advance when there is lagging logical replication on the primary? It might
not be too big a deal, as in that use-case it should only happen if
hs_feedback was off at some point, but just wanted to point out this
potential problem.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
While testing this feature I found that if lots of inserts happen on
the master cluster then pg_recvlogical does not show the DATA
information for a logical replication slot created on the SLAVE.
Please refer to this scenario -
1)
Create a Master cluster with wal_level=logical and create a logical
replication slot -
SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');
2)
Create a Standby cluster using pg_basebackup (./pg_basebackup -D
slave/ -v -R) and create a logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');
3)
X terminal - start pg_recvlogical, provide port=5555 (slave
cluster) and specify slot=standby_slot
./pg_recvlogical -d postgres -p 5555 -s 1 -F 1 -v --slot=standby_slot
--start -f -
Y terminal - start pg_recvlogical, provide port=5432 (master
cluster) and specify slot=master_slot
./pg_recvlogical -d postgres -p 5432 -s 1 -F 1 -v --slot=master_slot
--start -f -
Z terminal - run pgbench against the Master cluster (./pgbench -i -s 10
postgres)
Able to see DATA information on the Y terminal but not on X,
but the same is visible by firing this query on the SLAVE cluster -
SELECT * FROM pg_logical_slot_get_changes('standby_slot', NULL, NULL);
Is it expected?
regards,
tushar
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Hi,
On 2019-03-01 13:33:23 +0530, tushar wrote:
While testing this feature, I found that if lots of inserts happen on the
master cluster then pg_recvlogical does not show the DATA information on
the logical replication slot which was created on the SLAVE.
Please refer to this scenario -
1)
Create a Master cluster with wal_level=logical and create a logical
replication slot -
SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');
2)
Create a Standby cluster using pg_basebackup ( ./pg_basebackup -D slave/ -v
-R) and create a logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');
So, if I understand correctly, you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?
Thanks,
Andres
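For reference, the setup Andres describes amounts to something like the
following sketch (the slot name matches the one used in the TAP test
attached below; primary_slot_name on the standby must additionally point
at that slot):
-- On the primary: reserve a physical replication slot for the standby.
SELECT * FROM pg_create_physical_replication_slot('decoding_standby');

-- On the standby: enable hot standby feedback so the primary honours the
-- standby's xmin/catalog_xmin.  The setting is reloadable, so no restart
-- is needed.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();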
On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:
I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases
(012_logical_decoding_on_replica.pl) that he had attached in [1].
Here, I have also attached his original file
(Craigs_012_logical_decoding_on_replica.pl).
Also attached are rebased versions of a couple of Andres's implementation patches.
I have added a new test scenario:
DROP TABLE on the master *before* the logical records of the table
insertions are retrieved from the standby. The logical records should
still be retrieved successfully.
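In psql terms the scenario boils down to roughly the following (a sketch
mirroring the attached TAP test, where 'standby_logical' is the slot
created on the standby):
-- On the primary:
CREATE TABLE test_table(id serial PRIMARY KEY, blah text);
INSERT INTO test_table(blah) VALUES ('itworks');
DROP TABLE test_table;
VACUUM;

-- On the standby, once replay has caught up, the insert should still be
-- decodable even though the table has been dropped:
SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL);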
Regarding the test result failures, I could see that when we drop a
logical replication slot at the standby server, the catalog_xmin of the
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; that's the reason a couple of test scenarios are
failing:
ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584
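The failing check amounts to inspecting the physical slot on the master
after the standby's logical slot is dropped; a minimal sketch, using the
physical slot name from the test:
-- On the master; the test expects xmin and catalog_xmin to be equal here,
-- but catalog_xmin comes back NULL instead.
SELECT xmin, catalog_xmin
FROM pg_replication_slots
WHERE slot_name = 'decoding_standby';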
Other than the above, there is this test scenario which I had to remove:
#########################################################
# Conflict with recovery: xmin cancels decoding session
#########################################################
#
# Start a transaction on the replica then perform work that should cause a
# recovery conflict with it. We'll check to make sure the client gets
# terminated with recovery conflict.
#
# Temporarily disable hs feedback so we can test recovery conflicts.
# It's fine to continue using a physical slot, the xmin should be
# cleared. We only check hot_standby_feedback when establishing
# a new decoding session so this approach circumvents the safeguards
# in place and forces a conflict.
This test starts pg_recvlogical, and expects it to be terminated due
to recovery conflict because hs feedback is disabled.
But that does not happen; instead, pg_recvlogical does not return. I
am not sure why it does not terminate with Andres's patch; it was
expected to terminate with Craig Ringer's patch.
Further, there are subsequent test scenarios that test pg_recvlogical
with hs_feedback disabled, which I have removed because pg_recvlogical
does not return. I have yet to clearly understand why that happens; I
suspect it is only because hs_feedback is disabled.
Also, the test cases verify pg_controldata's oldestCatalogXmin value,
which is no longer present with Andres's patch, so I removed the
tracking of oldestCatalogXmin.
[1]: /messages/by-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com
Thanks
-Amit Khandekar
Attachments:
0001-Logical-decoding-on-standby_rebased.patch
From 52a1ff5616f8eaed18db6fe1e44ab44d65d6ffd3 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Tue, 26 Feb 2019 11:18:27 +0530
Subject: [PATCH 1/3] Logical decoding on standby
Andres Freund.
---
src/backend/access/gist/gistxlog.c | 3 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 10 +++--
src/backend/access/heap/heapam.c | 23 +++++++---
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 ++
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/replication/logical/logical.c | 2 +
src/backend/replication/slot.c | 71 +++++++++++++++++++++++++++++++
src/backend/storage/ipc/standby.c | 7 ++-
src/backend/utils/cache/lsyscache.c | 16 +++++++
src/include/access/gistxlog.h | 2 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +++-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
22 files changed, 147 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 408bd53..f86ec7c 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -341,7 +341,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index c6d8726..14456fa 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1154,7 +1154,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 970733f..fd75d0e 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -24,7 +25,7 @@
#include "storage/predicate.h"
static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode);
+ Relation heapRel);
/*
* _hash_doinsert() -- Handle insertion of a single index tuple.
@@ -137,7 +138,7 @@ restart_insert:
if (IsBufferCleanupOK(buf))
{
- _hash_vacuum_one_page(rel, metabuf, buf, heapRel->rd_node);
+ _hash_vacuum_one_page(rel, metabuf, buf, heapRel);
if (PageGetFreeSpace(page) >= itemsz)
break; /* OK, now we have enough space */
@@ -336,7 +337,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
static void
_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- RelFileNode hnode)
+ Relation heapRel)
{
OffsetNumber deletable[MaxOffsetNumber];
int ndeletable = 0;
@@ -393,7 +394,8 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
- xlrec.hnode = hnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
+ xlrec.hnode = heapRel->rd_node;
xlrec.ntuples = ndeletable;
XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc34993..982fdc7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7184,12 +7184,13 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7225,6 +7226,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7275,6 +7277,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7305,7 +7308,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7315,6 +7318,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7735,7 +7739,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7771,7 +7776,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7867,7 +7873,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -8004,7 +8012,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9416c31..affc8d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -449,7 +449,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 9c785bc..674b3f1 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *metad);
@@ -704,6 +705,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1065,6 +1067,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.hnode = heapRel->rd_node;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index b0666b4..30f2e62 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -698,7 +698,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -982,6 +983,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index b9311ce..ef4910f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 71836ee..c66137a 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -913,6 +913,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6e5bc12..e8b7af4 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 33b23b6..d8104aa 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,77 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+
+ if (found_conflict)
+ goto restart;
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4d10e57..f483d53 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index e88c45d..2441737 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1847,6 +1849,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 5117aab..71b1aa7 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -46,10 +46,10 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 9cef1b7..455d701 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
RelFileNode hnode;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c..482c874 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index a605851..23f950f 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -123,6 +123,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
int nitems;
@@ -137,6 +138,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 6527fc9..50f334a 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66..4e0776a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 346a310..27c09d1 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 16b0b1d..3337d7d 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -129,6 +129,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
--
2.1.4
0002-Move-latestRemovedXid-computation-for-nbtree-xlog-rebased.patch
From ca7a7c51f8b4302932d805f477b7f134bda40a9d Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Tue, 26 Feb 2019 11:36:29 +0530
Subject: [PATCH 2/3] Move latestRemovedXid computation for nbtree xlog record
to primary.
Andres Freund.
---
src/backend/access/hash/hash_xlog.c | 153 +---------------------------------
src/backend/access/hash/hashinsert.c | 19 +++--
src/backend/access/heap/heapam.c | 129 +++++++++++++++++++++++++++++
src/backend/access/index/genam.c | 36 ++++++++
src/backend/access/nbtree/nbtpage.c | 7 ++
src/backend/access/nbtree/nbtxlog.c | 156 +----------------------------------
src/include/access/genam.h | 5 ++
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam.h | 4 +
src/include/access/nbtxlog.h | 1 +
10 files changed, 197 insertions(+), 314 deletions(-)
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index 14456fa..3af5050 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -970,155 +970,6 @@ hash_xlog_update_meta_page(XLogReaderState *record)
}
/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. See also btree_xlog_delete_get_latestRemovedXid,
- * on which this function is based.
- */
-static TransactionId
-hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
-{
- xl_hash_vacuum_one_page *xlrec;
- OffsetNumber *unused;
- Buffer ibuffer,
- hbuffer;
- Page ipage,
- hpage;
- RelFileNode rnode;
- BlockNumber blkno;
- ItemId iitemid,
- hitemid;
- IndexTuple itup;
- HeapTupleHeader htuphdr;
- BlockNumber hblkno;
- OffsetNumber hoffnum;
- TransactionId latestRemovedXid = InvalidTransactionId;
- int i;
-
- xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
-
- /*
- * If there's nothing running on the standby we don't need to derive a
- * full latestRemovedXid value, so use a fast path out of here. This
- * returns InvalidTransactionId, and so will conflict with all HS
- * transactions; but since we just worked out that that's zero people,
- * it's OK.
- *
- * XXX There is a race condition here, which is that a new backend might
- * start just after we look. If so, it cannot need to conflict, but this
- * coding will result in throwing a conflict anyway.
- */
- if (CountDBBackends(InvalidOid) == 0)
- return latestRemovedXid;
-
- /*
- * Check if WAL replay has reached a consistent database state. If not, we
- * must PANIC. See the definition of
- * btree_xlog_delete_get_latestRemovedXid for more details.
- */
- if (!reachedConsistency)
- elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
-
- /*
- * Get index page. If the DB is consistent, this should not fail, nor
- * should any of the heap page fetches below. If one does, we return
- * InvalidTransactionId to cancel all HS transactions. That's probably
- * overkill, but it's safe, and certainly better than panicking here.
- */
- XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
- ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
-
- if (!BufferIsValid(ibuffer))
- return InvalidTransactionId;
- LockBuffer(ibuffer, HASH_READ);
- ipage = (Page) BufferGetPage(ibuffer);
-
- /*
- * Loop through the deleted index items to obtain the TransactionId from
- * the heap items they point to.
- */
- unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
-
- for (i = 0; i < xlrec->ntuples; i++)
- {
- /*
- * Identify the index tuple about to be deleted.
- */
- iitemid = PageGetItemId(ipage, unused[i]);
- itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
- /*
- * Locate the heap page that the index tuple points at
- */
- hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
- hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM,
- hblkno, RBM_NORMAL);
-
- if (!BufferIsValid(hbuffer))
- {
- UnlockReleaseBuffer(ibuffer);
- return InvalidTransactionId;
- }
- LockBuffer(hbuffer, HASH_READ);
- hpage = (Page) BufferGetPage(hbuffer);
-
- /*
- * Look up the heap tuple header that the index tuple points at by
- * using the heap node supplied with the xlrec. We can't use
- * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
- * Note that we are not looking at tuple data here, just headers.
- */
- hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
- hitemid = PageGetItemId(hpage, hoffnum);
-
- /*
- * Follow any redirections until we find something useful.
- */
- while (ItemIdIsRedirected(hitemid))
- {
- hoffnum = ItemIdGetRedirect(hitemid);
- hitemid = PageGetItemId(hpage, hoffnum);
- CHECK_FOR_INTERRUPTS();
- }
-
- /*
- * If the heap item has storage, then read the header and use that to
- * set latestRemovedXid.
- *
- * Some LP_DEAD items may not be accessible, so we ignore them.
- */
- if (ItemIdHasStorage(hitemid))
- {
- htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
- HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
- }
- else if (ItemIdIsDead(hitemid))
- {
- /*
- * Conjecture: if hitemid is dead then it had xids before the xids
- * marked on LP_NORMAL items. So we just ignore this item and move
- * onto the next, for the purposes of calculating
- * latestRemovedxids.
- */
- }
- else
- Assert(!ItemIdIsUsed(hitemid));
-
- UnlockReleaseBuffer(hbuffer);
- }
-
- UnlockReleaseBuffer(ibuffer);
-
- /*
- * If all heap tuples were LP_DEAD then we will be returning
- * InvalidTransactionId here, which avoids conflicts. This matches
- * existing logic which assumes that LP_DEAD tuples must already be older
- * than the latestRemovedXid on the cleanup record that set them as
- * LP_DEAD, hence must already have generated a conflict.
- */
- return latestRemovedXid;
-}
-
-/*
* replay delete operation in hash index to remove
* tuples marked as DEAD during index tuple insertion.
*/
@@ -1149,12 +1000,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
*/
if (InHotStandby)
{
- TransactionId latestRemovedXid =
- hash_xlog_vacuum_get_latestRemovedXid(record);
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
xldata->onCatalogTable, rnode);
}
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index fd75d0e..88e2b3d 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -24,8 +24,8 @@
#include "storage/buf_internals.h"
#include "storage/predicate.h"
-static void _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- Relation heapRel);
+static void _hash_vacuum_one_page(Relation rel, Relation hrel,
+ Buffer metabuf, Buffer buf);
/*
* _hash_doinsert() -- Handle insertion of a single index tuple.
@@ -138,7 +138,7 @@ restart_insert:
if (IsBufferCleanupOK(buf))
{
- _hash_vacuum_one_page(rel, metabuf, buf, heapRel);
+ _hash_vacuum_one_page(rel, heapRel, metabuf, buf);
if (PageGetFreeSpace(page) >= itemsz)
break; /* OK, now we have enough space */
@@ -336,8 +336,8 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
*/
static void
-_hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
- Relation heapRel)
+_hash_vacuum_one_page(Relation rel, Relation hrel,
+ Buffer metabuf, Buffer buf)
{
OffsetNumber deletable[MaxOffsetNumber];
int ndeletable = 0;
@@ -361,6 +361,10 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
if (ndeletable > 0)
{
+ TransactionId latestRemovedXid;
+
+ latestRemovedXid = index_compute_xid_horizon_for_tuples(rel, hrel, buf, deletable, ndeletable);
+
/*
* Write-lock the meta page so that we can decrement tuple count.
*/
@@ -394,8 +398,9 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
- xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
- xlrec.hnode = heapRel->rd_node;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
+ xlrec.latestRemovedXid = latestRemovedXid;
+ xlrec.hnode = hrel->rd_node;
xlrec.ntuples = ndeletable;
XLogBeginInsert();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 982fdc7..c686b80 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7178,6 +7178,135 @@ HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
}
/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ *
+ * This puts the work for calculating latestRemovedXid into the recovery path
+ * rather than the primary path.
+ *
+ * It's possible that this generates a fair amount of I/O, since an index
+ * block may have hundreds of tuples being deleted. Repeat accesses to the
+ * same heap blocks are common, though are not yet optimised.
+ *
+ * XXX optimise later with something like XLogPrefetchBuffer()
+ */
+TransactionId
+heap_compute_xid_horizon_for_tuples(Relation rel,
+ ItemPointerData *tids,
+ int nitems)
+{
+ TransactionId latestRemovedXid = InvalidTransactionId;
+ BlockNumber hblkno;
+ Buffer buf = InvalidBuffer;
+ Page hpage;
+
+ /*
+ * Sort to avoid repeated lookups for the same page, and to make it more
+ * likely to access items in an efficient order. In particular this
+ * ensures that if there are multiple pointers to the same page, they all
+ * get processed looking up and locking the page just once.
+ */
+ qsort((void *) tids, nitems, sizeof(ItemPointerData),
+ (int (*) (const void *, const void *)) ItemPointerCompare);
+
+ /* prefetch all pages */
+#ifdef USE_PREFETCH
+ hblkno = InvalidBlockNumber;
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemPointer htid = &tids[i];
+
+ if (hblkno == InvalidBlockNumber ||
+ ItemPointerGetBlockNumber(htid) != hblkno)
+ {
+ hblkno = ItemPointerGetBlockNumber(htid);
+
+ PrefetchBuffer(rel, MAIN_FORKNUM, hblkno);
+ }
+ }
+#endif
+
+ /* Iterate over all tids, and check their horizon */
+ hblkno = InvalidBlockNumber;
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemPointer htid = &tids[i];
+ ItemId hitemid;
+ OffsetNumber hoffnum;
+
+ /*
+ * Read heap buffer, but avoid refetching if it's the same block as
+ * required for the last tid.
+ */
+ if (hblkno == InvalidBlockNumber ||
+ ItemPointerGetBlockNumber(htid) != hblkno)
+ {
+ /* release old buffer */
+ if (BufferIsValid(buf))
+ {
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ ReleaseBuffer(buf);
+ }
+
+ hblkno = ItemPointerGetBlockNumber(htid);
+
+ buf = ReadBuffer(rel, hblkno);
+ hpage = BufferGetPage(buf);
+
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+ }
+
+ hoffnum = ItemPointerGetOffsetNumber(htid);
+ hitemid = PageGetItemId(hpage, hoffnum);
+
+ /*
+ * Follow any redirections until we find something useful.
+ */
+ while (ItemIdIsRedirected(hitemid))
+ {
+ hoffnum = ItemIdGetRedirect(hitemid);
+ hitemid = PageGetItemId(hpage, hoffnum);
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /*
+ * If the heap item has storage, then read the header and use that to
+ * set latestRemovedXid.
+ *
+ * Some LP_DEAD items may not be accessible, so we ignore them.
+ */
+ if (ItemIdHasStorage(hitemid))
+ {
+ HeapTupleHeader htuphdr;
+
+ htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
+
+ HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
+ }
+ else if (ItemIdIsDead(hitemid))
+ {
+ /*
+ * Conjecture: if hitemid is dead then it had xids before the xids
+ * marked on LP_NORMAL items. So we just ignore this item and move
+ * onto the next, for the purposes of calculating
+ * latestRemovedxids.
+ */
+ }
+ else
+ Assert(!ItemIdIsUsed(hitemid));
+
+ }
+
+ if (BufferIsValid(buf))
+ {
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ ReleaseBuffer(buf);
+ }
+
+ return latestRemovedXid;
+}
+
+/*
* Perform XLogInsert to register a heap cleanup info message. These
* messages are sent once per VACUUM and are required because
* of the phasing of removal operations during a lazy VACUUM.
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index e0a5ea4..c425ebe 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -276,6 +276,42 @@ BuildIndexValueDescription(Relation indexRelation,
return buf.data;
}
+/*
+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.
+ */
+TransactionId
+index_compute_xid_horizon_for_tuples(Relation irel,
+ Relation hrel,
+ Buffer ibuf,
+ OffsetNumber *itemnos,
+ int nitems)
+{
+ ItemPointerData *htids = (ItemPointerData *) palloc(sizeof(ItemPointerData) * nitems);
+ TransactionId latestRemovedXid = InvalidTransactionId;
+ Page ipage = BufferGetPage(ibuf);
+ IndexTuple itup;
+
+ /* identify what the index tuples about to be deleted point to */
+ for (int i = 0; i < nitems; i++)
+ {
+ ItemId iitemid;
+
+ iitemid = PageGetItemId(ipage, itemnos[i]);
+ itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+ ItemPointerCopy(&itup->t_tid, &htids[i]);
+ }
+
+ /* determine the actual xid horizon */
+ latestRemovedXid =
+ heap_compute_xid_horizon_for_tuples(hrel, htids, nitems);
+
+ pfree(htids);
+
+ return latestRemovedXid;
+}
+
/* ----------------------------------------------------------------
* heap-or-index-scan access to system catalogs
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 674b3f1..b917f06 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1034,10 +1034,16 @@ _bt_delitems_delete(Relation rel, Buffer buf,
{
Page page = BufferGetPage(buf);
BTPageOpaque opaque;
+ TransactionId latestRemovedXid = InvalidTransactionId;
/* Shouldn't be called unless there's something to do */
Assert(nitems > 0);
+ if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+ latestRemovedXid =
+ index_compute_xid_horizon_for_tuples(rel, heapRel, buf,
+ itemnos, nitems);
+
/* No ereport(ERROR) until changes are logged */
START_CRIT_SECTION();
@@ -1068,6 +1074,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
xl_btree_delete xlrec_delete;
xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
+ xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.hnode = heapRel->rd_node;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 30f2e62..a8805d1 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -518,159 +518,6 @@ btree_xlog_vacuum(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
}
-/*
- * Get the latestRemovedXid from the heap pages pointed at by the index
- * tuples being deleted. This puts the work for calculating latestRemovedXid
- * into the recovery path rather than the primary path.
- *
- * It's possible that this generates a fair amount of I/O, since an index
- * block may have hundreds of tuples being deleted. Repeat accesses to the
- * same heap blocks are common, though are not yet optimised.
- *
- * XXX optimise later with something like XLogPrefetchBuffer()
- */
-static TransactionId
-btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
-{
- xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);
- OffsetNumber *unused;
- Buffer ibuffer,
- hbuffer;
- Page ipage,
- hpage;
- RelFileNode rnode;
- BlockNumber blkno;
- ItemId iitemid,
- hitemid;
- IndexTuple itup;
- HeapTupleHeader htuphdr;
- BlockNumber hblkno;
- OffsetNumber hoffnum;
- TransactionId latestRemovedXid = InvalidTransactionId;
- int i;
-
- /*
- * If there's nothing running on the standby we don't need to derive a
- * full latestRemovedXid value, so use a fast path out of here. This
- * returns InvalidTransactionId, and so will conflict with all HS
- * transactions; but since we just worked out that that's zero people,
- * it's OK.
- *
- * XXX There is a race condition here, which is that a new backend might
- * start just after we look. If so, it cannot need to conflict, but this
- * coding will result in throwing a conflict anyway.
- */
- if (CountDBBackends(InvalidOid) == 0)
- return latestRemovedXid;
-
- /*
- * In what follows, we have to examine the previous state of the index
- * page, as well as the heap page(s) it points to. This is only valid if
- * WAL replay has reached a consistent database state; which means that
- * the preceding check is not just an optimization, but is *necessary*. We
- * won't have let in any user sessions before we reach consistency.
- */
- if (!reachedConsistency)
- elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");
-
- /*
- * Get index page. If the DB is consistent, this should not fail, nor
- * should any of the heap page fetches below. If one does, we return
- * InvalidTransactionId to cancel all HS transactions. That's probably
- * overkill, but it's safe, and certainly better than panicking here.
- */
- XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
- ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
- if (!BufferIsValid(ibuffer))
- return InvalidTransactionId;
- LockBuffer(ibuffer, BT_READ);
- ipage = (Page) BufferGetPage(ibuffer);
-
- /*
- * Loop through the deleted index items to obtain the TransactionId from
- * the heap items they point to.
- */
- unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
-
- for (i = 0; i < xlrec->nitems; i++)
- {
- /*
- * Identify the index tuple about to be deleted
- */
- iitemid = PageGetItemId(ipage, unused[i]);
- itup = (IndexTuple) PageGetItem(ipage, iitemid);
-
- /*
- * Locate the heap page that the index tuple points at
- */
- hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
- hbuffer = XLogReadBufferExtended(xlrec->hnode, MAIN_FORKNUM, hblkno, RBM_NORMAL);
- if (!BufferIsValid(hbuffer))
- {
- UnlockReleaseBuffer(ibuffer);
- return InvalidTransactionId;
- }
- LockBuffer(hbuffer, BT_READ);
- hpage = (Page) BufferGetPage(hbuffer);
-
- /*
- * Look up the heap tuple header that the index tuple points at by
- * using the heap node supplied with the xlrec. We can't use
- * heap_fetch, since it uses ReadBuffer rather than XLogReadBuffer.
- * Note that we are not looking at tuple data here, just headers.
- */
- hoffnum = ItemPointerGetOffsetNumber(&(itup->t_tid));
- hitemid = PageGetItemId(hpage, hoffnum);
-
- /*
- * Follow any redirections until we find something useful.
- */
- while (ItemIdIsRedirected(hitemid))
- {
- hoffnum = ItemIdGetRedirect(hitemid);
- hitemid = PageGetItemId(hpage, hoffnum);
- CHECK_FOR_INTERRUPTS();
- }
-
- /*
- * If the heap item has storage, then read the header and use that to
- * set latestRemovedXid.
- *
- * Some LP_DEAD items may not be accessible, so we ignore them.
- */
- if (ItemIdHasStorage(hitemid))
- {
- htuphdr = (HeapTupleHeader) PageGetItem(hpage, hitemid);
-
- HeapTupleHeaderAdvanceLatestRemovedXid(htuphdr, &latestRemovedXid);
- }
- else if (ItemIdIsDead(hitemid))
- {
- /*
- * Conjecture: if hitemid is dead then it had xids before the xids
- * marked on LP_NORMAL items. So we just ignore this item and move
- * onto the next, for the purposes of calculating
- * latestRemovedxids.
- */
- }
- else
- Assert(!ItemIdIsUsed(hitemid));
-
- UnlockReleaseBuffer(hbuffer);
- }
-
- UnlockReleaseBuffer(ibuffer);
-
- /*
- * If all heap tuples were LP_DEAD then we will be returning
- * InvalidTransactionId here, which avoids conflicts. This matches
- * existing logic which assumes that LP_DEAD tuples must already be older
- * than the latestRemovedXid on the cleanup record that set them as
- * LP_DEAD, hence must already have generated a conflict.
- */
- return latestRemovedXid;
-}
-
static void
btree_xlog_delete(XLogReaderState *record)
{
@@ -693,12 +540,11 @@ btree_xlog_delete(XLogReaderState *record)
*/
if (InHotStandby)
{
- TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
xlrec->onCatalogTable, rnode);
}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index c4aba39..6176079 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -186,6 +186,11 @@ extern IndexScanDesc RelationGetIndexScan(Relation indexRelation,
extern void IndexScanEnd(IndexScanDesc scan);
extern char *BuildIndexValueDescription(Relation indexRelation,
Datum *values, bool *isnull);
+extern TransactionId index_compute_xid_horizon_for_tuples(Relation irel,
+ Relation hrel,
+ Buffer ibuf,
+ OffsetNumber *itemnos,
+ int nitems);
/*
* heap-or-index access to system catalogs (in genam.c)
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 455d701..4e3e908 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -264,6 +264,7 @@ typedef struct xl_hash_init_bitmap_page
typedef struct xl_hash_vacuum_one_page
{
bool onCatalogTable;
+ TransactionId latestRemovedXid;
RelFileNode hnode;
int ntuples;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ab08791..2f05b93 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -166,6 +166,10 @@ extern void simple_heap_update(Relation relation, ItemPointer otid,
extern void heap_sync(Relation relation);
extern void heap_update_snapshot(HeapScanDesc scan, Snapshot snapshot);
+extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
+ ItemPointerData *items,
+ int nitems);
+
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern int heap_page_prune(Relation relation, Buffer buffer,
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 23f950f..aa5f1e2 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -124,6 +124,7 @@ typedef struct xl_btree_split
typedef struct xl_btree_delete
{
bool onCatalogTable;
+ TransactionId latestRemovedXid;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
int nitems;
--
2.1.4
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch
From 7f6995a26b15ffd5220536cee023ad7472d7e6cb Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Mon, 4 Mar 2019 12:22:50 +0530
Subject: [PATCH 3/3] New TAP test for logical decoding on standby.
new file: recovery/t/016_logical_decoding_on_replica.pl
Tests originally written by Craig Ringer, with some WIP changes
from Amit Khandekar.
---
.../recovery/t/016_logical_decoding_on_replica.pl | 358 +++++++++++++++++++++
1 file changed, 358 insertions(+)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..8cc029b
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,358 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 52;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
+
+# If no slot on standby exists to hold down catalog_xmin it must follow xmin,
+# (which is nextXid when no xacts are running on the standby).
+($xmin, $catalog_xmin) = print_phys_xmin();
+ok($xmin, "xmin not null");
+is($xmin, $catalog_xmin, "xmin and catalog_xmin equal");
+
+# We need catalog_xmin advance to take effect on the master and be replayed
+# on standby.
+$node_master->safe_psql('postgres', 'CHECKPOINT');
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now the standby's slot
+# doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream catalog retention
+#########################################################
+
+sub test_catalog_xmin_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $oldestCatalogXmin, $nextXid) = ('', '', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestCatalogXmin:\s*(\d+)/)
+ {
+ $oldestCatalogXmin = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, oldestCatalogXmin $oldestCatalogXmin, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid, $oldestCatalogXmin);
+}
+
+my ($oldestXid, $oldestCatalogXmin) = test_catalog_xmin_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_xmin, $new_catalog_xmin) = print_phys_xmin();
+# We're now back to the old behaviour of hot_standby_feedback
+# reporting nextXid for both thresholds
+ok($new_catalog_xmin, "physical catalog_xmin still non-null");
+cmp_ok($new_catalog_xmin, '==', $new_xmin,
+ 'xmin and catalog_xmin equal after slot drop');
+
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot dodropslot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On 03/01/2019 11:16 PM, Andres Freund wrote:
So, if I understand correctly you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?
Ok, this time around I performed it like this -
.) Master cluster: set wal_level=logical and hot_standby_feedback=on in
postgresql.conf, start the server, and create a physical replication slot
postgres=# SELECT * FROM
pg_create_physical_replication_slot('decoding_standby');
slot_name | lsn
------------------+-----
decoding_standby |
(1 row)
.) Perform pg_basebackup using --slot=decoding_standby with option -R,
modify port=5555, and start the server.
.) Connect to the slave and create a logical replication slot:
postgres=# create table t(n int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
postgres=#
postgres=# SELECT * FROM
pg_create_logical_replication_slot('standby_slot', 'test_decoding');
slot_name | lsn
--------------+-----------
standby_slot | 0/2000060
(1 row)
Run pgbench (./pgbench -i -s 10 postgres) against the master and
simultaneously start pg_recvlogical against port=5555 (the slave
cluster), specifying slot=standby_slot:
./pg_recvlogical -d postgres -p 5555 -s 1 -F 1 -v --slot=standby_slot
--start -f -
[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres -p 5555 -s 1 -F
1 -v --slot=standby_slot --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot standby_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot standby_slot)
pg_recvlogical: confirming write up to 0/30194E8, flush to 0/30194E8
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/301D558, flush to 0/301D558
(slot standby_slot)
BEGIN 476
COMMIT 476
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
BEGIN 477
COMMIT 477
If we do the same for a logical replication slot created on the
master cluster:
[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres -s 1 -F 1 -v
--slot=master_slot --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot master_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot master_slot)
table public.pgbench_accounts: INSERT: aid[integer]:65057 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65058 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65059 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65060 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65061 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65062 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65063 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65064 bid[integer]:1
abalance[integer]:0 filler[character]:' '
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On 03/04/2019 04:54 PM, tushar wrote:
.)Perform pg_basebackup using --slot=decoding_standby with option -R
. modify port=5555 , start the server
set primary_slot_name = 'decoding_standby' in the postgresql.conf file
of slave.
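In other words, the slave's postgresql.conf ends up containing roughly the
following (slot and parameter names as used earlier in this thread), while
the master only needs wal_level = logical:

primary_slot_name = 'decoding_standby'   # physical slot created on the master
hot_standby_feedback = on                # needed so the master honours the standby's catalog_xmin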
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Hi,
On 2019-03-04 16:54:32 +0530, tushar wrote:
On 03/01/2019 11:16 PM, Andres Freund wrote:
So, if I understand correctly you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?
Ok, this time around I performed it like this -
.) Master cluster: set wal_level=logical and hot_standby_feedback=on in
postgresql.conf, start the server, and create a physical replication slot
Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).
Thanks,
Andres
On 03/04/2019 10:57 PM, Andres Freund wrote:
Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).
Right, this parameter was enabled on both master and slave.
Is someone able to reproduce this issue?
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
There is another issue, where I am getting an error while executing
"pg_logical_slot_get_changes" on the SLAVE.
Master (running on port=5432) - run "make installcheck" after setting
PATH=<installation/bin:$PATH ) and export PGDATABASE=postgres from
regress/ folder
Slave (running on port=5555) - Connect to regression database and
select pg_logical_slot_get_changes
[centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
You are now connected to database "regression" as user "centos".
slot_name | lsn
-----------+-----------
m61 | 1/D437AD8
(1 row)
psql:t.sql:3: ERROR: could not resolve cmin/cmax of catalog tuple
[centos@mail-arts bin]$ cat t.sql
\c regression
SELECT * from pg_create_logical_replication_slot('m61', 'test_decoding');
select * from pg_logical_slot_get_changes('m61',null,null);
regards,
On 03/04/2019 10:57 PM, Andres Freund wrote:
Hi,
On 2019-03-04 16:54:32 +0530, tushar wrote:
On 03/01/2019 11:16 PM, Andres Freund wrote:
So, if I understand correctly you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?
Ok, this time around I performed it like this -
.) Master cluster: set wal_level=logical and hot_standby_feedback=on in
postgresql.conf, start the server, and create a physical replication slot
Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).
Thanks,
Andres
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:
I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases
Hi Andres,
I am trying to come up with new testcases to test the recovery
conflict handling. Before that I have some queries :
With Craig Ringer's approach, the way to reproduce the recovery
conflict was, I believe, easy : Do a checkpoint, which will log the
global-catalog-xmin-advance WAL record, due to which the standby -
while replaying the message - may find out that it's a recovery
conflict. But with your approach, the latestRemovedXid is passed only
during specific vacuum-related WAL records, so to reproduce the
recovery conflict error, we need to make sure some specific WAL
records are logged, such as XLOG_BTREE_DELETE. So we need to create a
testcase such that while creating an index tuple, it erases dead
tuples from a page, so that it eventually calls
_bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
XLOG_BTREE_DELETE record.
I tried to come up with this reproducible testcase without success.
This seems difficult. Do you have an easier option? Maybe we can use
some other WAL records that allow an easier, more reliable test case
for producing a recovery conflict?
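For illustration, the kind of workload I have been trying looks roughly like
this (the table name and row counts below are made up for the sketch, and I
have not been able to confirm that it reliably reaches _bt_vacuum_one_page()):
fill a btree leaf page, delete most of the rows, let an index scan set the
LP_DEAD hints, and then insert back into the same key range so the full page
has to be cleaned:

-- on the master; assumes autovacuum does not clean the page first
CREATE TABLE btree_conflict(id int PRIMARY KEY, padding text);
INSERT INTO btree_conflict SELECT g, 'x' FROM generate_series(1, 400) g;
-- create dead heap tuples behind the first leaf page's entries
DELETE FROM btree_conflict WHERE id <= 350;
-- an index scan over the deleted range marks those index entries LP_DEAD
SET enable_seqscan = off;
SELECT count(*) FROM btree_conflict WHERE id <= 350;
-- inserting into the same key range should find the leaf page full of
-- LP_DEAD items, run the single-page deletion, and log XLOG_BTREE_DELETE
INSERT INTO btree_conflict SELECT g, 'x' FROM generate_series(1, 350) g;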
Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
just emits a WARNING; so the wal receiver would not make
the backends throw an error; hence the test case won't catch the
error. Is that right?
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
I am getting a server crash on the standby while executing the
pg_logical_slot_get_changes function; please refer to this scenario:
Master cluster (./initdb -D master)
set wal_level='hot_standby' in master/postgresql.conf file
start the server, connect to a psql terminal and create a physical
replication slot (SELECT * from
pg_create_physical_replication_slot('p1');)
perform pg_basebackup using --slot 'p1' (./pg_basebackup -D slave/ -R
--slot p1 -v)
set wal_level='logical', hot_standby_feedback=on,
primary_slot_name='p1' in slave/postgresql.conf file
start the server, connect to a psql terminal and create a logical
replication slot (SELECT * from
pg_create_logical_replication_slot('t','test_decoding');)
run pgbench (./pgbench -i -s 10 postgres) on the master and select
pg_logical_slot_get_changes on the slave database
postgres=# select * from pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.274 IST [26817] LOG: starting logical decoding for
slot "t"
2019-03-13 20:34:50.274 IST [26817] DETAIL: Streaming transactions
committing after 0/6C000060, reading WAL from 0/6C000028.
2019-03-13 20:34:50.274 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.275 IST [26817] LOG: logical decoding found
consistent point at 0/6C000028
2019-03-13 20:34:50.275 IST [26817] DETAIL: There are no running
transactions.
2019-03-13 20:34:50.275 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
"decode.c", Line: 977)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-03-13
20:34:50.276 IST [26809] LOG: server process (PID 26817) was terminated
by signal 6: Aborted
Stack trace -
(gdb) bt
#0 0x00007f370e673277 in raise () from /lib64/libc.so.6
#1 0x00007f370e674968 in abort () from /lib64/libc.so.6
#2 0x0000000000a30edf in ExceptionalCondition (conditionName=0xc36090
"!(data == tupledata + tuplelen)", errorType=0xc35f5c "FailedAssertion",
fileName=0xc35d70 "decode.c",
lineNumber=977) at assert.c:54
#3 0x0000000000843c6f in DecodeMultiInsert (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:977
#4 0x0000000000842b32 in DecodeHeap2Op (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:375
#5 0x00000000008424dd in LogicalDecodingProcessRecord (ctx=0x2ba1ac8,
record=0x2ba1d88) at decode.c:125
#6 0x000000000084830d in pg_logical_slot_get_changes_guts
(fcinfo=0x2b95838, confirm=true, binary=false) at logicalfuncs.c:307
#7 0x000000000084846a in pg_logical_slot_get_changes (fcinfo=0x2b95838)
at logicalfuncs.c:376
#8 0x00000000006e5b9f in ExecMakeTableFunctionResult
(setexpr=0x2b93ee8, econtext=0x2b93d98, argContext=0x2b99940,
expectedDesc=0x2b97970, randomAccess=false) at execSRF.c:233
#9 0x00000000006fb738 in FunctionNext (node=0x2b93c80) at
nodeFunctionscan.c:94
#10 0x00000000006e52b1 in ExecScanFetch (node=0x2b93c80,
accessMtd=0x6fb67b <FunctionNext>, recheckMtd=0x6fba77
<FunctionRecheck>) at execScan.c:93
#11 0x00000000006e5326 in ExecScan (node=0x2b93c80, accessMtd=0x6fb67b
<FunctionNext>, recheckMtd=0x6fba77 <FunctionRecheck>) at execScan.c:143
#12 0x00000000006fbac1 in ExecFunctionScan (pstate=0x2b93c80) at
nodeFunctionscan.c:270
#13 0x00000000006e3293 in ExecProcNodeFirst (node=0x2b93c80) at
execProcnode.c:445
#14 0x00000000006d8253 in ExecProcNode (node=0x2b93c80) at
../../../src/include/executor/executor.h:241
#15 0x00000000006daa4e in ExecutePlan (estate=0x2b93a28,
planstate=0x2b93c80, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x2b907e0, execute_once=true)
at execMain.c:1643
#16 0x00000000006d8865 in standard_ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:362
#17 0x00000000006d869b in ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:306
#18 0x00000000008ccef1 in PortalRunSelect (portal=0x2b36168,
forward=true, count=0, dest=0x2b907e0) at pquery.c:929
#19 0x00000000008ccb90 in PortalRun (portal=0x2b36168,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2b907e0, altdest=0x2b907e0, completionTag=0x7ffd7a513e90 "")
at pquery.c:770
#20 0x00000000008c6b58 in exec_simple_query (query_string=0x2adc1e8
"select * from pg_logical_slot_get_changes('t',null,null);") at
postgres.c:1215
#21 0x00000000008cae88 in PostgresMain (argc=1, argv=0x2b06590,
dbname=0x2b063d0 "postgres", username=0x2ad8da8 "centos") at postgres.c:4256
#22 0x0000000000828464 in BackendRun (port=0x2afe3b0) at postmaster.c:4399
#23 0x0000000000827c42 in BackendStartup (port=0x2afe3b0) at
postmaster.c:4090
#24 0x0000000000824036 in ServerLoop () at postmaster.c:1703
#25 0x00000000008238ec in PostmasterMain (argc=3, argv=0x2ad6d00) at
postmaster.c:1376
#26 0x0000000000748aab in main (argc=3, argv=0x2ad6d00) at main.c:228
(gdb)
regards,
On 03/07/2019 09:03 PM, tushar wrote:
There is another issue, where I am getting an error while executing
"pg_logical_slot_get_changes" on the SLAVE.
Master (running on port=5432) - run "make installcheck" after
setting PATH=<installation/bin:$PATH ) and export
PGDATABASE=postgres from regress/ folder
Slave (running on port=5555) - Connect to regression database and
select pg_logical_slot_get_changes
[centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
You are now connected to database "regression" as user "centos".
slot_name | lsn
-----------+-----------
m61 | 1/D437AD8
(1 row)
psql:t.sql:3: ERROR: could not resolve cmin/cmax of catalog tuple
[centos@mail-arts bin]$ cat t.sql
\c regression
SELECT * from pg_create_logical_replication_slot('m61',
'test_decoding');
select * from pg_logical_slot_get_changes('m61',null,null);
regards,
On 03/04/2019 10:57 PM, Andres Freund wrote:
Hi,
On 2019-03-04 16:54:32 +0530, tushar wrote:
On 03/01/2019 11:16 PM, Andres Freund wrote:
So, if I understand correctly you do *not* have a physical replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?
Ok, this time around I performed it like this -
.) Master cluster: set wal_level=logical and hot_standby_feedback=on in
postgresql.conf, start the server, and create a physical
replication slot
Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).
Thanks,
Andres
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Fri, 8 Mar 2019 at 20:59, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:
I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't
determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).
Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases.
Hi Andres,
I am trying to come up with new testcases to test the recovery
conflict handling. Before that I have some queries:
With Craig Ringer's approach, the way to reproduce the recovery
conflict was, I believe, easy : Do a checkpoint, which will log the
global-catalog-xmin-advance WAL record, due to which the standby -
while replaying the message - may find out that it's a recovery
conflict. But with your approach, the latestRemovedXid is passed only
during specific vacuum-related WAL records, so to reproduce the
recovery conflict error, we need to make sure some specific WAL
records are logged, such as XLOG_BTREE_DELETE. So we need to create a
testcase such that while creating an index tuple, it erases dead
tuples from a page, so that it eventually calls
_bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
XLOG_BTREE_DELETE record.
I tried to come up with this reproducible testcase without success.
This seems difficult. Do you have an easier option? Maybe we can use
some other WAL records that allow an easier, more reliable test case
for producing a recovery conflict?
I managed to get a recovery conflict by:
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and inserting some data.
4. Running: VACUUM FULL;
This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477
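Spelled out, those steps are roughly the following (slot name as in the
WARNING above; the table name and row count are made up for the sketch;
hot_standby_feedback is off on the standby):

-- on the standby
SELECT * FROM pg_create_logical_replication_slot('decoding_standby', 'test_decoding');

-- on the master
CREATE TABLE conflict_test(x integer, y text);
INSERT INTO conflict_test SELECT g, g::text FROM generate_series(1, 1000) g;
-- a database-wide VACUUM FULL also rewrites the system catalogs, removing
-- tuples that the standby slot's catalog_xmin still needs
VACUUM FULL;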
But I did not add such a testcase to the test file, because with the
current patch, it does not do anything with the slot; it just keeps
emitting WARNINGs in the log file, so we can't test this scenario as of
now using the TAP test.
Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
just emits a WARNING; so the wal receiver would not make
the backends throw an error; hence the test case won't catch the
error. Is that right?
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
4. Running: VACUUM FULL;
This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477
But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.
I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.
As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on the slave, creating and inserting into
a table on the master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on the slave,
any simple psql command thereafter waits on a lock acquired on a system
catalog, e.g. pg_authid. Working on it.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:
On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
4. Running: VACUUM FULL;
This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477
But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.
I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.
Cool.
As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.
I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.
Greetings,
Andres Freund
On Tue, 2 Apr 2019 at 21:34, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:
On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
4. Running: VACUUM FULL;
This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477
But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.
I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.
Cool.
As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.
I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.
Oops, it was my own change that caused the hang. Sorry for the noise.
After using wal_debug, I found out that after replaying the LOCK records
for the catalog pg_auth, it was not releasing them because it had
actually got stuck in ReplicationSlotDropPtr() itself. In
ResolveRecoveryConflictWithSlots(), a shared
ReplicationSlotControlLock was already held before iterating through
the slots, and ReplicationSlotDropPtr() then tries to take the
same lock in exclusive mode for setting slot->in_use, leading to a
deadlock. I fixed that by releasing the shared lock before calling
ReplicationSlotDropPtr(), and then restarting the scan over the slots
since we released the lock. We do a similar thing for
ReplicationSlotCleanup().
Attached is a rebased version of your patch
logical-decoding-on-standby.patch. This v2 version also has the above
changes. It also includes the TAP test file, which is still in a WIP
state, mainly because I have yet to add the conflict recovery handling
scenarios.
I see that you have already committed the
move-latestRemovedXid-computation-for-nbtree-xlog related changes.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v2.patch (application/octet-stream)
From a508f1c38ff689ec5a8d9df371fd941d547fa479 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Wed, 3 Apr 2019 19:26:49 +0530
Subject: [PATCH] Logical decoding on standby.
-Andres Freund.
Besides the above main changes by Andres, the following changes were done by
Amit Khandekar:
1. Handle slot conflict recovery by dropping the conflicting slots.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
This test was originally written by Craig Ringer, with some changes
from Amit Khandekar. Still in WIP state. Yet to add scenarios to test
conflict recovery.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/replication/logical/logical.c | 2 +
src/backend/replication/slot.c | 79 +++++
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
.../recovery/t/016_logical_decoding_on_replica.pl | 358 +++++++++++++++++++++
24 files changed, 513 insertions(+), 18 deletions(-)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index cb80ab0..ccb761f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -342,7 +342,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -563,7 +564,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -758,6 +759,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index e17f017..b67e4e6 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 05ceb65..f5439d2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7097,12 +7097,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7138,6 +7139,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7188,6 +7190,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7218,7 +7221,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7228,6 +7231,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7648,7 +7652,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7684,7 +7689,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7780,7 +7786,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7917,7 +7925,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 392b35e..6959119 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,7 +465,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 8ade165..745cbc5 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 0a85d8b..2617d55 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index b9311ce..ef4910f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 71836ee..c66137a 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -913,6 +913,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6e5bc12..e8b7af4 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 215f146..75dbdb9 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 1089556..92a6ed1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1896,6 +1898,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 2f87b67..5eb0c71 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -47,10 +47,10 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -95,6 +95,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c..482c874 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 6527fc9..50f334a 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66..4e0776a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 2361243..f276c7e 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 9606d02..78bc639 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 5402851..d6437d6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..8cc029b
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,358 @@
+# Demonstrate that logical decoding can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 52;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
+
+# If no slot on standby exists to hold down catalog_xmin it must follow xmin,
+# (which is nextXid when no xacts are running on the standby).
+($xmin, $catalog_xmin) = print_phys_xmin();
+ok($xmin, "xmin not null");
+is($xmin, $catalog_xmin, "xmin and catalog_xmin equal");
+
+# We need catalog_xmin advance to take effect on the master and be replayed
+# on standby.
+$node_master->safe_psql('postgres', 'CHECKPOINT');
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now the standby's slot
+# doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream catalog retention
+#########################################################
+
+sub test_catalog_xmin_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $oldestCatalogXmin, $nextXid) = ('', '', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestCatalogXmin:\s*(\d+)/)
+ {
+ $oldestCatalogXmin = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, oldestCatalogXmin $oldestCatalogXmin, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid, $oldestCatalogXmin);
+}
+
+my ($oldestXid, $oldestCatalogXmin) = test_catalog_xmin_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_xmin, $new_catalog_xmin) = print_phys_xmin();
+# We're now back to the old behaviour of hot_standby_feedback
+# reporting nextXid for both thresholds
+ok($new_catalog_xmin, "physical catalog_xmin still non-null");
+cmp_ok($new_catalog_xmin, '==', $new_xmin,
+ 'xmin and catalog_xmin equal after slot drop');
+
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot dodropslot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Wed, 3 Apr 2019 at 19:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Oops, it was my own change that caused the hang. Sorry for the noise.
After enabling wal_debug, I found out that after replaying the LOCK records
for the catalog pg_auth, recovery was not releasing the lock because it had
actually got stuck in ReplicationSlotDropPtr() itself. In
ResolveRecoveryConflictWithSlots(), the shared
ReplicationSlotControlLock was already held before iterating through
the slots, and ReplicationSlotDropPtr() then tries to take the
same lock in exclusive mode to update slot->in_use, leading to a
deadlock. I fixed that by releasing the shared lock before calling
ReplicationSlotDropPtr(), and then restarting the scan over the slots,
since we released the lock. We do a similar thing in
ReplicationSlotCleanup().
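To make the flow concrete, here is a condensed, illustrative sketch of that
pattern; the full version is in the attached patch, and slot_conflicts_with()
below is only a placeholder for the xmin/catalog_xmin checks, not a real
function:

void
ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
{
	int		i;

	if (max_replication_slots <= 0)
		return;

restart:
	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		if (!s->in_use)
			continue;

		if (slot_conflicts_with(s, dboid, xid))	/* placeholder for the xmin checks */
		{
			/*
			 * ReplicationSlotDropPtr() re-acquires ReplicationSlotControlLock
			 * in exclusive mode, so we must not still hold it in shared mode
			 * here -- that is the self-deadlock described above.
			 */
			LWLockRelease(ReplicationSlotControlLock);
			ReplicationSlotDropPtr(s);
			goto restart;		/* lock released; rescan all slots */
		}
	}
	LWLockRelease(ReplicationSlotControlLock);
}

The real function additionally sleeps briefly between retries and emits
WARNINGs naming the conflicting slot before dropping it.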
Attached is a rebased version of your patch
logical-decoding-on-standby.patch. This v2 version also has the above
changes. It also includes the TAP test file, which is still in a WIP
state, mainly because I have yet to add the conflict recovery handling
scenarios.
The attached v3 patch includes a new scenario to test conflict recovery
handling, by verifying that the conflicting slot gets dropped.
With this, I am done with the test changes, except for the question below,
which I had posted earlier and on which I would like inputs:
Regarding the test result failures, I could see that when we drop a
logical replication slot on the standby server, the catalog_xmin of the
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; that's the reason a couple of test scenarios are
failing:
ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584
I am not sure what is expected. What actually happens is: the
physical slot's catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on the standby.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v3.patch (application/octet-stream)
From a508f1c38ff689ec5a8d9df371fd941d547fa479 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Wed, 3 Apr 2019 19:26:49 +0530
Subject: [PATCH] Logical decoding on standby.
-Andres Freund.
Besides the above main changes by Andres, the following changes were done by
Amit Khandekar:
1. Handle slot conflict recovery by dropping the conflicting slots.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
This test was originally written by Craig Ringer, with some changes
from Amit Khandekar. It is still in a WIP state; scenarios to test
conflict recovery are yet to be added.
Incremental changes in v3: added a conflict recovery handling scenario.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/replication/logical/logical.c | 2 +
src/backend/replication/slot.c | 79 +++++
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
.../recovery/t/016_logical_decoding_on_replica.pl | 358 +++++++++++++++++++++
24 files changed, 513 insertions(+), 18 deletions(-)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 4fb1855..59a7910 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -342,7 +342,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -544,7 +545,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -736,6 +737,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index e17f017..b67e4e6 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a05b6a0..bfbb9d3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7100,12 +7100,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7141,6 +7142,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7191,6 +7193,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7221,7 +7224,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7231,6 +7234,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7651,7 +7655,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7687,7 +7692,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7783,7 +7789,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7920,7 +7928,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c9d8312..fad08e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -475,7 +475,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 8ade165..745cbc5 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 0a85d8b..2617d55 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index b9311ce..ef4910f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6e5bc12..e8b7af4 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 215f146..75dbdb9 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 1089556..92a6ed1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1896,6 +1898,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 9990d97..887a377 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -47,10 +47,10 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -95,6 +95,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c..482c874 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index ee8fc6f..d535441 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66..4e0776a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 2361243..f276c7e 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 9606d02..78bc639 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 89a7fbf..c36e228 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..7998d85
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,386 @@
+# Demonstrate that logical decoding can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 55;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
+
+# If no slot on standby exists to hold down catalog_xmin it must follow xmin,
+# (which is nextXid when no xacts are running on the standby).
+($xmin, $catalog_xmin) = print_phys_xmin();
+ok($xmin, "xmin not null");
+is($xmin, $catalog_xmin, "xmin and catalog_xmin equal");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now the standby's slot
+# doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream catalog retention
+#########################################################
+
+sub test_catalog_xmin_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $oldestCatalogXmin, $nextXid) = ('', '', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestCatalogXmin:\s*(\d+)/)
+ {
+ $oldestCatalogXmin = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, oldestCatalogXmin $oldestCatalogXmin, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid, $oldestCatalogXmin);
+}
+
+my ($oldestXid, $oldestCatalogXmin) = test_catalog_xmin_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+#
+#
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_xmin, $new_catalog_xmin) = print_phys_xmin();
+# We're now back to the old behaviour of hot_standby_feedback
+# reporting nextXid for both thresholds
+ok($new_catalog_xmin, "physical catalog_xmin still non-null");
+cmp_ok($new_catalog_xmin, '==', $new_xmin,
+ 'xmin and catalog_xmin equal after slot drop');
+
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+# or BAIL_OUT('slot creation failed, subsequent results would be meaningless');
+# TODO : Above, it bails out even when pg_recvlogical is successful, commented out BAIL_OUT
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot dodropslot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
Hi,
Thanks for the new version of the patch. Btw, could you add Craig as a
co-author in the commit message of the next version of the patch? Don't
want to forget him.
On 2019-04-05 17:08:39 +0530, Amit Khandekar wrote:
Regarding the test result failures, I could see that when we drop a
logical replication slot at standby server, then the catalog_xmin of
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing :
ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584
I am not sure what is expected. What actually happens is : the
physical slot catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on standby.
That seems like the correct behaviour to me - why would we still have a
catalog xmin if there's no logical slot?
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted. I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).
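Roughly the shape I have in mind - just a sketch, the enum name and message
wording below are illustrative only, not taken from the patch:

    typedef enum SlotConflictReason
    {
        SLOT_CONFLICT_NONE,
        SLOT_CONFLICT_XMIN,
        SLOT_CONFLICT_CATALOG_XMIN
    } SlotConflictReason;

    /* inside the per-slot loop, replacing the two WARNING blocks */
    SlotConflictReason reason = SLOT_CONFLICT_NONE;

    if (TransactionIdIsValid(slot_xmin) &&
        TransactionIdPrecedesOrEquals(slot_xmin, xid))
        reason = SLOT_CONFLICT_XMIN;
    else if (TransactionIdIsValid(slot_catalog_xmin) &&
             TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
        reason = SLOT_CONFLICT_CATALOG_XMIN;

    if (reason != SLOT_CONFLICT_NONE)
    {
        /* one message per conflicting slot, and at LOG rather than WARNING */
        ereport(LOG,
                (errmsg("dropping conflicting replication slot \"%s\"",
                        NameStr(slotname)),
                 errdetail("Slot's %s %u precedes removed xid %u.",
                           reason == SLOT_CONFLICT_XMIN ? "xmin" : "catalog_xmin",
                           reason == SLOT_CONFLICT_XMIN ? slot_xmin : slot_catalog_xmin,
                           xid)));
        found_conflict = true;
        /* then release the lock, drop the slot and restart the scan as before */
    }

With an else-if like that only one reason gets reported per slot; whether
both thresholds ever need to be mentioned at once is a separate question.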
@@ -0,0 +1,386 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 55;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
Can we make this more robust? E.g. by waiting till pg_stat_replication
shows the change on the primary? Because I can guarantee that this'll
fail on slow buildfarm machines (say the valgrind animals).
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
Similar.
Greetings,
Andres Freund
On Sat, 6 Apr 2019 at 04:45, Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the new version of the patch. Btw, could you add Craig as a
co-author in the commit message of the next version of the patch? Don't
want to forget him.
I had put his name in the earlier patch. But now I have made it easier to spot.
On 2019-04-05 17:08:39 +0530, Amit Khandekar wrote:
Regarding the test result failures, I could see that when we drop a
logical replication slot at standby server, then the catalog_xmin of
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing :
ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584
I am not sure what is expected. What actually happens is : the
physical slot catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on standby.
That seems like the correct behaviour to me - why would we still have a
catalog xmin if there's no logical slot?
Yeah ... In the earlier implementation, maybe it was different, that's
why the catalog_xmin didn't become NULL. Not sure. Anyways, I have
changed this check. Details in the following sections.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted. I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).
What I have in mind is :
ereport(LOG,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("Dropping conflicting slot %s", s->data.name.data),
errdetail("%s, removed xid %d.", conflict_str, xid)));
where conflict_str is a dynamically generated string containing
something like : "slot xmin : 1234, slot catalog_xmin: 5678"
So for the user, the errdetail will look like :
"slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
I think the user can figure out whether it was xmin or catalog_xmin or
both that conflicted with removed xid.
If we don't do it this way, we may not be able to show in a single
message that both xmin and catalog_xmin are conflicting at the same
time.
Does this message look good to you, or did you have in mind something
quite different?
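A minimal sketch of what I mean, assuming the StringInfo helpers from
lib/stringinfo.h (the exact wording is illustrative, not final):

    StringInfoData conflict_str;

    initStringInfo(&conflict_str);

    if (TransactionIdIsValid(slot_xmin) &&
        TransactionIdPrecedesOrEquals(slot_xmin, xid))
        appendStringInfo(&conflict_str, "slot xmin: %u, ", slot_xmin);

    if (TransactionIdIsValid(slot_catalog_xmin) &&
        TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
        appendStringInfo(&conflict_str, "slot catalog_xmin: %u, ", slot_catalog_xmin);

    if (conflict_str.len > 0)
    {
        ereport(LOG,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("Dropping conflicting slot %s", NameStr(slotname)),
                 errdetail("%sremoved xid %u.", conflict_str.data, xid)));
        /* release the lock, drop the slot and restart the scan, as the patch does */
    }

    pfree(conflict_str.data);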
@@ -0,0 +1,386 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 55;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent
Can we make this more robust? E.g. by waiting till pg_stat_replication
shows the change on the primary? Because I can guarantee that this'll
fail on slow buildfarm machines (say the valgrind animals).
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
Similar.
Ok. I have put a copy of the get_slot_xmins() function from
t/001_stream_rep.pl() into 016_logical_decoding_on_replica.pl. Renamed
it to wait_for_phys_mins(). And used this to wait for the
hot_standby_feedback change to propagate to master. This function
waits for the physical slot's xmin and catalog_xmin to get the right
values depending on whether there is a logical slot in standby and
whether hot_standby_feedback is on on standby.
I was not sure how pg_stat_replication could be used to detect that the
hot_standby_feedback change has reached the master. So I did it the above
way, which I think pretty much does what we want.
Attached v4 patch only has the testcase change, and some minor cleanup
in the test file.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v4.patch (application/x-patch)
From 1e3c68a644da4aa45ca72190cfa254ccd171f9e3 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Tue, 9 Apr 2019 22:06:25 +0530
Subject: [PATCH] Logical decoding on standby.
Author: Andres Freund.
Besides the above main changes, the patch includes the following:
1. Handle slot conflict recovery by dropping the conflicting slots.
   - Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
   Original author: Craig Ringer. Few changes/additions from Amit Khandekar.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/replication/logical/logical.c | 2 +
src/backend/replication/slot.c | 79 +++++
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
.../recovery/t/016_logical_decoding_on_replica.pl | 391 +++++++++++++++++++++
24 files changed, 546 insertions(+), 18 deletions(-)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 4fb1855..59a7910 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -342,7 +342,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -544,7 +545,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -736,6 +737,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index e17f017..b67e4e6 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a05b6a0..bfbb9d3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7100,12 +7100,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7141,6 +7142,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7191,6 +7193,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7221,7 +7224,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7231,6 +7234,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7651,7 +7655,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7687,7 +7692,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7783,7 +7789,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7920,7 +7928,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c9d8312..fad08e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -475,7 +475,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 8ade165..745cbc5 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 0a85d8b..2617d55 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index b9311ce..ef4910f 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 6e5bc12..e8b7af4 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 215f146..75dbdb9 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index b4f2d0f..f4da4bc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 9990d97..887a377 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -47,10 +47,10 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
RelFileNode hnode; /* RelFileNode of the heap the index currently
* points at */
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -95,6 +95,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c..482c874 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index ee8fc6f..d535441 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66..4e0776a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 2361243..f276c7e 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 9606d02..78bc639 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 89a7fbf..c36e228 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..9ee79b0
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,391 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On 03/13/2019 08:40 PM, tushar wrote:
Hi ,
I am getting a server crash on standby while executing
pg_logical_slot_get_changes function , please refer to this scenario
Master cluster( ./initdb -D master)
set wal_level='hot_standby' in master/postgresql.conf file
start the server , connect to psql terminal and create a physical
replication slot ( SELECT * from
pg_create_physical_replication_slot('p1');)
perform pg_basebackup using --slot 'p1' (./pg_basebackup -D slave/ -R
--slot p1 -v))
set wal_level='logical' , hot_standby_feedback=on,
primary_slot_name='p1' in slave/postgresql.conf file
start the server , connect to psql terminal and create a logical
replication slot ( SELECT * from
pg_create_logical_replication_slot('t','test_decoding');)
run pgbench ( ./pgbench -i -s 10 postgres) on master and select
pg_logical_slot_get_changes on Slave database
postgres=# select * from pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.274 IST [26817] LOG: starting logical decoding
for slot "t"
2019-03-13 20:34:50.274 IST [26817] DETAIL: Streaming transactions
committing after 0/6C000060, reading WAL from 0/6C000028.
2019-03-13 20:34:50.274 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.275 IST [26817] LOG: logical decoding found
consistent point at 0/6C000028
2019-03-13 20:34:50.275 IST [26817] DETAIL: There are no running
transactions.
2019-03-13 20:34:50.275 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
"decode.c", Line: 977)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-03-13
20:34:50.276 IST [26809] LOG: server process (PID 26817) was
terminated by signal 6: Aborted
Andres - do you think this is an issue that needs to be fixed?
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Hi,
On 2019-04-10 12:11:21 +0530, tushar wrote:
Andres - do you think this is an issue that needs to be fixed?
Yes, it definitely needs to be fixed. I just haven't had sufficient time
to look into it. Have you reproduced this with Amit's latest version?
Amit, have you spent any time looking into it? I know that you're not
that deeply steeped into the internals of logical decoding, but perhaps
there's something obvious going on.
Greetings,
Andres Freund
On 04/10/2019 09:39 PM, Andres Freund wrote:
Have you reproduced this with Amit's latest version?
Yes - it is very much reproducible.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Wed, 10 Apr 2019 at 21:39, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-10 12:11:21 +0530, tushar wrote:
Andres - do you think this is an issue that needs to be fixed?
Yes, it definitely needs to be fixed. I just haven't had sufficient time
to look into it. Have you reproduced this with Amit's latest version?
Amit, have you spent any time looking into it? I know that you're not
that deeply steeped into the internals of logical decoding, but perhaps
there's something obvious going on.
I tried to see if I can quickly understand what's going on.
Here, master wal_level is hot_standby, not logical, though slave
wal_level is logical.
On slave, when pg_logical_slot_get_changes() is run, in
DecodeMultiInsert(), it does not get any WAL records having
XLH_INSERT_CONTAINS_NEW_TUPLE set. So data pointer is never
incremented, it remains at tupledata. So at the end of the function,
this assertion fails :
Assert(data == tupledata + tuplelen);
because data is actually at tupledata.
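To make that concrete, the relevant part of DecodeMultiInsert() looks
roughly like this (a simplified paraphrase for illustration, not the
exact source):
/* simplified paraphrase of the tuple-walking loop in DecodeMultiInsert() */
char	   *data = tupledata;		/* start of block 0's block data */
for (i = 0; i < xlrec->ntuples; i++)
{
	/*
	 * The tuple payload is only walked when the record carries
	 * XLH_INSERT_CONTAINS_NEW_TUPLE, which is only set when the WAL was
	 * generated with wal_level=logical on the node that wrote it, i.e.
	 * the primary.  With a hot_standby/replica-level primary the flag
	 * is never set, so 'data' is never advanced.
	 */
	if (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE)
	{
		xl_multi_insert_tuple *xlhdr = (xl_multi_insert_tuple *) SHORTALIGN(data);
		data = ((char *) xlhdr) + SizeOfMultiInsertTuple;
		data += xlhdr->datalen;		/* skip over the tuple contents */
	}
}
/* fails with data still equal to tupledata when the flag was never set */
Assert(data == tupledata + tuplelen);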
Not sure why this is happening. On slave, wal_level is logical, so
logical records should have tuple data. Not sure what that has to do
with the wal_level of master. Everything should be there on slave
after it replays the inserts; and also slave wal_level is logical.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-04-12 23:34:02 +0530, Amit Khandekar wrote:
I tried to see if I can quickly understand what's going on.
Here, master wal_level is hot_standby, not logical, though slave
wal_level is logical.
Oh, that's well diagnosed. Cool. Also nicely tested - this'd be ugly
in production.
I assume the problem isn't present if you set the primary to wal_level =
logical?
Not sure why this is happening. On slave, wal_level is logical, so
logical records should have tuple data. Not sure what does that have
to do with wal_level of master. Everything should be there on slave
after it replays the inserts; and also slave wal_level is logical.
The standby doesn't write its own WAL, only primaries do. I thought we
forbade running with wal_level=logical on a standby, when the primary is
only set to replica. But that's not what we do, see
CheckRequiredParameterValues().
I've not yet thought this through, but I think we'll have to somehow
error out in this case. I guess we could just check at the start of
decoding what ControlFile->wal_level is set to, and then raise an error
in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
wal_level to something lower?
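For illustration, something along these lines in decode.c's
DecodeXLogOp() might do (an untested sketch; xl_parameter_change
already carries the primary's wal_level):
		case XLOG_PARAMETER_CHANGE:
			{
				xl_parameter_change *xlrec =
					(xl_parameter_change *) XLogRecGetData(buf->record);
				/*
				 * If the primary dropped below wal_level=logical, the WAL
				 * from here on no longer contains the information logical
				 * decoding needs, so bail out instead of decoding garbage.
				 */
				if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
					ereport(ERROR,
							(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
							 errmsg("logical decoding on a standby requires wal_level >= logical on the primary")));
				break;
			}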
Could you try to implement that?
Greetings,
Andres Freund
On Sat, 13 Apr 2019 at 00:57, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-12 23:34:02 +0530, Amit Khandekar wrote:
I tried to see if I can quickly understand what's going on.
Here, master wal_level is hot_standby, not logical, though slave
wal_level is logical.
Oh, that's well diagnosed. Cool. Also nicely tested - this'd be ugly
in production.
Tushar had made me aware of the fact that this reproduces only when
master wal_level is hot_standby.
I assume the problem isn't present if you set the primary to wal_level =
logical?
Right.
Not sure why this is happening. On slave, wal_level is logical, so
logical records should have tuple data. Not sure what does that have
to do with wal_level of master. Everything should be there on slave
after it replays the inserts; and also slave wal_level is logical.
The standby doesn't write its own WAL, only primaries do. I thought we
forbade running with wal_level=logical on a standby, when the primary is
only set to replica. But that's not what we do, see
CheckRequiredParameterValues().
I've not yet thought this through, but I think we'll have to somehow
error out in this case. I guess we could just check at the start of
decoding what ControlFile->wal_level is set to,
By "start of decoding", I didn't get where exactly. Do you mean
CheckLogicalDecodingRequirements() ?
and then raise an error
in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
wal_level to something lower?
Didn't get where exactly we should error out. We don't do
XLOG_PARAMETER_CHANGE handling in decode.c , so obviously you meant
something else, which I didn't understand.
What I am thinking is :
In CheckLogicalDecodingRequirements(), besides checking wal_level,
also check ControlFile->wal_level when InHotStandby. I mean, when we
are InHotStandby, both wal_level and ControlFile->wal_level should be
>= WAL_LEVEL_LOGICAL. This will allow us to error out when using a logical
slot while the master has an incompatible wal_level.
ControlFile is not accessible outside xlog.c, so we would need an API to
extract this field.
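Something like the following accessor might be enough (a minimal
sketch; the function name is made up here, not an existing API):
/* hypothetical accessor in xlog.c; name and locking are illustrative */
WalLevel
GetControlFileWalLevel(void)
{
	WalLevel	wal_level;

	LWLockAcquire(ControlFileLock, LW_SHARED);
	wal_level = (WalLevel) ControlFile->wal_level;
	LWLockRelease(ControlFileLock);

	return wal_level;
}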
Could you try to implement that?
Greetings,
Andres Freund
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
Sorry for the late response.
On 2019-04-16 12:27:46 +0530, Amit Khandekar wrote:
On Sat, 13 Apr 2019 at 00:57, Andres Freund <andres@anarazel.de> wrote:
Not sure why this is happening. On slave, wal_level is logical, so
logical records should have tuple data. Not sure what does that have
to do with wal_level of master. Everything should be there on slave
after it replays the inserts; and also slave wal_level is logical.
The standby doesn't write its own WAL, only primaries do. I thought we
forbade running with wal_level=logical on a standby, when the primary is
only set to replica. But that's not what we do, see
CheckRequiredParameterValues().
I've not yet thought this through, but I think we'll have to somehow
error out in this case. I guess we could just check at the start of
decoding what ControlFile->wal_level is set to,
By "start of decoding", I didn't get where exactly. Do you mean
CheckLogicalDecodingRequirements() ?
Right.
and then raise an error
in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
wal_level to something lower?
Didn't get where exactly we should error out. We don't do
XLOG_PARAMETER_CHANGE handling in decode.c , so obviously you meant
something else, which I didn't understand.
I was indeed thinking of checking XLOG_PARAMETER_CHANGE in
decode.c. Adding handling for that, and just checking wal_level, ought
to be fairly doable? But, see below:
What I am thinking is :
In CheckLogicalDecodingRequirements(), besides checking wal_level,
also check ControlFile->wal_level when InHotStandby. I mean, when we
are InHotStandby, both wal_level and ControlFile->wal_level should be
>= WAL_LEVEL_LOGICAL. This will allow us to error out when using a logical
slot while the master has an incompatible wal_level.
That still allows the primary to change wal_level after logical decoding
has started, so we need the additional checks.
I'm not yet sure how to best deal with the fact that wal_level might be
changed by the primary at basically all times. We would eventually get
an error when logical decoding reaches the XLOG_PARAMETER_CHANGE. But
that's not necessarily sufficient - if a primary changes its wal_level
to lower, it could remove information logical decoding needs *before*
logical decoding reaches the XLOG_PARAMETER_CHANGE record.
So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.
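Roughly, that redo-side check could look like this (an untested sketch;
ResolveRecoveryConflictWithLogicalSlots() is a hypothetical helper that
would drop or invalidate all logical slots on the standby, along the
lines of the draft patch's ResolveRecoveryConflictWithSlots()):
/* inside xlog_redo()'s XLOG_PARAMETER_CHANGE branch -- untested sketch */
xl_parameter_change xlrec;

memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));

/*
 * The primary stopped logging what logical decoding needs; every
 * logical slot on this standby is conflicting now.  The helper is
 * hypothetical (see the note above); the current draft only has the
 * xid-based ResolveRecoveryConflictWithSlots().
 */
if (InHotStandby && xlrec.wal_level < WAL_LEVEL_LOGICAL)
	ResolveRecoveryConflictWithLogicalSlots();

/* ... followed by the existing ControlFile updates */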
Therefore I think the check in CheckLogicalDecodingRequirements() needs
to be something like:
if (RecoveryInProgress())
{
if (!InHotStandby)
ereport(ERROR, "logical decoding on a standby required hot_standby to be enabled");
/*
* This check is racy, but whenever XLOG_PARAMETER_CHANGE indicates that
* wal_level has changed, we verify that there are no existin glogical
* replication slots. And to avoid races around creating a new slot,
* CheckLogicalDecodingRequirements() is called once before creating the slot,
* andd once when logical decoding is initially starting up.
*/
if (ControlFile->wal_level != LOGICAL)
ereport(ERROR, "...");
}
And then add a second CheckLogicalDecodingRequirements() call into
CreateInitDecodingContext().
What do you think?
Greetings,
Andres Freund
Hi,
I am going through your comments. Meanwhile, attached is a rebased
version of the v4 patch.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v4_rebased.patch
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index e17f017..b67e4e6 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 19d2c52..7a15b35 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7117,12 +7117,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7158,6 +7159,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7208,6 +7210,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7238,7 +7241,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7248,6 +7251,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7668,7 +7672,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7704,7 +7709,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7800,7 +7806,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7937,7 +7945,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9e17acc1..a8b73e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index e7c40cb..75a6c24 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index fc85c6f..ca750e6 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index acb4d9a..31951bd 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,7 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +112,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..1bc7a3c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
/*
* Flush all replication slots to disk.
*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 215f146..75dbdb9 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index b4f2d0f..f4da4bc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index e66b034..61ca0e8 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -47,6 +47,7 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
@@ -94,6 +95,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c..482c874 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index ee8fc6f..d535441 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index a8f1d66..4e0776a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 2361243..f276c7e 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool catalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 9606d02..78bc639 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..9ee79b0
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,391 @@
+# Demonstrate that logical decoding can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot dodropslot2');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
On Tue, 9 Apr 2019 at 22:23, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Sat, 6 Apr 2019 at 04:45, Andres Freund <andres@anarazel.de> wrote:
I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted. I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).
What I have in mind is :
ereport(LOG,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("Dropping conflicting slot %s", s->data.name.data),
errdetail("%s, removed xid %d.", conflict_str, xid)));
where conflict_str is a dynamically generated string containing
something like : "slot xmin : 1234, slot catalog_xmin: 5678"
So for the user, the errdetail will look like :
"slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
I think the user can figure out whether it was xmin or catalog_xmin or
both that conflicted with removed xid.
If we don't do this way, we may not be able to show in a single
message if both xmin and catalog_xmin are conflicting at the same
time.Does this message look good to you, or you had in mind something quite
different ?
This is yet another point that needs to be settled. Until then, I will
use the above format to display the error message in the upcoming patch
version.
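
For illustration, a minimal sketch of how such a single message could be
assembled, assuming a helper called from the per-slot loop of
ResolveRecoveryConflictWithSlots() shown earlier; the helper name and the
StringInfo-based conflict_str construction are assumptions, not code from
the posted patches:

    /*
     * Sketch only: emit one LOG message for a conflicting slot, with the
     * conflicting horizons collected into a single errdetail string.
     * Assumes slotname/xmin values were read under the slot's spinlock,
     * as in ResolveRecoveryConflictWithSlots() above.
     */
    static void
    report_conflicting_slot(NameData *slotname, TransactionId slot_xmin,
                            TransactionId slot_catalog_xmin, TransactionId xid)
    {
        StringInfoData conflict_str;

        initStringInfo(&conflict_str);
        if (TransactionIdIsValid(slot_xmin) &&
            TransactionIdPrecedesOrEquals(slot_xmin, xid))
            appendStringInfo(&conflict_str, "slot xmin: %u", slot_xmin);
        if (TransactionIdIsValid(slot_catalog_xmin) &&
            TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
            appendStringInfo(&conflict_str, "%sslot catalog_xmin: %u",
                             conflict_str.len > 0 ? ", " : "",
                             slot_catalog_xmin);

        ereport(LOG,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("Dropping conflicting slot %s", NameStr(*slotname)),
                 errdetail("%s, removed xid %u.", conflict_str.data, xid)));

        pfree(conflict_str.data);
    }

The caller would then set found_conflict and drop the slot exactly as the
existing loop does.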
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Tue, 21 May 2019 at 21:49, Andres Freund <andres@anarazel.de> wrote:
Hi,
Sorry for the late response.
On 2019-04-16 12:27:46 +0530, Amit Khandekar wrote:
On Sat, 13 Apr 2019 at 00:57, Andres Freund <andres@anarazel.de> wrote:
Not sure why this is happening. On slave, wal_level is logical, so
logical records should have tuple data. Not sure what does that have
to do with wal_level of master. Everything should be there on slave
after it replays the inserts; and also slave wal_level is logical.

The standby doesn't write its own WAL, only primaries do. I thought we
forbade running with wal_level=logical on a standby, when the primary is
only set to replica. But that's not what we do, see
CheckRequiredParameterValues().

I've not yet thought this through, but I think we'll have to somehow
error out in this case. I guess we could just check at the start of
decoding what ControlFile->wal_level is set to,

By "start of decoding", I didn't get where exactly. Do you mean
CheckLogicalDecodingRequirements()?

Right.

and then raise an error
in decode.c when we pass an XLOG_PARAMETER_CHANGE record that sets
wal_level to something lower?

Didn't get where exactly we should error out. We don't do
XLOG_PARAMETER_CHANGE handling in decode.c, so obviously you meant
something else, which I didn't understand.

I was indeed thinking of checking XLOG_PARAMETER_CHANGE in
decode.c. Adding handling for that, and just checking wal_level, ought
to be fairly doable? But, see below:

What I am thinking is:
In CheckLogicalDecodingRequirements(), besides checking wal_level,
also check ControlFile->wal_level when InHotStandby. I mean, when we
are InHotStandby, both wal_level and ControlFile->wal_level should be
>= WAL_LEVEL_LOGICAL. This will allow us to error out when using a
logical slot when master has incompatible wal_level.

That still allows the primary to change wal_level after logical decoding
has started, so we need the additional checks.

I'm not yet sure how to best deal with the fact that wal_level might be
changed by the primary at basically all times. We would eventually get
an error when logical decoding reaches the XLOG_PARAMETER_CHANGE. But
that's not necessarily sufficient - if a primary changes its wal_level
to lower, it could remove information logical decoding needs *before*
logical decoding reaches the XLOG_PARAMETER_CHANGE record.

So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.

Therefore I think the check in CheckLogicalDecodingRequirements() needs
to be something like:

    if (RecoveryInProgress())
    {
        if (!InHotStandby)
            ereport(ERROR, "logical decoding on a standby requires hot_standby to be enabled");

        /*
         * This check is racy, but whenever XLOG_PARAMETER_CHANGE indicates that
         * wal_level has changed, we verify that there are no existing logical
         * replication slots. And to avoid races around creating a new slot,
         * CheckLogicalDecodingRequirements() is called once before creating the slot,
         * and once when logical decoding is initially starting up.
         */
        if (ControlFile->wal_level != LOGICAL)
            ereport(ERROR, "...");
    }

And then add a second CheckLogicalDecodingRequirements() call into
CreateInitDecodingContext().

What do you think?
Yeah, I agree we should add such checks to minimize the possibility of
reading logical records from a master that has insufficient wal_level.
So to summarize:
a. CheckLogicalDecodingRequirements(): Add ControlFile wal_level checks
b. Call this function in CreateInitDecodingContext() as well.
c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
error if there is an existing logical slot.
This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.
Do you think this is possible, or am I missing something? If that's
possible, I was considering some other mechanisms. For instance, whenever
a logical reader reads a wal_level-change record, save the value in
the ReplicationSlotPersistentData. So while reading the WAL records,
the reader knows whether the records have logical data; if they don't,
error out. But I am not sure how the reader would know the status of the
very first record, i.e. before it gets the wal_level-change record.
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Thu, May 23, 2019 at 8:08 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.

Do you think this is possible, or am I missing something?
wal_level is PGC_POSTMASTER.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello
wal_level is PGC_POSTMASTER.
But the primary can be restarted without a restart of the standby. We require wal_level replica or higher (currently the only higher level is logical) on the standby. So an online change from logical to replica wal_level is possible in the standby's control file.
regards, Sergei
On Thu, May 23, 2019 at 9:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
wal_level is PGC_POSTMASTER.
But the primary can be restarted without a restart of the standby. We require wal_level replica or higher (currently the only higher level is logical) on the standby. So an online change from logical to replica wal_level is possible in the standby's control file.
That's true, but Amit's scenario involved a change in wal_level during
the execution of pg_create_logical_replication_slot(), which I think
can't happen.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2019-05-23 17:39:21 +0530, Amit Khandekar wrote:
On Tue, 21 May 2019 at 21:49, Andres Freund <andres@anarazel.de> wrote:
Yeah, I agree we should add such checks to minimize the possibility of
reading logical records from a master that has insufficient wal_level.
So to summarize :
a. CheckLogicalDecodingRequirements() : Add Controlfile wal_level checks
b. Call this function call in CreateInitDecodingContext() as well.
c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
error if there is an existing logical slot.

This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.
I don't think that's an actual problem, because there's no decoding
before the slot exists and CreateInitDecodingContext() has determined
the start LSN. And by that point the slot exists, slo
XLOG_PARAMETER_CHANGE replay can error out.
Greetings,
Andres Freund
Hi,
On 2019-05-23 09:37:50 -0400, Robert Haas wrote:
On Thu, May 23, 2019 at 9:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
wal_level is PGC_POSTMASTER.
But the primary can be restarted without a restart of the standby. We require wal_level replica or higher (currently the only higher level is logical) on the standby. So an online change from logical to replica wal_level is possible in the standby's control file.
That's true, but Amit's scenario involved a change in wal_level during
the execution of pg_create_logical_replication_slot(), which I think
can't happen.
I don't see why not - we're talking about the wal_level in the WAL
stream, not the setting on the standby. And that can change during the
execution of pg_create_logical_replication_slot(), if a PARAMETER_CHANGE
record is replayed. I don't think it's actually a problem, as I
outlined in my response to Amit, though.
Greetings,
Andres Freund
On Thu, 23 May 2019 at 21:29, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-23 17:39:21 +0530, Amit Khandekar wrote:
On Tue, 21 May 2019 at 21:49, Andres Freund <andres@anarazel.de> wrote:
Yeah, I agree we should add such checks to minimize the possibility of
reading logical records from a master that has insufficient wal_level.
So to summarize :
a. CheckLogicalDecodingRequirements() : Add Controlfile wal_level checks
b. Call this function call in CreateInitDecodingContext() as well.
c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
error if there is an existing logical slot.

This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.

I don't think that's an actual problem, because there's no decoding
before the slot exists and CreateInitDecodingContext() has determined
the start LSN. And by that point the slot exists, slo
XLOG_PARAMETER_CHANGE replay can error out.
So between the start lsn and the lsn for
parameter-change(logical=>replica) record, there can be some records,
and these don't have logical data. So the slot created will read from
the start lsn, and proceed to read these records, before reading the
parameter-change record.
Can you re-write the below phrase please? I suspect there are some
letters missing there:
"And by that point the slot exists, slo XLOG_PARAMETER_CHANGE replay
can error out"
Are you saying we want to error out when postgres replays the
param-change record and there is an existing logical slot? I thought you
were suggesting earlier that it's the decode.c code which should
error out when reading the param-change record.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-05-23 23:08:55 +0530, Amit Khandekar wrote:
On Thu, 23 May 2019 at 21:29, Andres Freund <andres@anarazel.de> wrote:
On 2019-05-23 17:39:21 +0530, Amit Khandekar wrote:
On Tue, 21 May 2019 at 21:49, Andres Freund <andres@anarazel.de> wrote:
Yeah, I agree we should add such checks to minimize the possibility of
reading logical records from a master that has insufficient wal_level.
So to summarize :
a. CheckLogicalDecodingRequirements() : Add Controlfile wal_level checks
b. Call this function call in CreateInitDecodingContext() as well.
c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
error if there is an existing logical slot.

This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.I don't think that's an actual problem, because there's no decoding
before the slot exists and CreateInitDecodingContext() has determined
the start LSN. And by that point the slot exists, slo
XLOG_PARAMETER_CHANGE replay can error out.

So between the start lsn and the lsn for
parameter-change(logical=>replica) record, there can be some records,
and these don't have logical data. So the slot created will read from
the start lsn, and proceed to read these records, before reading the
parameter-change record.

I don't think that's possible. By the time CreateInitDecodingContext()
is called, the slot *already* exists (but in a state that'll cause it to
be thrown away on error). But the restart point has not yet been
determined. Thus, if there is a XLOG_PARAMETER_CHANGE with a wal_level
change it can error out. And to handle the race of wal_level changing
between CheckLogicalDecodingRequirements() and the slot creation, we
recheck in CreateInitDecodingContext().

Think we might need to change ReplicationSlotReserveWal() to use the
replay, rather than the redo pointer for logical slots though.
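
As a rough sketch of the kind of change hinted at here - only one
possible shape, assuming GetXLogReplayRecPtr() is a suitable source for
the restart position on a standby; this is not code from the posted
patches:

    /* inside ReplicationSlotReserveWal(); slot = MyReplicationSlot */
    if (RecoveryInProgress() && SlotIsLogical(slot))
    {
        /*
         * On a standby we cannot log a standby snapshot, and the redo
         * pointer of the last restartpoint may lag well behind; start
         * from what has actually been replayed instead.
         */
        restart_lsn = GetXLogReplayRecPtr(NULL);
    }
    else
        restart_lsn = GetRedoRecPtr();

    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = restart_lsn;
    SpinLockRelease(&slot->mutex);

Whether the replay pointer is in fact the right reservation point on a
standby is exactly the open question raised above.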
Can you re-write the below phrase please? I suspect there are some
letters missing there:
"And by that point the slot exists, slo XLOG_PARAMETER_CHANGE replay
can error out"
I think it's just one additional letter, namely s/slo/so/
Are you saying we want to error out when postgres replays the
param-change record and there is an existing logical slot? I thought you
were suggesting earlier that it's the decode.c code which should
error out when reading the param-change record.
Yes, that's what I'm saying. See this portion of my previous email on
the topic:
On 2019-05-21 09:19:37 -0700, Andres Freund wrote:
On 2019-04-16 12:27:46 +0530, Amit Khandekar wrote:
What I am thinking is:
In CheckLogicalDecodingRequirements(), besides checking wal_level,
also check ControlFile->wal_level when InHotStandby. I mean, when we
are InHotStandby, both wal_level and ControlFile->wal_level should be
>= WAL_LEVEL_LOGICAL. This will allow us to error out when using a
logical slot when master has incompatible wal_level.

That still allows the primary to change wal_level after logical decoding
has started, so we need the additional checks.

I'm not yet sure how to best deal with the fact that wal_level might be
changed by the primary at basically all times. We would eventually get
an error when logical decoding reaches the XLOG_PARAMETER_CHANGE. But
that's not necessarily sufficient - if a primary changes its wal_level
to lower, it could remove information logical decoding needs *before*
logical decoding reaches the XLOG_PARAMETER_CHANGE record.

So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.

Therefore I think the check in CheckLogicalDecodingRequirements() needs
to be something like:

    if (RecoveryInProgress())
    {
        if (!InHotStandby)
            ereport(ERROR, "logical decoding on a standby requires hot_standby to be enabled");

        /*
         * This check is racy, but whenever XLOG_PARAMETER_CHANGE indicates that
         * wal_level has changed, we verify that there are no existing logical
         * replication slots. And to avoid races around creating a new slot,
         * CheckLogicalDecodingRequirements() is called once before creating the slot,
         * and once when logical decoding is initially starting up.
         */
        if (ControlFile->wal_level != LOGICAL)
            ereport(ERROR, "...");
    }

And then add a second CheckLogicalDecodingRequirements() call into
CreateInitDecodingContext().

What do you think?
Greetings,
Andres Freund
On Thu, 23 May 2019 at 23:18, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-23 23:08:55 +0530, Amit Khandekar wrote:
On Thu, 23 May 2019 at 21:29, Andres Freund <andres@anarazel.de> wrote:
On 2019-05-23 17:39:21 +0530, Amit Khandekar wrote:
On Tue, 21 May 2019 at 21:49, Andres Freund <andres@anarazel.de> wrote:
Yeah, I agree we should add such checks to minimize the possibility of
reading logical records from a master that has insufficient wal_level.
So to summarize :
a. CheckLogicalDecodingRequirements() : Add Controlfile wal_level checks
b. Call this function call in CreateInitDecodingContext() as well.
c. While decoding XLOG_PARAMETER_CHANGE record, emit recovery conflict
error if there is an existing logical slot.

This made me think more of the race conditions. For instance, in
pg_create_logical_replication_slot(), just after
CheckLogicalDecodingRequirements and before actually creating the
slot, suppose concurrently Controlfile->wal_level is changed from
logical to replica. So suppose a new slot does get created. Later the
slot is read, so in pg_logical_slot_get_changes_guts(),
CheckLogicalDecodingRequirements() is called where it checks
ControlFile->wal_level value. But just before it does that,
ControlFile->wal_level concurrently changes back to logical, because
of replay of another param-change record. So this logical reader will
think that the wal_level is sufficient, and will proceed to read the
records, but those records are *before* the wal_level change, so these
records don't have logical data.

I don't think that's an actual problem, because there's no decoding
before the slot exists and CreateInitDecodingContext() has determined
the start LSN. And by that point the slot exists, slo
XLOG_PARAMETER_CHANGE replay can error out.

So between the start lsn and the lsn for
parameter-change(logical=>replica) record, there can be some records,
and these don't have logical data. So the slot created will read from
the start lsn, and proceed to read these records, before reading the
parameter-change record.

I don't think that's possible. By the time CreateInitDecodingContext()
is called, the slot *already* exists (but in a state that'll cause it to
be thrown away on error). But the restart point has not yet been
determined. Thus, if there is a XLOG_PARAMETER_CHANGE with a wal_level
change it can error out. And to handle the race of wal_level changing
between CheckLogicalDecodingRequirements() and the slot creation, we
recheck in CreateInitDecodingContext().
ok, got it now. I was concerned that there might be some such cases
unhandled because we are not using locks to handle such concurrency
conditions. But as you have explained, the checks we are adding will
avoid this race condition.
Think we might need to change ReplicationSlotReserveWal() to use the
replay, rather than the redo pointer for logical slots though.
Not thought of this; will get back.
Working on the patch now ....
Are you saying we want to error out when postgres replays the
param-change record and there is an existing logical slot? I thought you
were suggesting earlier that it's the decode.c code which should
error out when reading the param-change record.

Yes, that's what I'm saying. See this portion of my previous email on
the topic:
Yeah, thanks for pointing that.
On 2019-05-21 09:19:37 -0700, Andres Freund wrote:
So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, 24 May 2019 at 19:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Working on the patch now ....
Attached is an incremental WIP patch
handle_wal_level_changes_WIP.patch to be applied over the earlier main
patch logical-decoding-on-standby_v4_rebased.patch.
On 2019-05-21 09:19:37 -0700, Andres Freund wrote:
So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.
Yet to do this. Andres, how do you want to handle this scenario? Just
drop all the existing logical slots, as we decided for conflict
recovery for conflicting xids?
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
handle_wal_level_changes_WIP.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 527522f..b26a20a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4928,6 +4928,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index eec3a22..2c638e9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 31951bd..aab2f747 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,23 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (ControlFileWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
@@ -243,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 2af938b..8280d39 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern int ControlFileWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
On Fri, 24 May 2019 at 21:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 24 May 2019 at 19:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Working on the patch now ....
Attached is an incremental WIP patch
handle_wal_level_changes_WIP.patch to be applied over the earlier main
patch logical-decoding-on-standby_v4_rebased.patch.
I found an issue with these changes: when we change the master wal_level
from logical to hot_standby, and again back to logical, and then
create a logical replication slot on the slave, it gets created; but when
I do pg_logical_slot_get_changes() with that slot, it seems to read
records *before* I created the logical slot, so it encounters the
parameter-change(logical=>hot_standby) record and returns an error as
per the patch, because now in DecodeXLogOp() I error out when
XLOG_PARAMETER_CHANGE is found:
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
I thought it wouldn't read records *before* the slot was created. Am I
missing something?
On 2019-05-21 09:19:37 -0700, Andres Freund wrote:
So I suspect we need conflict handling in xlog_redo's
XLOG_PARAMETER_CHANGE case. If we there check against existing logical
slots, we ought to be safe.

Yet to do this. Andres, how do you want to handle this scenario? Just
drop all the existing logical slots, as we decided for conflict
recovery for conflicting xids?
I went ahead and added handling that drops existing slots when we
encounter XLOG_PARAMETER_CHANGE in xlog_redo().
Attached is logical-decoding-on-standby_v5.patch, which contains all
the changes so far.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v5.patch (application/octet-stream)
From 5c4dff8c936b4285031ba2c4241a8667d99805fa Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Mon, 27 May 2019 16:59:51 +0530
Subject: [PATCH] Logical decoding on standby.
Author : Andres Freund.
Besides the above main changes, patch includes following :
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
-Amit Khandekar
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 20 ++
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 21 ++
src/backend/replication/slot.c | 93 +++++
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
.../recovery/t/016_logical_decoding_on_replica.pl | 391 +++++++++++++++++++++
27 files changed, 613 insertions(+), 19 deletions(-)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 419da87..4093281 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7117,12 +7117,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7158,6 +7159,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7208,6 +7210,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7238,7 +7241,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7248,6 +7251,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7668,7 +7672,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7704,7 +7709,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7800,7 +7806,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7937,7 +7945,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index de4d4ef..9b1231e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1c7dd51..d5d0522 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4928,6 +4928,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9845,6 +9854,17 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have disallowed creation of logical slots.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId);
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..c0dd327 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,24 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (ControlFileWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +129,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
@@ -241,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..9027f06 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1065,6 +1065,99 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so need to
+ * drop the appropriate slots whose xmin conflicts with removed xid.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+ found_conflict = true;
+ else
+ {
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+ }
+ if (found_conflict)
+ {
+ elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 842fcab..dda6b4d 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index b4f2d0f..f4da4bc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..fa02728 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern int ControlFileWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8bc7f52..522153a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..9ee79b0
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,391 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->psql('testdb', qq[SELECT * FROM pg_create_logical_replication_slot('standby_logical', 'test_decoding')]),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-P', 'test_decoding', '-S', 'dodropslot', '--create-slot'], 'pg_recvlogical created dodropslot');
+$node_replica->command_ok(['pg_recvlogical', '-v', '-d', $node_replica->connstr('postgres'), '-P', 'test_decoding', '-S', 'otherslot', '--create-slot'], 'pg_recvlogical created otherslot');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created dodropslot2');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On 2019-05-27 17:04:44 +0530, Amit Khandekar wrote:
On Fri, 24 May 2019 at 21:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 24 May 2019 at 19:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Working on the patch now ....
Attached is an incremental WIP patch
handle_wal_level_changes_WIP.patch to be applied over the earlier main
patch logical-decoding-on-standby_v4_rebased.patch.
I found an issue with these changes : When we change master wal_level
from logical to hot_standby, and again back to logical, and then
create a logical replication slot on slave, it gets created; but when
I do pg_logical_slot_get_changes() with that slot, it seems to read
records *before* I created the logical slot, so it encounters
parameter-change(logical=>hot_standby) record, so returns an error as
per the patch, because now in DecodeXLogOp() I error out when
XLOG_PARAMETER_CHANGE is found :
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 * can restart from there.
 */
 break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
I thought it won't read records *before* the slot was created. Am I
missing something ?
That's why I had mentioned that you'd need to adapt
ReplicationSlotReserveWal(), to use the replay LSN or such.
Greetings,
Andres Freund
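In TAP terms (matching the test above), the reported sequence could be sketched roughly as follows; the slot name 'pc_slot' and the reuse of the $node_master/$node_replica handles are assumptions, and 'replica' here stands for what the mail calls hot_standby:

# Flip the primary's wal_level down to replica and back to logical; each
# restart emits an XLOG_PARAMETER_CHANGE record that the standby replays.
$node_master->append_conf('postgresql.conf', q[wal_level = replica]);
$node_master->restart;
$node_master->append_conf('postgresql.conf', q[wal_level = logical]);
$node_master->restart;
$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));

# Slot creation on the standby succeeds ...
$node_replica->safe_psql('testdb',
	q[SELECT pg_create_logical_replication_slot('pc_slot', 'test_decoding')]);

# ... but decoding starts from before the slot existed and trips over the
# logical->replica parameter-change record, raising the new error.
my ($ret, $out, $err) = $node_replica->psql('testdb',
	q[SELECT data FROM pg_logical_slot_get_changes('pc_slot', NULL, NULL)]);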
On Mon, 27 May 2019 at 19:26, Andres Freund <andres@anarazel.de> wrote:
On 2019-05-27 17:04:44 +0530, Amit Khandekar wrote:
On Fri, 24 May 2019 at 21:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 24 May 2019 at 19:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Working on the patch now ....
Attached is an incremental WIP patch
handle_wal_level_changes_WIP.patch to be applied over the earlier main
patch logical-decoding-on-standby_v4_rebased.patch.
I found an issue with these changes : When we change master wal_level
from logical to hot_standby, and again back to logical, and then
create a logical replication slot on slave, it gets created; but when
I do pg_logical_slot_get_changes() with that slot, it seems to read
records *before* I created the logical slot, so it encounters
parameter-change(logical=>hot_standby) record, so returns an error as
per the patch, because now in DecodeXLogOp() I error out when
XLOG_PARAMETER_CHANGE is found :
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 * can restart from there.
 */
 break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
I thought it won't read records *before* the slot was created. Am I
missing something ?
That's why I had mentioned that you'd need to adapt
ReplicationSlotReserveWal(), to use the replay LSN or such.
Yeah ok. I tried to do this :
@@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
....
}
else
{
- restart_lsn = GetRedoRecPtr();
+ restart_lsn = SlotIsLogical(slot) ?
+ GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
But then when I do pg_create_logical_replication_slot(), it hangs in
DecodingContextFindStartpoint(), waiting to find new records
(XLogReadRecord).
Working on it ...
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-05-30 19:46:26 +0530, Amit Khandekar wrote:
@@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
 if (!RecoveryInProgress() && SlotIsLogical(slot))
 {
 ....
 }
 else
 {
- restart_lsn = GetRedoRecPtr();
+ restart_lsn = SlotIsLogical(slot) ?
+ GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
But then when I do pg_create_logical_replication_slot(), it hangs in
DecodingContextFindStartpoint(), waiting to find new records
(XLogReadRecord).
But just till the primary has logged the necessary WAL records? If you
just do CHECKPOINT; or such on the primary, it should succeed quickly?
Greetings,
Andres Freund
On Thu, 30 May 2019 at 20:13, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-30 19:46:26 +0530, Amit Khandekar wrote:
@@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
 if (!RecoveryInProgress() && SlotIsLogical(slot))
 {
 ....
 }
 else
 {
- restart_lsn = GetRedoRecPtr();
+ restart_lsn = SlotIsLogical(slot) ?
+ GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
But then when I do pg_create_logical_replication_slot(), it hangs in
DecodingContextFindStartpoint(), waiting to find new records
(XLogReadRecord).
But just till the primary has logged the necessary WAL records? If you
just do CHECKPOINT; or such on the primary, it should succeed quickly?
Yes, it waits until there is a commit record, or (just tried) until a
checkpoint command.
Greetings,
Andres Freund
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
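In the style of the TAP test above, the create-slot-then-checkpoint sequence being discussed could look roughly like this; a sketch only, with the slot name 'wait_slot' and the reuse of the node handles being assumptions:

# Start slot creation on the standby in the background; with restart_lsn
# taken from the replay position it blocks until the standby replays an
# xl_running_xacts record.
my ($out, $err);
my $h = IPC::Run::start(
	['pg_recvlogical', '-d', $node_replica->connstr('testdb'),
	 '-P', 'test_decoding', '-S', 'wait_slot', '--create-slot'],
	'>', \$out, '2>', \$err);

# A CHECKPOINT on the primary logs a standby snapshot (xl_running_xacts),
# which lets the standby reach a consistent decoding snapshot.
$node_master->safe_psql('postgres', 'CHECKPOINT');

# Slot creation on the standby should now finish promptly.
$h->finish;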
On Fri, 31 May 2019 at 11:08, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Thu, 30 May 2019 at 20:13, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-30 19:46:26 +0530, Amit Khandekar wrote:
@@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
 if (!RecoveryInProgress() && SlotIsLogical(slot))
 {
 ....
 }
 else
 {
- restart_lsn = GetRedoRecPtr();
+ restart_lsn = SlotIsLogical(slot) ?
+ GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
But then when I do pg_create_logical_replication_slot(), it hangs in
DecodingContextFindStartpoint(), waiting to find new records
(XLogReadRecord).
But just till the primary has logged the necessary WAL records? If you
just do CHECKPOINT; or such on the primary, it should succeed quickly?
Yes, it waits until there is a commit record, or (just tried) until a
checkpoint command.
Is XLOG_RUNNING_XACTS record essential for the logical decoding to
build a consistent snapshot ?
Since the restart_lsn is now ReplayRecPtr, there is no
XLOG_RUNNING_XACTS record, and so the snapshot state is not yet
SNAPBUILD_CONSISTENT. And so
DecodingContextFindStartpoint()=>DecodingContextReady() never returns
true, and hence DecodingContextFindStartpoint() goes in an infinite
loop, until it gets XLOG_RUNNING_XACTS.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, 31 May 2019 at 17:31, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Fri, 31 May 2019 at 11:08, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Thu, 30 May 2019 at 20:13, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-30 19:46:26 +0530, Amit Khandekar wrote:
@@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
 if (!RecoveryInProgress() && SlotIsLogical(slot))
 {
 ....
 }
 else
 {
- restart_lsn = GetRedoRecPtr();
+ restart_lsn = SlotIsLogical(slot) ?
+ GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
But then when I do pg_create_logical_replication_slot(), it hangs in
DecodingContextFindStartpoint(), waiting to find new records
(XLogReadRecord).
But just till the primary has logged the necessary WAL records? If you
just do CHECKPOINT; or such on the primary, it should succeed quickly?
Yes, it waits until there is a commit record, or (just tried) until a
checkpoint command.
Is XLOG_RUNNING_XACTS record essential for the logical decoding to
build a consistent snapshot ?
Since the restart_lsn is now ReplayRecPtr, there is no
XLOG_RUNNING_XACTS record, and so the snapshot state is not yet
SNAPBUILD_CONSISTENT. And so
DecodingContextFindStartpoint()=>DecodingContextReady() never returns
true, and hence DecodingContextFindStartpoint() goes in an infinite
loop, until it gets XLOG_RUNNING_XACTS.
After giving more thought on this, I think it might make sense to
arrange for the xl_running_xact record to be sent from master to the
standby, when a logical slot is to be created on standby. How about
standby sending a new message type to the master, requesting for
xl_running_xact record ? Then on master, ProcessStandbyMessage() will
process this new message type and call LogStandbySnapshot().
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-05-31 17:31:34 +0530, Amit Khandekar wrote:
On Fri, 31 May 2019 at 11:08, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Thu, 30 May 2019 at 20:13, Andres Freund <andres@anarazel.de> wrote:
Yes, it waits until there is a commit record, or (just tried) until a
checkpoint command.
That's fine with me.
Is XLOG_RUNNING_XACTS record essential for the logical decoding to
build a consistent snapshot ?
Yes.
Since the restart_lsn is now ReplayRecPtr, there is no
XLOG_RUNNING_XACTS record, and so the snapshot state is not yet
SNAPBUILD_CONSISTENT. And so
DecodingContextFindStartpoint()=>DecodingContextReady() never returns
true, and hence DecodingContextFindStartpoint() goes in an infinite
loop, until it gets XLOG_RUNNING_XACTS.
These seem like conflicting statements? Infinite loops don't terminate
until a record is logged?
Greetings,
Andres Freund
Hi,
On 2019-06-04 15:51:01 +0530, Amit Khandekar wrote:
After giving more thought on this, I think it might make sense to
arrange for the xl_running_xact record to be sent from master to the
standby, when a logical slot is to be created on standby. How about
standby sending a new message type to the master, requesting for
xl_running_xact record ? Then on master, ProcessStandbyMessage() will
process this new message type and call LogStandbySnapshot().
I think that should be a secondary feature. You don't necessarily know
the upstream master, as the setup could be cascading one. I think for
now just having to wait, perhaps with a comment to manually start a
checkpoint, ought to suffice?
Greetings,
Andres Freund
On Tue, 4 Jun 2019 at 21:28, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-04 15:51:01 +0530, Amit Khandekar wrote:
After giving more thought on this, I think it might make sense to
arrange for the xl_running_xact record to be sent from master to the
standby, when a logical slot is to be created on standby. How about
standby sending a new message type to the master, requesting for
xl_running_xact record ? Then on master, ProcessStandbyMessage() will
process this new message type and call LogStandbySnapshot().
I think that should be a secondary feature. You don't necessarily know
the upstream master, as the setup could be cascading one.
Oh yeah, cascading setup makes it more complicated.
I think for
now just having to wait, perhaps with a comment to manually start a
checkpoint, ought to suffice?
Ok.
Since this requires the test to handle the
fire-create-slot-and-then-fire-checkpoint-from-master actions, I was
modifying the test file to do this. After doing that, I found that the
slave gets an assertion failure in XLogReadRecord()=>XRecOffIsValid().
This happens only when the restart_lsn is set to ReplayRecPtr.
Somehow, this does not happen when I manually create the logical slot.
It happens only while running testcase. Working on it ...
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Mon, 10 Jun 2019 at 10:37, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Tue, 4 Jun 2019 at 21:28, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-04 15:51:01 +0530, Amit Khandekar wrote:
After giving more thought on this, I think it might make sense to
arrange for the xl_running_xact record to be sent from master to the
standby, when a logical slot is to be created on standby. How about
standby sending a new message type to the master, requesting for
xl_running_xact record ? Then on master, ProcessStandbyMessage() will
process this new message type and call LogStandbySnapshot().
I think that should be a secondary feature. You don't necessarily know
the upstream master, as the setup could be cascading one.
Oh yeah, cascading setup makes it more complicated.
I think for
now just having to wait, perhaps with a comment to manually start a
checkpoint, ought to suffice?
Ok.
Since this requires the test to handle the
fire-create-slot-and-then-fire-checkpoint-from-master actions, I was
modifying the test file to do this. After doing that, I found that the
slave gets an assertion failure in XLogReadRecord()=>XRecOffIsValid().
This happens only when the restart_lsn is set to ReplayRecPtr.
Somehow, this does not happen when I manually create the logical slot.
It happens only while running testcase. Working on it ...
Like I mentioned above, I get an assertion failure for
Assert(XRecOffIsValid(RecPtr)) while reading WAL records looking for a
start position (DecodingContextFindStartpoint()). This is because in
CreateInitDecodingContext()=>ReplicationSlotReserveWal(), I now set
the logical slot's restart_lsn to XLogCtl->lastReplayedEndRecPtr. And
just after bringing up slave, lastReplayedEndRecPtr's initial values
are in this order : 0/2000028, 0/2000060, 0/20000D8, 0/2000100,
0/3000000, 0/3000060. You can see that 0/3000000 is not a valid value
because it points to the start of a WAL block, meaning it points to
the XLog page header (I think it's possible because it is 1 + endof
last replayed record, which can be start of next block). So when we
try to create a slot when it's in that position, then XRecOffIsValid()
fails while looking for a starting point.
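As a quick arithmetic check of that claim (a throwaway sketch; 8192 is the default XLOG_BLCKSZ):

# 0/3000000 sits exactly on a WAL block boundary, i.e. on the page header,
# so it cannot be the start of a record and XRecOffIsValid() fails.
my $XLOG_BLCKSZ = 8192;          # default WAL block size
my $lsn         = 0x03000000;    # low 32 bits of 0/3000000
printf "offset within WAL block: %d\n", $lsn % $XLOG_BLCKSZ;    # prints 0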
One option I considered was : If lastReplayedEndRecPtr points to XLog
page header, get a position of the first record on that WAL block,
probably with XLogFindNextRecord(). But it is not trivial because
while in ReplicationSlotReserveWal(), XLogReaderState is not created
yet. Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?
Do you think that we can solve this using some other approach ? I am
not sure whether it's only the initial conditions that cause
lastReplayedEndRecPtr value to *not* point to a valid record, or is it
just a coincidence and that lastReplayedEndRecPtr can also have such a
value any time afterwards. If it's only possible initially, we can
just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
lastReplayedEndRecPtr is invalid.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 2019-May-23, Andres Freund wrote:
On 2019-05-23 09:37:50 -0400, Robert Haas wrote:
On Thu, May 23, 2019 at 9:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
wal_level is PGC_POSTMASTER.
But the primary can be restarted without restarting the standby. We require wal_level replica or higher (currently only logical) on the standby. So the wal_level in the standby's control file can change online from logical to replica.
That's true, but Amit's scenario involved a change in wal_level during
the execution of pg_create_logical_replication_slot(), which I think
can't happen.
I don't see why not - we're talking about the wal_level in the WAL
stream, not the setting on the standby. And that can change during the
execution of pg_create_logical_replication_slot(), if a PARAMTER_CHANGE
record is replayed. I don't think it's actually a problem, as I
outlined in my response to Amit, though.
I don't know if this is directly relevant, but in commit_ts.c we go to
great lengths to ensure that things continue to work across restarts and
changes of the GUC in the primary, by decoupling activation and
deactivation of the module from start-time initialization. Maybe that
idea is applicable for this too?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 11 Jun 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Mon, 10 Jun 2019 at 10:37, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Tue, 4 Jun 2019 at 21:28, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-06-04 15:51:01 +0530, Amit Khandekar wrote:
After giving more thought on this, I think it might make sense to
arrange for the xl_running_xact record to be sent from master to the
standby, when a logical slot is to be created on standby. How about
standby sending a new message type to the master, requesting for
xl_running_xact record ? Then on master, ProcessStandbyMessage() will
process this new message type and call LogStandbySnapshot().
I think that should be a secondary feature. You don't necessarily know
the upstream master, as the setup could be cascading one.
Oh yeah, cascading setup makes it more complicated.
I think for
now just having to wait, perhaps with a comment to manually start a
checkpoint, ought to suffice?
Ok.
Since this requires the test to handle the
fire-create-slot-and-then-fire-checkpoint-from-master actions, I was
modifying the test file to do this. After doing that, I found that the
slave gets an assertion failure in XLogReadRecord()=>XRecOffIsValid().
This happens only when the restart_lsn is set to ReplayRecPtr.
Somehow, this does not happen when I manually create the logical slot.
It happens only while running testcase. Working on it ...
Like I mentioned above, I get an assertion failure for
Assert(XRecOffIsValid(RecPtr)) while reading WAL records looking for a
start position (DecodingContextFindStartpoint()). This is because in
CreateInitDecodingContext()=>ReplicationSlotReserveWal(), I now set
the logical slot's restart_lsn to XLogCtl->lastReplayedEndRecPtr. And
just after bringing up slave, lastReplayedEndRecPtr's initial values
are in this order : 0/2000028, 0/2000060, 0/20000D8, 0/2000100,
0/3000000, 0/3000060. You can see that 0/3000000 is not a valid value
because it points to the start of a WAL block, meaning it points to
the XLog page header (I think it's possible because it is 1 + endof
last replayed record, which can be start of next block). So when we
try to create a slot when it's in that position, then XRecOffIsValid()
fails while looking for a starting point.
One option I considered was : If lastReplayedEndRecPtr points to XLog
page header, get a position of the first record on that WAL block,
probably with XLogFindNextRecord(). But it is not trivial because
while in ReplicationSlotReserveWal(), XLogReaderState is not created
yet.
In the attached v6 version of the patch, I did the above. That is, I
used XLogFindNextRecord() to bump up the restart_lsn of the slot to
the first valid record. But since XLogReaderState is not available in
ReplicationSlotReserveWal(), I did this in
DecodingContextFindStartpoint(). And then updated the slot restart_lsn
with this corrected position.
Since XLogFindNextRecord() is currently disabled using #if 0, removed
this directive.
Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?
I found out that we can't do this, because we don't know whether the
xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
our context, it is SizeOfXLogLongPHD. So we indeed need the
XLogReaderState handle.
Do you think that we can solve this using some other approach ? I am
not sure whether it's only the initial conditions that cause
lastReplayedEndRecPtr value to *not* point to a valid record, or is it
just a coincidence and that lastReplayedEndRecPtr can also have such a
value any time afterwards. If it's only possible initially, we can
just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
lastReplayedEndRecPtr is invalid.
So now as the v6 patch stands, lastReplayedEndRecPtr is used to set
the restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint().
Also, modified the test to handle the requirement that the logical
slot creation on standby requires a checkpoint (or any other
transaction commit) to be given from master. For that, in
src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
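The helper itself is only in the attachment, but presumably it wraps the start-pg_recvlogical / CHECKPOINT / finish sequence sketched earlier; a rough guess at its shape, with the signature and details being assumptions:

sub create_logical_slot_on_standby
{
	my ($master, $standby, $slot_name, $dbname) = @_;
	my ($stdout, $stderr);

	# Slot creation on a standby blocks until an xl_running_xacts record
	# has been replayed, so start it in the background ...
	my $handle = IPC::Run::start(
		['pg_recvlogical', '-d', $standby->connstr($dbname),
		 '-P', 'test_decoding', '-S', $slot_name, '--create-slot'],
		'>', \$stdout, '2>', \$stderr);

	# ... and get the master to log one; a CHECKPOINT does that as a side
	# effect.
	$master->safe_psql('postgres', 'CHECKPOINT');

	$handle->finish;
	die "slot creation on standby failed: $stderr" if $?;
	return;
}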
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v6.patchapplication/octet-stream; name=logical-decoding-on-standby_v6.patchDownload
From 0ec74a3ab5e1d728223fab2c018f5b8a0612848b Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Wed, 12 Jun 2019 17:18:42 +0530
Subject: [PATCH] Logical decoding on standby - v6.
Author : Andres Freund.
Besides the above main changes, patch includes following :
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in this v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because replay pointer
points to 1 + end of last record replayed, which means it can
coincide with first byte of a new WAL block, i.e. inside block
header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
-Amit Khandekar.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 20 ++
src/backend/access/transam/xlogreader.c | 4 -
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 41 +++
src/backend/replication/slot.c | 131 ++++++-
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/access/xlogreader.h | 2 -
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/016_logical_decoding_on_replica.pl | 395 +++++++++++++++++++++
30 files changed, 683 insertions(+), 44 deletions(-)
create mode 100644 src/test/recovery/t/016_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8ac0f8a..0791a4e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7108,12 +7108,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7149,6 +7150,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7199,6 +7201,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7229,7 +7232,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7239,6 +7242,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7659,7 +7663,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7695,7 +7700,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7791,7 +7797,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7928,7 +7936,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index de4d4ef..9b1231e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..f092800 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,17 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have disallowed creation of logical slots.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId);
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 88be7fe..431a302 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -878,7 +878,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
return true;
}
-#ifdef FRONTEND
/*
* Functions that are currently not needed in the backend, but are better
* implemented inside xlogreader.c because of the internal facilities available
@@ -1003,9 +1002,6 @@ out:
return found;
}
-#endif /* FRONTEND */
-
-
/* ----------------------------------------
* Functions for decoding the data and block references in a record.
* ----------------------------------------
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..9f6e0ac 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,24 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (ControlFileWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +129,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
@@ -241,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
@@ -474,6 +495,26 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
(uint32) (slot->data.restart_lsn >> 32),
(uint32) slot->data.restart_lsn);
+ /*
+ * It is not guaranteed that the restart_lsn points to a valid
+ * record location. E.g. on standby, restart_lsn initially points to lastReplayedEndRecPtr,
+ * which is 1 + the end of the last replayed record, which means it can point to the next
+ * block header start. So bump it to the next valid record.
+ */
+ if (!XRecOffIsValid(startptr))
+ {
+ elog(DEBUG1, "Invalid restart lsn %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ startptr = XLogFindNextRecord(ctx->reader, startptr);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = startptr;
+ SpinLockRelease(&slot->mutex);
+
+ elog(DEBUG1, "Moved slot restart lsn to %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ }
+
/* Wait for a consistent starting point */
for (;;)
{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..7ffd264 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1016,37 +1016,37 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record so that
+ * a snapshot can be built using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1065,99 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so need to
+ * drop the appropriate slots whose xmin conflicts with removed xid.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotControlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+ found_conflict = true;
+ else
+ {
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+ }
+ if (found_conflict)
+ {
+ elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..93c4439 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index b4f2d0f..f4da4bc 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..fa02728 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern int ControlFileWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 04228e2..a5ffffc 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -215,9 +215,7 @@ extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
/* Invalidate read state */
extern void XLogReaderInvalReadState(XLogReaderState *state);
-#ifdef FRONTEND
extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif /* FRONTEND */
/* Functions for decoding an XLogRecord */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8bc7f52..522153a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8d5ad6b..a9a1ac7 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2009,6 +2009,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/016_logical_decoding_on_replica.pl b/src/test/recovery/t/016_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..304f32a
--- /dev/null
+++ b/src/test/recovery/t/016_logical_decoding_on_replica.pl
@@ -0,0 +1,395 @@
+# Test logical decoding on a standby.
+#
+# Also covers recovery conflicts that drop conflicting slots on the standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+is($node_replica->create_logical_slot_on_standby($node_master, 'dodropslot', 'testdb'),
+ 0, 'created dodropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot dodropslot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot dodropslot2');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Wed, 22 May 2019 at 15:05, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Tue, 9 Apr 2019 at 22:23, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Sat, 6 Apr 2019 at 04:45, Andres Freund <andres@anarazel.de> wrote:
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }
+
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(WARNING,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+
+ if (found_conflict)
+ {
+ elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }

I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted. I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).

What I have in mind is :
ereport(LOG,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("Dropping conflicting slot %s", s->data.name.data),
errdetail("%s, removed xid %d.", conflict_str, xid)));
where conflict_str is a dynamically generated string containing
something like : "slot xmin : 1234, slot catalog_xmin: 5678"
So for the user, the errdetail will look like :
"slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
I think the user can figure out whether it was xmin or catalog_xmin or
both that conflicted with removed xid.
If we don't do this way, we may not be able to show in a single
message if both xmin and catalog_xmin are conflicting at the same
time.

Does this message look good to you, or did you have in mind something quite
different?

The above one is yet another point that needs to be concluded on. Till
then I will use the above way to display the error message in the
upcoming patch version.
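
For illustration, a condensed sketch of that single-message approach could
look like the following (the helper name report_conflicting_slot and the
exact wording are illustrative only, not part of the patch; the actual
implementation is in the attached v7 patch):

/*
 * Illustrative sketch only: collect whichever of xmin/catalog_xmin
 * conflicted into one detail string, then emit a single LOG message.
 * Assumes the usual backend headers (postgres.h, lib/stringinfo.h).
 */
static void
report_conflicting_slot(NameData slotname, TransactionId slot_xmin,
                        TransactionId slot_catalog_xmin, TransactionId xid)
{
    StringInfoData detail;

    initStringInfo(&detail);

    if (TransactionIdIsValid(slot_xmin) &&
        TransactionIdPrecedesOrEquals(slot_xmin, xid))
        appendStringInfo(&detail, "slot xmin: %u", slot_xmin);

    if (TransactionIdIsValid(slot_catalog_xmin) &&
        TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
        appendStringInfo(&detail, "%sslot catalog_xmin: %u",
                         detail.len > 0 ? ", " : "", slot_catalog_xmin);

    ereport(LOG,
            (errmsg("dropping conflicting slot \"%s\"", NameStr(slotname)),
             errdetail("%s, removed xid: %u.", detail.data, xid)));

    pfree(detail.data);
}

The caller is still responsible for skipping slots that do not conflict;
the point is only that both horizons end up in one errdetail instead of
two separate warnings.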
Attached is v7 version that has the above changes regarding having a
single error message.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v7.patch (application/octet-stream)
From 183355d4128f34488aef5b20ba4612d3fcbe358e Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Fri, 14 Jun 2019 16:46:41 +0530
Subject: [PATCH] Logical decoding on standby - v7.
Author : Andres Freund.
Besides the above main changes, the patch includes the following:
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author: Craig Ringer, with a few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because the replay pointer
points to 1 + the end of the last replayed record, which means it
can coincide with the first byte of a new WAL block, i.e. fall
inside the block header.
Also, modified the test to handle the requirement that
logical slot creation on a standby needs a checkpoint
(or any other transaction commit) to be issued on the master. For
that, added a new function create_logical_slot_on_standby() to
src/test/perl/PostgresNode.pm, which performs the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 21 ++
src/backend/access/transam/xlogreader.c | 4 -
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 41 +++
src/backend/replication/slot.c | 146 +++++++-
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/access/xlogreader.h | 2 -
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 395 +++++++++++++++++++++
30 files changed, 699 insertions(+), 44 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a775760..58ec991 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7150,12 +7150,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7191,6 +7192,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7241,6 +7243,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7271,7 +7274,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7281,6 +7284,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7701,7 +7705,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7737,7 +7742,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7833,7 +7839,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7970,7 +7978,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index de4d4ef..9b1231e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..78d3ad1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,18 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have disallowed creation of logical slots.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("logical decoding on standby requires wal_level >= logical on master"));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 88be7fe..431a302 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -878,7 +878,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
return true;
}
-#ifdef FRONTEND
/*
* Functions that are currently not needed in the backend, but are better
* implemented inside xlogreader.c because of the internal facilities available
@@ -1003,9 +1002,6 @@ out:
return found;
}
-#endif /* FRONTEND */
-
-
/* ----------------------------------------
* Functions for decoding the data and block references in a record.
* ----------------------------------------
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..9f6e0ac 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,24 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (ControlFileWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +129,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
@@ -241,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
@@ -474,6 +495,26 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
(uint32) (slot->data.restart_lsn >> 32),
(uint32) slot->data.restart_lsn);
+ /*
+ * It is not guaranteed that the restart_lsn points to a valid
+ * record location. E.g. on standby, restart_lsn initially points to lastReplayedEndRecPtr,
+ * which is 1 + the end of the last replayed record, which means it can point at
+ * the start of the next block header. So bump it to the next valid record.
+ */
+ if (!XRecOffIsValid(startptr))
+ {
+ elog(DEBUG1, "Invalid restart lsn %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ startptr = XLogFindNextRecord(ctx->reader, startptr);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = startptr;
+ SpinLockRelease(&slot->mutex);
+
+ elog(DEBUG1, "Moved slot restart lsn to %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ }
+
/* Wait for a consistent starting point */
for (;;)
{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..8c8d174 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1016,37 +1016,37 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record so that
+ * a snapshot can be built using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1065,114 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so we need
+ * to drop any slots whose xmin conflicts with the removed xid.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped. Also, when xid is invalid, a common 'reason' is provided for the
+ * error detail; otherwise reason is NULL.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid, char *reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
+ */
+ initStringInfo(&conflict_str);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ appendStringInfo(&conflict_str, "slot xmin: %d", slot_xmin);
+
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_str, "%sslot catalog_xmin: %d",
+ conflict_str.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_str.len > 0)
+ {
+ appendStringInfo(&conflict_str, ", %s xid : %d",
+ gettext_noop("removed"), xid);
+ found_conflict = true;
+ reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ ereport(LOG,
+ (errmsg("Dropping conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", reason)));
+
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..a45345c 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..fa02728 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern int ControlFileWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 04228e2..a5ffffc 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -215,9 +215,7 @@ extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
/* Invalidate read state */
extern void XLogReaderInvalReadState(XLogReaderState *state);
-#ifdef FRONTEND
extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif /* FRONTEND */
/* Functions for decoding an XLogRecord */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..3a90aac 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8d5ad6b..a9a1ac7 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2009,6 +2009,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..304f32a
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,395 @@
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce a recovery conflict is to run VACUUM FULL on the master
+# while hot_standby_feedback is turned off on the standby.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+is($node_replica->create_logical_slot_on_standby($node_master, 'dodropslot', 'testdb'),
+ 0, 'created dodropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot dodropslot2');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Wed, 12 Jun 2019 at 00:06, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-May-23, Andres Freund wrote:
On 2019-05-23 09:37:50 -0400, Robert Haas wrote:
On Thu, May 23, 2019 at 9:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
wal_level is PGC_POSTMASTER.
But the primary can be restarted without a restart of the standby. We require wal_level replica or higher (currently only logical) on the standby. So an online change from logical to replica wal_level is possible in the standby's control file.
That's true, but Amit's scenario involved a change in wal_level during
the execution of pg_create_logical_replication_slot(), which I think
can't happen.
I don't see why not - we're talking about the wal_level in the WAL
stream, not the setting on the standby. And that can change during the
execution of pg_create_logical_replication_slot(), if a PARAMETER_CHANGE
record is replayed. I don't think it's actually a problem, as I
outlined in my response to Amit, though.
I don't know if this is directly relevant, but in commit_ts.c we go to
great lengths to ensure that things continue to work across restarts and
changes of the GUC in the primary, by decoupling activation and
deactivation of the module from start-time initialization. Maybe that
idea is applicable for this too?
We do kind of handle change in wal_level differently at run-time
versus at initialization. E.g. we drop the existing slots if the
wal_level becomes less than logical. But I think we don't have to do
significant work, unlike what seems to have been done in
ActivateCommitTs() when commit_ts is activated.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
On 2019-06-12 17:30:02 +0530, Amit Khandekar wrote:
On Tue, 11 Jun 2019 at 12:24, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Mon, 10 Jun 2019 at 10:37, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Since this requires the test to handle the
fire-create-slot-and-then-fire-checkpoint-from-master actions, I was
modifying the test file to do this. After doing that, I found that the
slave gets an assertion failure in XLogReadRecord()=>XRecOffIsValid().
This happens only when the restart_lsn is set to ReplayRecPtr.
Somehow, this does not happen when I manually create the logical slot.
It happens only while running the testcase. Working on it ...
Like I mentioned above, I get an assertion failure for
Assert(XRecOffIsValid(RecPtr)) while reading WAL records looking for a
start position (DecodingContextFindStartpoint()). This is because in
CreateInitDecodingContext()=>ReplicationSlotReserveWal(), I now set
the logical slot's restart_lsn to XLogCtl->lastReplayedEndRecPtr. And
just after bringing up slave, lastReplayedEndRecPtr's initial values
are in this order : 0/2000028, 0/2000060, 0/20000D8, 0/2000100,
0/3000000, 0/3000060. You can see that 0/3000000 is not a valid value
because it points to the start of a WAL block, meaning it points to
the XLog page header (I think it's possible because it is 1 + end of
the last replayed record, which can be the start of the next block). So when we
try to create a slot when it's in that position, then XRecOffIsValid()
fails while looking for a starting point.
One option I considered was: if lastReplayedEndRecPtr points to the XLog
page header, get a position of the first record on that WAL block,
probably with XLogFindNextRecord(). But it is not trivial because
while in ReplicationSlotReserveWal(), XLogReaderState is not created
yet.
In the attached v6 version of the patch, I did the above. That is, I
used XLogFindNextRecord() to bump up the restart_lsn of the slot to
the first valid record. But since XLogReaderState is not available in
ReplicationSlotReserveWal(), I did this in
DecodingContextFindStartpoint(). And then updated the slot restart_lsn
with this corrected position.
Since XLogFindNextRecord() is currently disabled using #if 0, removed
this directive.
Well, ifdef FRONTEND. I don't think that's a problem. It's a bit
overkill here, because I think we know the address has to be on a record
boundary (rather than being in the middle of a page spanning WAL
record). So we could just add the size of the header manually - but
I think that's not worth doing.
Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?
I found out that we can't do this, because we don't know whether the
xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
our context, it is SizeOfXLogLongPHD. So we indeed need the
XLogReaderState handle.
Well, we can determine whether a long or a short header is going to be
used, as that's solely dependent on the LSN:
/*
* If first page of an XLOG segment file, make it a long header.
*/
if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
{
XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
NewLongPage->xlp_sysid = ControlFile->system_identifier;
NewLongPage->xlp_seg_size = wal_segment_size;
NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
NewPage->xlp_info |= XLP_LONG_HEADER;
}
but I don't think that's worth it.
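(For illustration only, a minimal sketch of the idea described and then declined above; the helper name is made up, while XLogSegmentOffset, wal_segment_size and the SizeOfXLog*PHD constants are the existing ones. It assumes the LSN in question sits exactly on a page boundary, which is the case being discussed:)

    /* Hypothetical helper: page header size implied purely by the LSN. */
    static Size
    page_header_size_at(XLogRecPtr lsn)
    {
        /* the first page of a segment carries the long header */
        if (XLogSegmentOffset(lsn, wal_segment_size) == 0)
            return SizeOfXLogLongPHD;
        return SizeOfXLogShortPHD;
    }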
Do you think that we can solve this using some other approach? I am
not sure whether it's only the initial conditions that cause the
lastReplayedEndRecPtr value to *not* point to a valid record, or whether it is
just a coincidence and lastReplayedEndRecPtr can also have such a
value any time afterwards.
It's always possible. All that means is that the last record filled the
entire last WAL page.
If it's only possible initially, we can
just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
lastReplayedEndRecPtr is invalid.
I don't think so? The redo pointer will point to something *much*
earlier, where we'll not yet have done all the necessary conflict
handling during recovery? So we'd not necessarily notice that a slot
is not actually usable for decoding.
We could instead just handle that by starting decoding at the redo
pointer, and just ignore all WAL records until they're after
lastReplayedEndRecPtr, but that has no advantages, and will read a lot
more WAL.
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
	 */

	/* XLOG stuff */
+	xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
	xlrec_reuse.node = rel->rd_node;
	xlrec_reuse.block = blkno;
	xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
	XLogRecPtr	recptr;
	xl_btree_delete xlrec_delete;

+	xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
Can we instead pass the heap rel down to here? I think there's only one
caller, and it has the heap relation available these days (it didn't at
the time of the prototype, possibly). There's a few other users of
get_rel_logical_catalog() where that might be harder, but it's easy
here.
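(A hedged sketch of what that could look like at this call site, assuming the heap relation is indeed available as a parameter - called heapRel here - so the catalog lookup goes away:)

    xl_btree_delete xlrec_delete;

    /* use the heap relation handed down by the caller instead of a lookup */
    xlrec_delete.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
    xlrec_delete.latestRemovedXid = latestRemovedXid;
    xlrec_delete.nitems = nitems;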
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
	spgxlogVacuumRedirect xlrec;

+	xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
This one seems harder, but I'm not actually sure why we make it so
hard. It seems like we just ought to add the table to IndexVacuumInfo.
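(Rough sketch of that direction; the field name heaprel is made up here, and the comment stands in for whatever fields IndexVacuumInfo already carries:)

    typedef struct IndexVacuumInfo
    {
        Relation    index;      /* the index being vacuumed */
        Relation    heaprel;    /* hypothetical addition: owning heap relation */
        /* ... existing fields (analyze_only, message_level, ...) unchanged ... */
    } IndexVacuumInfo;

    /* with info threaded down, vacuumRedirectAndPlaceholder() could then do: */
    xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(info->heaprel);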
 /*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+	return ControlFile->wal_level;
+}
Any reason not to return the type enum WalLevel instead? I'm not sure I
like the function name - perhaps something like GetActiveWalLevel() or
such? The fact that it's in the control file doesn't seem relevant
here. I think it should be close to DataChecksumsEnabled() etc, which
all return information from the control file.
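(Sketch of the suggested shape, returning the enum and named after what it reports rather than where it is stored:)

    /*
     * Return the wal_level currently recorded in pg_control.
     */
    WalLevel
    GetActiveWalLevel(void)
    {
        return (WalLevel) ControlFile->wal_level;
    }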
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,17 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
	memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+	/*
+	 * Drop logical slots if we are in hot standby and master does not have
+	 * logical data. Don't bother to search for the slots if standby is
+	 * running with wal_level lower than logical, because in that case,
+	 * we would have disallowed creation of logical slots.
+	 */
s/disallowed creation/disallowed creation or previously dropped/
+	if (InRecovery && InHotStandby &&
+		xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+		wal_level >= WAL_LEVEL_LOGICAL)
+		ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId);
+
	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
	ControlFile->MaxConnections = xlrec.MaxConnections;
	ControlFile->max_worker_processes = xlrec.max_worker_processes;
Not for this patch, but I kinda feel the individual replay routines
ought to be broken out of xlog_redo().
 /* ----------------------------------------
  * Functions for decoding the data and block references in a record.
  * ----------------------------------------
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
			 * can restart from there.
			 */
			break;
+		case XLOG_PARAMETER_CHANGE:
+		{
+			xl_parameter_change *xlrec =
+				(xl_parameter_change *) XLogRecGetData(buf->record);
+
+			/* Cannot proceed if master itself does not have logical data */
+			if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+				ereport(ERROR,
+						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						 errmsg("logical decoding on standby requires "
+								"wal_level >= logical on master")));
+			break;
+		}
This should also HINT to drop the replication slot.
+	/*
+	 * It is not guaranteed that the restart_lsn points to a valid
+	 * record location. E.g. on standby, restart_lsn initially points to lastReplayedEndRecPtr,
+	 * which is 1 + the end of last replayed record, which means it can point the next
+	 * block header start. So bump it to the next valid record.
+	 */
I'd rephrase this as something like:
restart_lsn initially may point one past the end of the record. If that
is a XLOG page boundary, it will not be a valid LSN for the start of a
record. If that's the case, look for the start of the first record.
+ if (!XRecOffIsValid(startptr))
+ {
Hm, could you before this add an Assert(startptr != InvalidXLogRecPtr)
or such?
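(I.e., something like the following just before the check, treating an unset restart_lsn as a programming error rather than something to silently fix up:)

    /* the slot must already have reserved WAL; only its alignment is in doubt */
    Assert(startptr != InvalidXLogRecPtr);
    if (!XRecOffIsValid(startptr))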
+ elog(DEBUG1, "Invalid restart lsn %X/%X", + (uint32) (startptr >> 32), (uint32) startptr); + startptr = XLogFindNextRecord(ctx->reader, startptr); + + SpinLockAcquire(&slot->mutex); + slot->data.restart_lsn = startptr; + SpinLockRelease(&slot->mutex); + elog(DEBUG1, "Moved slot restart lsn to %X/%X", + (uint32) (startptr >> 32), (uint32) startptr); + }
Minor nit: normally debug messages don't start with upper case.
	/* Wait for a consistent starting point */
	for (;;)
	{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..7ffd264 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1016,37 +1016,37 @@ ReplicationSlotReserveWal(void)
	/*
	 * For logical slots log a standby snapshot and start logical decoding
	 * at exactly that position. That allows the slot to start up more
-	 * quickly.
+	 * quickly. But on a standby we cannot do WAL writes, so just use the
+	 * replay pointer; effectively, an attempt to create a logical slot on
+	 * standby will cause it to wait for an xl_running_xact record so that
+	 * a snapshot can be built using the record.
I'd add "to be logged independently on the primary" after "wait for an
xl_running_xact record".
-	 * That's not needed (or indeed helpful) for physical slots as they'll
-	 * start replay at the last logged checkpoint anyway. Instead return
-	 * the location of the last redo LSN. While that slightly increases
-	 * the chance that we have to retry, it's where a base backup has to
-	 * start replay at.
+	 * None of this is needed (or indeed helpful) for physical slots as
+	 * they'll start replay at the last logged checkpoint anyway. Instead
+	 * return the location of the last redo LSN. While that slightly
+	 * increases the chance that we have to retry, it's where a base backup
+	 * has to start replay at.
	 */
+
+	restart_lsn =
+		(SlotIsPhysical(slot) ? GetRedoRecPtr() :
+		 (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+		  GetXLogInsertRecPtr()));
Please rewrite this to use normal if blocks. I'm also not convinced that
it's useful to have this if block, and then another if block that
basically tests the same conditions again.
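(A sketch of the flattened version, keeping the same three cases from the quoted code but as ordinary if/else blocks and a single spinlocked assignment:)

    if (SlotIsPhysical(slot))
        restart_lsn = GetRedoRecPtr();
    else if (RecoveryInProgress())
        restart_lsn = GetXLogReplayRecPtr(NULL);
    else
        restart_lsn = GetXLogInsertRecPtr();

    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = restart_lsn;
    SpinLockRelease(&slot->mutex);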
+	SpinLockAcquire(&slot->mutex);
+	slot->data.restart_lsn = restart_lsn;
+	SpinLockRelease(&slot->mutex);
+
	if (!RecoveryInProgress() && SlotIsLogical(slot))
	{
		XLogRecPtr	flushptr;

-		/* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
		flushptr = LogStandbySnapshot();

		/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
+/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so need to
+ * drop the appropriate slots whose xmin conflicts with removed xid.
I don't think "removed-xid kind of conflict" is that descriptive. I'd
suggest something like "When xid is valid, it means that rows older than
xid might have been removed. Therefore we need to drop slots that depend
on seeing those rows."
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+	int			i;
+	bool		found_conflict = false;
+
+	if (max_replication_slots <= 0)
+		return;
+
+restart:
+	if (found_conflict)
+	{
+		CHECK_FOR_INTERRUPTS();
+		/*
+		 * Wait awhile for them to die so that we avoid flooding an
+		 * unresponsive backend when system is heavily loaded.
+		 */
+		pg_usleep(100000);
+		found_conflict = false;
+	}
Hm, I wonder if we could use the condition variable the slot
infrastructure has these days for this instead.
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	for (i = 0; i < max_replication_slots; i++)
+	{
+		ReplicationSlot *s;
+		NameData	slotname;
+		TransactionId slot_xmin;
+		TransactionId slot_catalog_xmin;
+
+		s = &ReplicationSlotCtl->replication_slots[i];
+
+		/* cannot change while ReplicationSlotCtlLock is held */
+		if (!s->in_use)
+			continue;
+
+		/* Invalid xid means caller is asking to drop all logical slots */
+		if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+			found_conflict = true;
I'd just add
if (!SlotIsLogical(s))
continue;
because all of this doesn't need to happen for slots that aren't
logical.
+		else
+		{
+			/* not our database, skip */
+			if (s->data.database != InvalidOid && s->data.database != dboid)
+				continue;
+
+			SpinLockAcquire(&s->mutex);
+			slotname = s->data.name;
+			slot_xmin = s->data.xmin;
+			slot_catalog_xmin = s->data.catalog_xmin;
+			SpinLockRelease(&s->mutex);
+
+			if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+			{
+				found_conflict = true;
+
+				ereport(LOG,
+						(errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+								NameStr(slotname), slot_xmin, xid)));
+			}
s/removed xid/xid horizon being increased to %u/
+			if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+			{
+				found_conflict = true;
+
+				ereport(LOG,
+						(errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+								NameStr(slotname), slot_catalog_xmin, xid)));
+			}
+		}
+
+		if (found_conflict)
+		{
Hm, as far as I can tell you just ignore that the slot might currently
be in use. You can't just drop a slot that somebody is using. I think
you need to send a recovery conflict to that backend.
I guess the easiest way to do that would be something roughly like:
SetInvalidVirtualTransactionId(vxid);
LWLockAcquire(ProcArrayLock, LW_SHARED);
cancel_proc = BackendPidGetProcWithLock(active_pid);
if (cancel_proc)
vxid = GET_VXID_FROM_PGPROC(cancel_proc);
LWLockRelease(ProcArrayLock);
if (VirtualTransactionIdIsValid(vxid))
{
CancelVirtualTransaction(vxid);
/* Wait here until we get signaled, and then restart */
ConditionVariableSleep(&slot->active_cv,
WAIT_EVENT_REPLICATION_SLOT_DROP);
}
ConditionVariableCancelSleep();
when the slot is currently active. Part of this would need to be split
into a procarray.c helper function (mainly all the stuff dealing with
ProcArrayLock).
+ elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
This definitely needs to be expanded, and follow the message style
guideline.
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
Instead of saying "deadlock" I'd just say that ReplicationSlotDropPtr()
will acquire that lock.
+ ReplicationSlotDropPtr(s);
But more importantly, I don't think this is
correct. ReplicationSlotDropPtr() assumes that the to-be-dropped slot is
acquired by the current backend - without that somebody else could
concurrently acquire that slot.
SO I think you need to do something like ReplicationSlotsDropDBSlots()
does:
/* acquire slot, so ReplicationSlotDropAcquired can be reused */
SpinLockAcquire(&s->mutex);
/* can't change while ReplicationSlotControlLock is held */
slotname = NameStr(s->data.name);
active_pid = s->active_pid;
if (active_pid == 0)
{
MyReplicationSlot = s;
s->active_pid = MyProcPid;
}
SpinLockRelease(&s->mutex);
Greetings,
Andres Freund
I am yet to work on Andres's latest detailed review comments, but
before that I thought I should submit a patch for the below reported
issue, because I was almost ready with the fix. Now I will start to
work on Andres's comments, for which I will reply separately.
On Fri, 1 Mar 2019 at 13:33, tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,
While testing this feature, I found that if lots of inserts happen on
the master cluster, then pg_recvlogical does not show the DATA
information on the logical replication slot which was created on the SLAVE.
Please refer to this scenario -
1)
Create a Master cluster with wal_level=logical and create a logical
replication slot -
SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');
2)
Create a Standby cluster using pg_basebackup ( ./pg_basebackup -D
slave/ -v -R) and create logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');
3)
X terminal - start pg_recvlogical , provide port=5555 ( slave
cluster) and specify slot=standby_slot
./pg_recvlogical -d postgres -p 5555 -s 1 -F 1 -v --slot=standby_slot
--start -f -
Y terminal - start pg_recvlogical , provide port=5432 ( master
cluster) and specify slot=master_slot
./pg_recvlogical -d postgres -p 5432 -s 1 -F 1 -v --slot=master_slot
--start -f -
Z terminal - run pg_bench against Master cluster ( ./pg_bench -i -s 10
postgres)
Able to see DATA information on Y terminal but not on X,
but the same can be seen by firing this below query on the SLAVE cluster -
SELECT * FROM pg_logical_slot_get_changes('standby_slot', NULL, NULL);
Is it expected ?
Actually, it does show the records, but after quite a long time. In general,
the walsender on the standby sends each record only after a significant delay (1
sec), and pg_recvlogical shows all the inserted records only after the
commit, so for huge inserts it looks like it is hanging forever.
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record. This is
why pg_recvlogical appears to be hanging forever when a huge
number of rows is inserted.
Fix : Use GetStandbyFlushRecPtr() if am_cascading_walsender.
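(Roughly, the change amounts to the following in XLogSendLogical(); this is a hedged sketch, and the surrounding walsender.c code may differ in detail:)

    XLogRecPtr  flushPtr;

    /*
     * On a standby the primary-side flush pointer isn't meaningful, so use
     * the standby's own flush/replay position to decide whether we have
     * caught up.
     */
    if (am_cascading_walsender)
        flushPtr = GetStandbyFlushRecPtr();
    else
        flushPtr = GetFlushRecPtr();

    if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
        WalSndCaughtUp = true;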
Attached patch v8.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v8.patch
From bc92aff893a63eb04912b93e980b18984b939135 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Thu, 20 Jun 2019 15:17:26 +0530
Subject: [PATCH] Logical decoding on standby - v8.
Author : Andres Freund.
Besides the above main changes, the patch includes the following:
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. A few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because the replay pointer
points to 1 + the end of the last replayed record, which means it can
coincide with the first byte of a new WAL block, i.e. a position inside
the block header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
Changes in v8 :
Fix incorrect flush ptr on standby.
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record.
This was reported by Tushar Ahuja, where pg_recvlogical seemed to
hang when there are loads of inserts.
Fix: Use GetStandbyFlushRecPtr() if am_cascading_walsender.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 3 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 21 ++
src/backend/access/transam/xlogreader.c | 4 -
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 41 +++
src/backend/replication/slot.c | 146 +++++++-
src/backend/replication/walsender.c | 8 +-
src/backend/storage/ipc/standby.c | 7 +-
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/access/xlogreader.h | 2 -
src/include/replication/slot.h | 2 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 395 +++++++++++++++++++++
31 files changed, 704 insertions(+), 47 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7240,6 +7242,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7270,7 +7273,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7280,6 +7283,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7700,7 +7704,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7736,7 +7741,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7832,7 +7838,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7969,7 +7977,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index de4d4ef..9b1231e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..78d3ad1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,18 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have disallowed creation of logical slots.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("logical decoding on standby requires wal_level >= logical on master"));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 88be7fe..431a302 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -878,7 +878,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
return true;
}
-#ifdef FRONTEND
/*
* Functions that are currently not needed in the backend, but are better
* implemented inside xlogreader.c because of the internal facilities available
@@ -1003,9 +1002,6 @@ out:
return found;
}
-#endif /* FRONTEND */
-
-
/* ----------------------------------------
* Functions for decoding the data and block references in a record.
* ----------------------------------------
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..9f6e0ac 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,24 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (ControlFileWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +129,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
@@ -241,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
@@ -474,6 +495,26 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
(uint32) (slot->data.restart_lsn >> 32),
(uint32) slot->data.restart_lsn);
+ /*
+ * It is not guaranteed that the restart_lsn points to a valid
+ * record location. E.g. on standby, restart_lsn initially points to lastReplayedEndRecPtr,
+ * which is 1 + the end of last replayed record, which means it can point the next
+ * block header start. So bump it to the next valid record.
+ */
+ if (!XRecOffIsValid(startptr))
+ {
+ elog(DEBUG1, "Invalid restart lsn %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ startptr = XLogFindNextRecord(ctx->reader, startptr);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = startptr;
+ SpinLockRelease(&slot->mutex);
+
+ elog(DEBUG1, "Moved slot restart lsn to %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ }
+
/* Wait for a consistent starting point */
for (;;)
{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..8c8d174 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1016,37 +1016,37 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record so that
+ * a snapshot can be built using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1065,114 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so need to
+ * drop the appropriate slots whose xmin conflicts with removed xid.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped. Also, when xid is invalid, a common 'reason' is provided for the
+ * error detail; otherwise reason is NULL.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid, char *reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "slot xmin: 1234, catalog_xmin: 5678, removed xid : 9012"
+ */
+ initStringInfo(&conflict_str);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ appendStringInfo(&conflict_str, "slot xmin: %d", slot_xmin);
+
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_str, "%sslot catalog_xmin: %d",
+ conflict_str.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_str.len > 0)
+ {
+ appendStringInfo(&conflict_str, ", %s xid : %d",
+ gettext_noop("removed"), xid);
+ found_conflict = true;
+ reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ ereport(LOG,
+ (errmsg("Dropping conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", reason)));
+
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
+ ReplicationSlotDropPtr(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 92fa86f..4ce7096 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2814,6 +2814,7 @@ XLogSendLogical(void)
{
XLogRecord *record;
char *errm;
+ XLogRecPtr flushPtr;
/*
* Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,10 +2831,11 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
+ flushPtr = (am_cascading_walsender ?
+ GetStandbyFlushRecPtr() : GetFlushRecPtr());
+
if (record != NULL)
{
- /* XXX: Note that logical decoding cannot be used while in recovery */
- XLogRecPtr flushPtr = GetFlushRecPtr();
/*
* Note the lack of any call to LagTrackerWrite() which is handled by
@@ -2857,7 +2859,7 @@ XLogSendLogical(void)
* If the record we just wanted read is at or beyond the flushed
* point, then we're caught up.
*/
- if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
{
WalSndCaughtUp = true;
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..a45345c 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..fa02728 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern int ControlFileWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 04228e2..a5ffffc 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -215,9 +215,7 @@ extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
/* Invalidate read state */
extern void XLogReaderInvalReadState(XLogReaderState *state);
-#ifdef FRONTEND
extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif /* FRONTEND */
/* Functions for decoding an XLogRecord */
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..3a90aac 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 8d5ad6b..a9a1ac7 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2009,6 +2009,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..304f32a
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,395 @@
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+is($node_replica->create_logical_slot_on_standby($node_master, 'dodropslot', 'testdb'),
+ 0, 'created dodropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Thu, 20 Jun 2019 at 00:31, Andres Freund <andres@anarazel.de> wrote:
On 2019-06-12 17:30:02 +0530, Amit Khandekar wrote:
In the attached v6 version of the patch, I did the above. That is, I
used XLogFindNextRecord() to bump up the restart_lsn of the slot to
the first valid record. But since XLogReaderState is not available in
ReplicationSlotReserveWal(), I did this in
DecodingContextFindStartpoint(). And then updated the slot restart_lsn
with this corrected position. Since XLogFindNextRecord() is currently
disabled using #if 0, removed this directive.

Well, ifdef FRONTEND. I don't think that's a problem. It's a bit
overkill here, because I think we know the address has to be on a record
boundary (rather than being in the middle of a page spanning WAL
record). So we could just add the size of the header manually
- but I think that's not worth doing.

Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?

I found out that we can't do this, because we don't know whether the
xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
our context, it is SizeOfXLogLongPHD. So we indeed need the
XLogReaderState handle.

Well, we can determine whether a long or a short header is going to be
used, as that's solely dependent on the LSN:

    /*
     * If first page of an XLOG segment file, make it a long header.
     */
    if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
    {
        XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;

        NewLongPage->xlp_sysid = ControlFile->system_identifier;
        NewLongPage->xlp_seg_size = wal_segment_size;
        NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
        NewPage->xlp_info |= XLP_LONG_HEADER;
    }

but I don't think that's worth it.
Ok, so what you are saying is: in case of ReplayRecPtr, it is always
possible to know whether it is pointing at a long header or a short
header, just by looking at its value, and then we just increment it by
that header size. Why do you think it is not worth it ? In fact, I
thought we *have* to increment it to set it to the next record; I
didn't understand what other option we have.
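
Just to spell out the manual bump being discussed, a rough sketch (only an
illustration of the idea, not code from the patch; the helper name
skip_page_header is mine, and it assumes wal_segment_size and the
page-header macros from access/xlog_internal.h are in scope):

    /*
     * Sketch only: if the LSN sits exactly on a page boundary, skip the
     * page header so it points at the first record on that page. A long
     * header is used only on the first page of a segment.
     */
    static XLogRecPtr
    skip_page_header(XLogRecPtr lsn)
    {
        if (lsn % XLOG_BLCKSZ == 0)
        {
            if (XLogSegmentOffset(lsn, wal_segment_size) == 0)
                lsn += SizeOfXLogLongPHD;   /* first page of a segment */
            else
                lsn += SizeOfXLogShortPHD;  /* any other page */
        }
        return lsn;
    }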
Do you think that we can solve this using some other approach ? I am
not sure whether it's only the initial conditions that cause
lastReplayedEndRecPtr value to *not* point to a valid record, or is it
just a coincidence, and lastReplayedEndRecPtr can also have such a
value any time afterwards?

It's always possible. All that means is that the last record filled the
entire last WAL page.
Ok that means we *have* to bump the pointer ahead.
If it's only possible initially, we can
just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
lastReplayedEndRecPtr is invalid.

I don't think so? The redo pointer will point to something *much*
earlier, where we'll not yet have done all the necessary conflict
handling during recovery? So we'd not necessarily notice that a slot
is not actually usable for decoding.

We could instead just handle that by starting decoding at the redo
pointer, and just ignore all WAL records until they're after
lastReplayedEndRecPtr, but that has no advantages, and will read a lot
more WAL.
Yeah I agree : just doing this for initial case is a bad idea.
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
 xlrec_reuse.node = rel->rd_node;
 xlrec_reuse.block = blkno;
 xlrec_reuse.latestRemovedXid = latestRemovedXid;

@@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 XLogRecPtr recptr;
 xl_btree_delete xlrec_delete;

+ xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
 xlrec_delete.latestRemovedXid = latestRemovedXid;
 xlrec_delete.nitems = nitems;

Can we instead pass the heap rel down to here? I think there's only one
caller, and it has the heap relation available these days (it didn't at
the time of the prototype, possibly). There's a few other users of
get_rel_logical_catalog() where that might be harder, but it's easy
here.
For _bt_log_reuse_page(), its only caller is _bt_getbuf(), which does
not have heapRel parameter. Let me know which caller you were
referring to that has heapRel.
For _bt_delitems_delete(), the function itself already has a heapRel
parameter, so I will use that for onCatalogTable.
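
Concretely, something like this is what I have in mind for
_bt_delitems_delete() (a sketch, not the final change; it relies on the
heapRel argument that function already takes, instead of the
get_rel_logical_catalog() lookup):

    /* sketch: derive the flag from the heap relation we already have */
    xlrec_delete.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
    xlrec_delete.latestRemovedXid = latestRemovedXid;
    xlrec_delete.nitems = nitems;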
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;This one seems harder, but I'm not actually sure why we make it so
hard. It seems like we just ought to add the table to IndexVacuumInfo.
This means we have to add heapRel assignment wherever we initialize
IndexVacuumInfo structure, namely in lazy_vacuum_index(),
lazy_cleanup_index(), validate_index(), analyze_rel(), and make sure
these functions have a heap rel handle. Do you think we should do this
as part of this patch ?
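
If we do go that way, the struct change itself would presumably be small; a
sketch only (the field name heaprel is mine, the existing fields of
IndexVacuumInfo in access/genam.h are elided):

    typedef struct IndexVacuumInfo
    {
        Relation    index;      /* the index being vacuumed */
        Relation    heaprel;    /* (sketch) heap relation the index is on */
        /* ... existing fields unchanged ... */
    } IndexVacuumInfo;

vacuumRedirectAndPlaceholder() would then need the heap relation (or the
whole IndexVacuumInfo) passed down, and could use
RelationIsAccessibleInLogicalDecoding(info->heaprel) instead of the
syscache lookup.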
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+ }
+ if (found_conflict)
+ {
The above changes seem to be from the older version (v6) of the patch.
Just wanted to make sure you are using v8 patch.
Hm, as far as I can tell you just ignore that the slot might currently
be in use. You can't just drop a slot that somebody is using. I think
you need to send a recovery conflict to that backend.
Yeah, I am currently working on this. As you suggested, I am going to
call CancelVirtualTransaction() and for its sigmode parameter, I will
pass a new ProcSignalReason value PROCSIG_RECOVERY_CONFLICT_SLOT.
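
So the cancellation itself would boil down to something like this (sketch;
PROCSIG_RECOVERY_CONFLICT_SLOT is the new ProcSignalReason value, it does
not exist yet):

    if (VirtualTransactionIdIsValid(vxid))
        CancelVirtualTransaction(vxid, PROCSIG_RECOVERY_CONFLICT_SLOT);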
+ elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
This definitely needs to be expanded, and follow the message style
guideline.
This message, with the v8 patch, looks like this :
ereport(LOG,
(errmsg("Dropping conflicting slot %s", NameStr(slotname)),
errdetail("%s", reason)));
where reason is a char string.
On Thu, 20 Jun 2019 at 00:31, Andres Freund <andres@anarazel.de> wrote:
Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?

I found out that we can't do this, because we don't know whether the
xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
our context, it is SizeOfXLogLongPHD. So we indeed need the
XLogReaderState handle.

Well, we can determine whether a long or a short header is going to be
used, as that's solely dependent on the LSN:
Discussion of this point (plus some more points) is in a separate
reply. You can reply to my comments there :
/messages/by-id/CAJ3gD9f_HjQ6qP=+1jwzwy77fwcbT4-M3UvVsqpAzsY-jqM8nw@mail.gmail.com
/*
+ * Get the wal_level from the control file.
+ */
+int
+ControlFileWalLevel(void)
+{
+ return ControlFile->wal_level;
+}

Any reason not to return the type enum WalLevel instead? I'm not sure I
like the function name - perhaps something like GetActiveWalLevel() or
such? The fact that it's in the control file doesn't seem relevant
here. I think it should be close to DataChecksumsEnabled() etc, which
all return information from the control file.
Done.
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,17 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));

+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have disallowed creation of logical slots.
+ */

s/disallowed creation/disallowed creation or previously dropped/
Did this :
* we would have either disallowed creation of logical slots or dropped
* existing ones.
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithSlots(InvalidOid, InvalidTransactionId);
+
 LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 ControlFile->MaxConnections = xlrec.MaxConnections;
 ControlFile->max_worker_processes = xlrec.max_worker_processes;

Not for this patch, but I kinda feel the individual replay routines
ought to be broken out of xlog_redo().
Yeah, agree.
/* ----------------------------------------
 * Functions for decoding the data and block references in a record.
 * ----------------------------------------

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 * can restart from there.
 */
 break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }

This should also HINT to drop the replication slot.
In this case, DecodeXLogOp() is being called because somebody is using
the slot itself. Not sure if it makes sense to hint the user to drop
the very slot that he/she is using. A hint to drop the slot would make
more sense if the user were doing something that does not require a
slot and the slot had merely become a nuisance; here, dropping it is
not really a fix. What do you say ? Arguably the error message itself
already hints at the remedy: setting wal_level back to logical on the
master.
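
If we did decide to add a hint there, it would presumably be something
along these lines (sketch only, wording not settled):

    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("logical decoding on standby requires "
                    "wal_level >= logical on master"),
             errhint("Either set wal_level back to logical on the master, "
                     "or drop the logical replication slot.")));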
+ /*
+ * It is not guaranteed that the restart_lsn points to a valid
+ * record location. E.g. on standby, restart_lsn initially points to lastReplayedEndRecPtr,
+ * which is 1 + the end of last replayed record, which means it can point the next
+ * block header start. So bump it to the next valid record.
+ */

I'd rephrase this as something like:
restart_lsn initially may point one past the end of the record. If that
is a XLOG page boundary, it will not be a valid LSN for the start of a
record. If that's the case, look for the start of the first record.
Done.
+ if (!XRecOffIsValid(startptr))
+ {

Hm, could you before this add an Assert(startptr != InvalidXLogRecPtr)
or such?
Yeah, done
+ elog(DEBUG1, "Invalid restart lsn %X/%X", + (uint32) (startptr >> 32), (uint32) startptr); + startptr = XLogFindNextRecord(ctx->reader, startptr); + + SpinLockAcquire(&slot->mutex); + slot->data.restart_lsn = startptr; + SpinLockRelease(&slot->mutex); + elog(DEBUG1, "Moved slot restart lsn to %X/%X", + (uint32) (startptr >> 32), (uint32) startptr); + }Minor nit: normally debug messages don't start with upper case.
Done.
/* Wait for a consistent starting point */
 for (;;)
 {

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..7ffd264 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1016,37 +1016,37 @@ ReplicationSlotReserveWal(void)
 /*
 * For logical slots log a standby snapshot and start logical decoding
 * at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record so that
+ * a snapshot can be built using the record.

I'd add "to be logged independently on the primary" after "wait for an
xl_running_xact record".
Done.
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
 */
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));

Please rewrite this to use normal if blocks. I'm also not convinced that
it's useful to have this if block, and then another if block that
basically tests the same conditions again.
Will check and get back on this one.
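
For reference, the plain if/else form being asked for would presumably look
like this (a sketch, nothing more):

    if (SlotIsPhysical(slot))
        restart_lsn = GetRedoRecPtr();
    else if (RecoveryInProgress())
        restart_lsn = GetXLogReplayRecPtr(NULL);
    else
        restart_lsn = GetXLogInsertRecPtr();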
/*
+ * Resolve recovery conflicts with slots.
+ *
+ * When xid is valid, it means it's a removed-xid kind of conflict, so need to
+ * drop the appropriate slots whose xmin conflicts with removed xid.

I don't think "removed-xid kind of conflict" is that descriptive. I'd
suggest something like "When xid is valid, it means that rows older than
xid might have been removed. Therefore we need to drop slots that depend
on seeing those rows."
Done.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped.
+ */
+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ /*
+ * Wait awhile for them to die so that we avoid flooding an
+ * unresponsive backend when system is heavily loaded.
+ */
+ pg_usleep(100000);
+ found_conflict = false;
+ }

Hm, I wonder if we could use the condition variable the slot
infrastructure has these days for this instead.
Removed the pg_usleep; in the attached patch, we now sleep on the
condition variable just after the recovery conflict signal is sent.
Details down below.
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+ NameData slotname;
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid) && SlotIsLogical(s))
+ found_conflict = true;

I'd just add

if (!SlotIsLogical(s))
    continue;

because all of this doesn't need to happen for slots that aren't
logical.
Yeah right. Done. Also renamed the function to
ResolveRecoveryConflictWithLogicalSlots() to emphasize that it is only
for logical slots.
+ else
+ {
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_xmin, xid)));
+ }

s/removed xid/xid horizon being increased to %u/
BTW, this message belongs to an older version of the patch; check v7
onwards for the latest way the message is generated. Anyway, I have
used the above suggestion. Now the message detail will look like :
"slot xmin: 1234, slot catalog_xmin: 5678, conflicts with xid horizon
being increased to 9012"
+ if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ {
+ found_conflict = true;
+
+ ereport(LOG,
+ (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+ NameStr(slotname), slot_catalog_xmin, xid)));
+ }
+
+ }
+ if (found_conflict)
+ {

Hm, as far as I can tell you just ignore that the slot might currently
be in use. You can't just drop a slot that somebody is using.
Yeah, I missed that.
I think
you need to send a recovery conflict to that backend.

I guess the easiest way to do that would be something roughly like:

SetInvalidVirtualTransactionId(vxid);
LWLockAcquire(ProcArrayLock, LW_SHARED);
cancel_proc = BackendPidGetProcWithLock(active_pid);
if (cancel_proc)
vxid = GET_VXID_FROM_PGPROC(cancel_proc);
LWLockRelease(ProcArrayLock);

if (VirtualTransactionIdIsValid(vxid))
{
CancelVirtualTransaction(vxid);

/* Wait here until we get signaled, and then restart */
ConditionVariableSleep(&slot->active_cv,
WAIT_EVENT_REPLICATION_SLOT_DROP);
}
ConditionVariableCancelSleep();

when the slot is currently active.
Did that now. Check the new function ReplicationSlotDropConflicting().
Also the below code is something that I added :
/*
* Note: Even if vxid.localTransactionId is invalid, we need to cancel
* that backend, because there is no other way to make it release the
* slot. So don't bother to validate vxid.localTransactionId.
*/
if (vxid.backendId == InvalidBackendId)
continue;
This was done so that we could kill the walsender in case pg_recvlogical
made it acquire the slot that we want to drop. A walsender does not seem
to have a local transaction id, but CancelVirtualTransaction() also works
if vxid.localTransactionId is invalid. I have added comments explaining
this in CancelVirtualTransaction().
Part of this would need to be split
into a procarray.c helper function (mainly all the stuff dealing with
ProcArrayLock).
I didn't have to split it, by the way.
+ elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
This definitely needs to be expanded, and follow the message style
guideline.
From the v7 patch onwards, the message looks like :
ereport(LOG,
(errmsg("Dropping conflicting slot %s", NameStr(slotname)),
errdetail("%s", conflict_reason)));
Does that suffice ?
+ LWLockRelease(ReplicationSlotControlLock); /* avoid deadlock */
Instead of saying "deadlock" I'd just say that ReplicationSlotDropPtr()
will acquire that lock.
Done
+ ReplicationSlotDropPtr(s);
But more importantly, I don't think this is
correct. ReplicationSlotDropPtr() assumes that the to-be-dropped slot is
acquired by the current backend - without that somebody else could
concurrently acquire that slot.

So I think you need to do something like ReplicationSlotsDropDBSlots()
does:

/* acquire slot, so ReplicationSlotDropAcquired can be reused */
SpinLockAcquire(&s->mutex);
/* can't change while ReplicationSlotControlLock is held */
slotname = NameStr(s->data.name);
active_pid = s->active_pid;
if (active_pid == 0)
{
MyReplicationSlot = s;
s->active_pid = MyProcPid;
}
SpinLockRelease(&s->mutex);
I have now done this in ReplicationSlotDropConflicting() itself.
Greetings,
Andres Freund
I have also removed the code inside #ifdef NOT_ANYMORE that errors out
with "logical decoding cannot be used while in recovery".
I have introduced a new procsignal reason
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT so that when the conflicting
logical slot is dropped, a new error detail will be shown : "User was
using the logical slot that must be dropped".
Accordingly, added PgStat_StatDBEntry.n_conflict_logicalslot field.
Also, in RecoveryConflictInterrupt(), had to do some special handling
for am_cascading_walsender, so that a conflicting walsender on standby
will be terminated irrespective of the transaction status.
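
Condensed from RecoveryConflictInterrupt() in the attached patch, that
special case looks like this (it sits in the switch branch shared with
the other recovery conflict reasons):

    if (am_cascading_walsender &&
        reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
        MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
    {
        /* signal the walsender even though it is not in a transaction */
        RecoveryConflictPending = true;
        QueryCancelPending = true;
        InterruptPending = true;
        break;
    }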
Attached v9 patch.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v9.patch (application/octet-stream)
From 24ab7a9da9976cc67fe9b1a374efcf10257eac4a Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Mon, 24 Jun 2019 23:42:42 +0530
Subject: [PATCH] Logical decoding on standby - v9.
Author : Andres Freund.
Besides the above main changes, the patch includes the following :
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/018_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. Few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because the replay pointer
points to 1 + the end of the last record replayed, which means it can
coincide with the first byte of a new WAL block, i.e. inside the block
header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby(), which does the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
Changes in v8 :
Fix incorrect flush ptr on standby (reported by Tushar Ahuja).
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record.
As reported by Tushar Ahuja, pg_recvlogical appears to hang when there
are lots of inserts.
Fix: Use GetStandbyFlushRecPtr() if am_cascading_walsender
Changes in v9 :
While dropping a conflicting logical slot, if a backend has acquired it, send
it a conflict recovery signal. Check new function ReplicationSlotDropConflicting().
Also, miscellaneous review comments addressed, but not all of them yet.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 4 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 22 ++
src/backend/access/transam/xlogreader.c | 4 -
src/backend/postmaster/pgstat.c | 4 +
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 42 +++
src/backend/replication/slot.c | 212 ++++++++++-
src/backend/replication/walsender.c | 8 +-
src/backend/storage/ipc/procarray.c | 4 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/ipc/standby.c | 7 +-
src/backend/tcop/postgres.c | 23 +-
src/backend/utils/adt/pgstatfuncs.c | 1 +
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/access/xlogreader.h | 2 -
src/include/pgstat.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 395 +++++++++++++++++++++
38 files changed, 809 insertions(+), 48 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7240,6 +7242,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7270,7 +7273,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7280,6 +7283,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7700,7 +7704,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7736,7 +7741,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7832,7 +7838,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7969,7 +7977,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 0357030..6b641c9 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,8 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable =
+ RelationIsAccessibleInLogicalDecoding(heapRel);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..2fe1de2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have either disallowed creation of logical slots or dropped
+ * existing ones.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("logical decoding on standby requires wal_level >= logical on master"));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 88be7fe..431a302 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -878,7 +878,6 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
return true;
}
-#ifdef FRONTEND
/*
* Functions that are currently not needed in the backend, but are better
* implemented inside xlogreader.c because of the internal facilities available
@@ -1003,9 +1002,6 @@ out:
return found;
}
-#endif /* FRONTEND */
-
-
/* ----------------------------------------
* Functions for decoding the data and block references in a record.
* ----------------------------------------
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28..797ea0c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4728,6 +4728,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_conflict_tablespace = 0;
dbentry->n_conflict_lock = 0;
dbentry->n_conflict_snapshot = 0;
+ dbentry->n_conflict_logicalslot = 0;
dbentry->n_conflict_bufferpin = 0;
dbentry->n_conflict_startup_deadlock = 0;
dbentry->n_temp_files = 0;
@@ -6352,6 +6353,9 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
dbentry->n_conflict_snapshot++;
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ dbentry->n_conflict_logicalslot++;
+ break;
case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
dbentry->n_conflict_bufferpin++;
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..347eba7 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,6 +94,24 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
+ if (RecoveryInProgress())
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
+
+#ifdef NOT_ANYMORE
/* ----
* TODO: We got to change that someday soon...
*
@@ -111,6 +129,7 @@ CheckLogicalDecodingRequirements(void)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("logical decoding cannot be used while in recovery")));
+#endif
}
/*
@@ -241,6 +260,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
@@ -474,6 +495,27 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
(uint32) (slot->data.restart_lsn >> 32),
(uint32) slot->data.restart_lsn);
+ Assert(!XLogRecPtrIsInvalid(startptr));
+
+ /*
+ * restart_lsn initially may point one past the end of the record. If that
+ * is a XLOG page boundary, it will not be a valid LSN for the start of a
+ * record. If that's the case, look for the start of the first record.
+ */
+ if (!XRecOffIsValid(startptr))
+ {
+ elog(DEBUG1, "invalid restart lsn %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ startptr = XLogFindNextRecord(ctx->reader, startptr);
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = startptr;
+ SpinLockRelease(&slot->mutex);
+
+ elog(DEBUG1, "moved slot restart lsn to %X/%X",
+ (uint32) (startptr >> 32), (uint32) startptr);
+ }
+
/* Wait for a consistent starting point */
for (;;)
{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..6312a3a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
#include "pgstat.h"
#include "replication/slot.h"
#include "storage/fd.h"
+#include "storage/lock.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
@@ -101,6 +102,7 @@ int max_replication_slots = 0; /* the maximum number of replication
static void ReplicationSlotDropAcquired(void);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
+static void ReplicationSlotDropConflicting(ReplicationSlot *slot);
/* internal persistency functions */
static void RestoreSlotFromDisk(const char *name);
@@ -638,6 +640,64 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
}
/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+{
+ pid_t active_pid;
+ PGPROC *proc;
+ VirtualTransactionId vxid;
+
+ ConditionVariablePrepareToSleep(&slot->active_cv);
+ while (1)
+ {
+ SpinLockAcquire(&slot->mutex);
+ active_pid = slot->active_pid;
+ if (active_pid == 0)
+ active_pid = slot->active_pid = MyProcPid;
+ SpinLockRelease(&slot->mutex);
+
+ /* Drop the acquired slot, unless it is acquired by another backend */
+ if (active_pid == MyProcPid)
+ {
+ elog(DEBUG1, "acquired conflicting slot, now dropping it");
+ ReplicationSlotDropPtr(slot);
+ break;
+ }
+
+ /* Send the other backend a recovery conflict signal */
+
+ SetInvalidVirtualTransactionId(vxid);
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ proc = BackendPidGetProcWithLock(active_pid);
+ if (proc)
+ GET_VXID_FROM_PGPROC(vxid, *proc);
+ LWLockRelease(ProcArrayLock);
+
+ /*
+ * If coincidentally that process finished, some other backend may
+ * acquire the slot again. So start over again.
+ * Note: Even if vxid.localTransactionId is invalid, we need to cancel
+ * that backend, because there is no other way to make it release the
+ * slot. So don't bother to validate vxid.localTransactionId.
+ */
+ if (vxid.backendId == InvalidBackendId)
+ continue;
+
+ elog(DEBUG1, "cancelling pid %d (backendId: %d) for releasing slot",
+ active_pid, vxid.backendId);
+
+ CancelVirtualTransaction(vxid, PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+ ConditionVariableSleep(&slot->active_cv,
+ WAIT_EVENT_REPLICATION_SLOT_DROP);
+ }
+
+ ConditionVariableCancelSleep();
+}
+
+/*
* Serialize the currently acquired slot's state from memory to disk, thereby
* guaranteeing the current state will survive a crash.
*/
@@ -1016,37 +1076,38 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record to be
+ * logged independently on the primary, so that a snapshot can be built
+ * using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1126,119 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with logical slots.
+ *
+ * When xid is valid, it means that rows older than xid might have been
+ * removed. Therefore we need to drop slots that depend on seeing those rows.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped. Also, when xid is invalid, a common 'conflict_reason' is
+ * provided for the error detail; otherwise it is NULL, in which case it is
+ * constructed out of the xid value.
+ */
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* We are only dealing with *logical* slot conflicts. */
+ if (!SlotIsLogical(s))
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "slot xmin: 1234, slot catalog_xmin: 5678, conflicts with xid
+ * horizon being increased to 9012"
+ */
+ initStringInfo(&conflict_str);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ appendStringInfo(&conflict_str, "slot xmin: %d", slot_xmin);
+
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_str, "%sslot catalog_xmin: %d",
+ conflict_str.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_str.len > 0)
+ {
+ appendStringInfo(&conflict_str, ", %s %d",
+ gettext_noop("conflicts with xid horizon being increased to"),
+ xid);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ ereport(LOG,
+ (errmsg("Dropping conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", conflict_reason)));
+
+ /* ReplicationSlotDropPtr() would acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
+
+ ReplicationSlotDropConflicting(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 92fa86f..4ce7096 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2814,6 +2814,7 @@ XLogSendLogical(void)
{
XLogRecord *record;
char *errm;
+ XLogRecPtr flushPtr;
/*
* Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,10 +2831,11 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
+ flushPtr = (am_cascading_walsender ?
+ GetStandbyFlushRecPtr() : GetFlushRecPtr());
+
if (record != NULL)
{
- /* XXX: Note that logical decoding cannot be used while in recovery */
- XLogRecPtr flushPtr = GetFlushRecPtr();
/*
* Note the lack of any call to LagTrackerWrite() which is handled by
@@ -2857,7 +2859,7 @@ XLogSendLogical(void)
* If the record we just wanted read is at or beyond the flushed
* point, then we're caught up.
*/
- if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
{
WalSndCaughtUp = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 18a0f62..ec696f4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2669,6 +2669,10 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
GET_VXID_FROM_PGPROC(procvxid, *proc);
+ /*
+ * Note: vxid.localTransactionId can be invalid, which means the
+ * request is to signal the pid that is not running a transaction.
+ */
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c..645f320 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -286,6 +286,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+ if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
+ RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..7cfb6d5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithLogicalSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..c23d361 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2393,6 +2393,9 @@ errdetail_recovery_conflict(void)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
errdetail("User query might have needed to see row versions that must be removed.");
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ errdetail("User was using the logical slot that must be dropped.");
+ break;
case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
errdetail("User transaction caused buffer deadlock with recovery.");
break;
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
case PROCSIG_RECOVERY_CONFLICT_LOCK:
case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * requirement is for the signal receiver to release the slot,
+ * so that it could be dropped by the signal sender. So for
+ * normal backends, the transaction should be aborted, just
+ * like for other recovery conflicts. But if it's walsender on
+ * standby, then it has to be killed so as to release an
+ * acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
/*
* If we aren't in a transaction any longer then ignore.
@@ -2920,7 +2942,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* Intentional fall through to session cancel */
/* FALLTHROUGH */
-
case PROCSIG_RECOVERY_CONFLICT_DATABASE:
RecoveryConflictPending = true;
ProcDiePending = true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..7dfbef7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
dbentry->n_conflict_tablespace +
dbentry->n_conflict_lock +
dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
dbentry->n_conflict_bufferpin +
dbentry->n_conflict_startup_deadlock);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..e7439c1 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern WalLevel GetActiveWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 04228e2..a5ffffc 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -215,9 +215,7 @@ extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
/* Invalidate read state */
extern void XLogReaderInvalReadState(XLogReaderState *state);
-#ifdef FRONTEND
extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
-#endif /* FRONTEND */
/* Functions for decoding an XLogRecord */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..4fe8684 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -604,6 +604,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_conflict_tablespace;
PgStat_Counter n_conflict_lock;
PgStat_Counter n_conflict_snapshot;
+ PgStat_Counter n_conflict_logicalslot;
PgStat_Counter n_conflict_bufferpin;
PgStat_Counter n_conflict_startup_deadlock;
PgStat_Counter n_temp_files;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..73b954e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a..956d3c2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -39,6 +39,7 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
PROCSIG_RECOVERY_CONFLICT_LOCK,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 6019f37..719837d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2000,6 +2000,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..304f32a
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,395 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+is($node_replica->create_logical_slot_on_standby($node_master, 'dodropslot', 'testdb'),
+ 0, 'created dodropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot test_decoding');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Mon, 24 Jun 2019 at 23:58, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Thu, 20 Jun 2019 at 00:31, Andres Freund <andres@anarazel.de> wrote:
Or else, do you think we can just increment the record pointer by
doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
SizeOfXLogShortPHD() ?

I found out that we can't do this, because we don't know whether the
xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
our context, it is SizeOfXLogLongPHD. So we indeed need the
XLogReaderState handle.

Well, we can determine whether a long or a short header is going to be
used, as that's solely dependent on the LSN:

Discussion of this point (plus some more points) is in a separate
reply. You can reply to my comments there:
/messages/by-id/CAJ3gD9f_HjQ6qP=+1jwzwy77fwcbT4-M3UvVsqpAzsY-jqM8nw@mail.gmail.com

As you suggested, I have used XLogSegmentOffset() to know the header
size, and bumped the restart_lsn in ReplicationSlotReserveWal() rather
than DecodingContextFindStartpoint(). Like I mentioned in the above
link, I am not sure why it's not worth doing this like you said.
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
 */
+
+ restart_lsn =
+ (SlotIsPhysical(slot) ? GetRedoRecPtr() :
+ (RecoveryInProgress() ? GetXLogReplayRecPtr(NULL) :
+ GetXLogInsertRecPtr()));

Please rewrite this to use normal if blocks.
Ok, done.
I'm also not convinced that
it's useful to have this if block, and then another if block that
basically tests the same conditions again.

Will check and get back on this one.
Those conditions are not exactly the same. restart_lsn is assigned three
different pointers depending upon three different conditions. And
LogStandbySnapshot() is to be done only for a combination of two
specific conditions. So we need to have two different condition
blocks.

Also, it's better if we have the
"assign-slot-restart_lsn-under-spinlock" part in common code, rather than
repeating it in two different blocks.

We can do something like:
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
restart_lsn = GetXLogInsertRecPtr();
/* Assign restart_lsn to slot restart_lsn under Spinlock */
/* Log standby snapshot and fsync to disk */
}
else
{
if (SlotIsPhysical(slot))
restart_lsn = GetRedoRecPtr();
else if (RecoveryInProgress())
restart_lsn = GetXLogReplayRecPtr(NULL);
else
restart_lsn = GetXLogInsertRecPtr();
/* Assign restart_lsn to slot restart_lsn under Spinlock */
}
But I think a better/simpler approach would be to take the
assign-slot-restart_lsn out of the two condition blocks into a
common location, like this:
if (SlotIsPhysical(slot))
restart_lsn = GetRedoRecPtr();
else if (RecoveryInProgress())
restart_lsn = GetXLogReplayRecPtr(NULL);
else
restart_lsn = GetXLogInsertRecPtr();
/* Assign restart_lsn to slot restart_lsn under Spinlock */
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
/* Log standby snapshot and fsync to disk */
}
So in the updated patch (v10), I have done as above.
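To save readers a dive into the diff below, here is a condensed sketch of
what the ReplicationSlotReserveWal() hunk in the attached v10 boils down to
(comments and the invalid-replay-pointer elog() check are trimmed; see the
patch for the exact code):

    if (SlotIsPhysical(slot))
        restart_lsn = GetRedoRecPtr();
    else if (RecoveryInProgress())
    {
        restart_lsn = GetXLogReplayRecPtr(NULL);

        /*
         * The replay pointer can land on an XLOG page boundary; bump it
         * past the (long or short) page header so it points at a valid
         * record start.
         */
        if (!XRecOffIsValid(restart_lsn))
            restart_lsn += (XLogSegmentOffset(restart_lsn, wal_segment_size) == 0)
                ? SizeOfXLogLongPHD : SizeOfXLogShortPHD;
    }
    else
        restart_lsn = GetXLogInsertRecPtr();

    /* common: publish the chosen restart_lsn under the slot's spinlock */
    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = restart_lsn;
    SpinLockRelease(&slot->mutex);

    /* only a primary can write WAL, so only log the snapshot there */
    if (!RecoveryInProgress() && SlotIsLogical(slot))
    {
        XLogRecPtr flushptr = LogStandbySnapshot();

        XLogFlush(flushptr);
    }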
Attachments:
logical-decoding-on-standby_v10.patch
From f432ba4f782e25db93039a87445696886a1fa479 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Tue, 25 Jun 2019 15:51:32 +0530
Subject: [PATCH] Logical decoding on standby - v10
Author : Andres Freund.
Besides the above main changes, the patch includes the following:
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because replay pointer
points to 1 + end of last record replayed, which means it can
coincide with first byte of a new WAL block, i.e. inside block
header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
Changes in v8 :
Fix incorrect flush ptr on standby (reported by Tushar Ahuja).
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record.
This was reported by Tushar Ahuja, where pg_recvlogical seemed to be
hanging when there were lots of inserts.
Fix: Use GetStandbyFlushRecPtr() if am_cascading_walsender
Changes in v9 :
While dropping a conflicting logical slot, if a backend has acquired it, send
it a conflict recovery signal. Check new function ReplicationSlotDropConflicting().
Also, miscellaneous review comments addressed, but not all of them yet.
Changes in v10 :
Adjust restart_lsn if it's a Replay Pointer.
This was earlier done in DecodingContextFindStartpoint() but now it
is done in ReplicationSlotReserveWal(), when restart_lsn is initialized.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 4 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 22 ++
src/backend/postmaster/pgstat.c | 4 +
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 33 +-
src/backend/replication/slot.c | 230 +++++++++++-
src/backend/replication/walsender.c | 8 +-
src/backend/storage/ipc/procarray.c | 4 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/ipc/standby.c | 7 +-
src/backend/tcop/postgres.c | 23 +-
src/backend/utils/adt/pgstatfuncs.c | 1 +
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/pgstat.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 395 +++++++++++++++++++++
36 files changed, 802 insertions(+), 58 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7240,6 +7242,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7270,7 +7273,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7280,6 +7283,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7700,7 +7704,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7736,7 +7741,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7832,7 +7838,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7969,7 +7977,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 0357030..6b641c9 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,8 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable =
+ RelationIsAccessibleInLogicalDecoding(heapRel);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..2fe1de2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have either disallowed creation of logical slots or dropped
+ * existing ones.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("logical decoding on standby requires wal_level >= logical on master"));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28..797ea0c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4728,6 +4728,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_conflict_tablespace = 0;
dbentry->n_conflict_lock = 0;
dbentry->n_conflict_snapshot = 0;
+ dbentry->n_conflict_logicalslot = 0;
dbentry->n_conflict_bufferpin = 0;
dbentry->n_conflict_startup_deadlock = 0;
dbentry->n_temp_files = 0;
@@ -6352,6 +6353,9 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
dbentry->n_conflict_snapshot++;
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ dbentry->n_conflict_logicalslot++;
+ break;
case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
dbentry->n_conflict_bufferpin++;
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..4169828 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,23 +94,22 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
- /* ----
- * TODO: We got to change that someday soon...
- *
- * There's basically three things missing to allow this:
- * 1) We need to be able to correctly and quickly identify the timeline a
- * LSN belongs to
- * 2) We need to force hot_standby_feedback to be enabled at all times so
- * the primary cannot remove rows we need.
- * 3) support dropping replication slots referring to a database, in
- * dbase_redo. There can't be any active ones due to HS recovery
- * conflicts, so that should be relatively easy.
- * ----
- */
if (RecoveryInProgress())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
}
/*
@@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..fcffba2 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
#include "pgstat.h"
#include "replication/slot.h"
#include "storage/fd.h"
+#include "storage/lock.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
@@ -101,6 +102,7 @@ int max_replication_slots = 0; /* the maximum number of replication
static void ReplicationSlotDropAcquired(void);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
+static void ReplicationSlotDropConflicting(ReplicationSlot *slot);
/* internal persistency functions */
static void RestoreSlotFromDisk(const char *name);
@@ -638,6 +640,64 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
}
/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+{
+ pid_t active_pid;
+ PGPROC *proc;
+ VirtualTransactionId vxid;
+
+ ConditionVariablePrepareToSleep(&slot->active_cv);
+ while (1)
+ {
+ SpinLockAcquire(&slot->mutex);
+ active_pid = slot->active_pid;
+ if (active_pid == 0)
+ active_pid = slot->active_pid = MyProcPid;
+ SpinLockRelease(&slot->mutex);
+
+ /* Drop the acquired slot, unless it is acquired by another backend */
+ if (active_pid == MyProcPid)
+ {
+ elog(DEBUG1, "acquired conflicting slot, now dropping it");
+ ReplicationSlotDropPtr(slot);
+ break;
+ }
+
+ /* Send the other backend a conflict recovery signal */
+
+ SetInvalidVirtualTransactionId(vxid);
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ proc = BackendPidGetProcWithLock(active_pid);
+ if (proc)
+ GET_VXID_FROM_PGPROC(vxid, *proc);
+ LWLockRelease(ProcArrayLock);
+
+ /*
+ * If coincidentally that process finished, some other backend may
+ * acquire the slot again. So start over again.
+ * Note: Even if vxid.localTransactionId is invalid, we need to cancel
+ * that backend, because there is no other way to make it release the
+ * slot. So don't bother to validate vxid.localTransactionId.
+ */
+ if (vxid.backendId == InvalidBackendId)
+ continue;
+
+ elog(DEBUG1, "cancelling pid %d (backendId: %d) for releasing slot",
+ active_pid, vxid.backendId);
+
+ CancelVirtualTransaction(vxid, PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+ ConditionVariableSleep(&slot->active_cv,
+ WAIT_EVENT_REPLICATION_SLOT_DROP);
+ }
+
+ ConditionVariableCancelSleep();
+}
+
+/*
* Serialize the currently acquired slot's state from memory to disk, thereby
* guaranteeing the current state will survive a crash.
*/
@@ -1016,37 +1076,56 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xact record to be
+ * logged independently on the primary, so that a snapshot can be built
+ * using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+ if (SlotIsPhysical(slot))
+ restart_lsn = GetRedoRecPtr();
+ else if (RecoveryInProgress())
+ {
+ restart_lsn = GetXLogReplayRecPtr(NULL);
+ /*
+ * Replay pointer may point one past the end of the record. If that
+ * is a XLOG page boundary, it will not be a valid LSN for the
+ * start of a record, so bump it up past the page header.
+ */
+ if (!XRecOffIsValid(restart_lsn))
+ {
+ if (restart_lsn % XLOG_BLCKSZ != 0)
+ elog(ERROR, "invalid replay pointer");
+ /* For the first page of a segment file, it's a long header */
+ if (XLogSegmentOffset(restart_lsn, wal_segment_size) == 0)
+ restart_lsn += SizeOfXLogLongPHD;
+ else
+ restart_lsn += SizeOfXLogShortPHD;
+ }
+ }
+ else
+ restart_lsn = GetXLogInsertRecPtr();
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1144,119 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with logical slots.
+ *
+ * When xid is valid, it means that rows older than xid might have been
+ * removed. Therefore we need to drop slots that depend on seeing those rows.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped. Also, when xid is invalid, a common 'conflict_reason' is
+ * provided for the error detail; otherwise it is NULL, in which case it is
+ * constructed out of the xid value.
+ */
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* We are only dealing with *logical* slot conflicts. */
+ if (!SlotIsLogical(s))
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str;
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "slot xmin: 1234, slot catalog_xmin: 5678, conflicts with xid
+ * horizon being increased to 9012"
+ */
+ initStringInfo(&conflict_str);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ appendStringInfo(&conflict_str, "slot xmin: %d", slot_xmin);
+
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_str, "%sslot catalog_xmin: %d",
+ conflict_str.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_str.len > 0)
+ {
+ appendStringInfo(&conflict_str, ", %s %d",
+ gettext_noop("conflicts with xid horizon being increased to"),
+ xid);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ ereport(LOG,
+ (errmsg("Dropping conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", conflict_reason)));
+
+ /* ReplicationSlotDropPtr() would acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
+
+ ReplicationSlotDropConflicting(s);
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 92fa86f..4ce7096 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2814,6 +2814,7 @@ XLogSendLogical(void)
{
XLogRecord *record;
char *errm;
+ XLogRecPtr flushPtr;
/*
* Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,10 +2831,11 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
+ flushPtr = (am_cascading_walsender ?
+ GetStandbyFlushRecPtr() : GetFlushRecPtr());
+
if (record != NULL)
{
- /* XXX: Note that logical decoding cannot be used while in recovery */
- XLogRecPtr flushPtr = GetFlushRecPtr();
/*
* Note the lack of any call to LagTrackerWrite() which is handled by
@@ -2857,7 +2859,7 @@ XLogSendLogical(void)
* If the record we just wanted read is at or beyond the flushed
* point, then we're caught up.
*/
- if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
{
WalSndCaughtUp = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 18a0f62..ec696f4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2669,6 +2669,10 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
GET_VXID_FROM_PGPROC(procvxid, *proc);
+ /*
+ * Note: vxid.localTransactionId can be invalid, which means the
+ * request is to signal the pid that is not running a transaction.
+ */
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c..645f320 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -286,6 +286,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+ if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
+ RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..7cfb6d5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithLogicalSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..c23d361 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2393,6 +2393,9 @@ errdetail_recovery_conflict(void)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
errdetail("User query might have needed to see row versions that must be removed.");
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ errdetail("User was using the logical slot that must be dropped.");
+ break;
case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
errdetail("User transaction caused buffer deadlock with recovery.");
break;
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
case PROCSIG_RECOVERY_CONFLICT_LOCK:
case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * requirement is for the signal receiver to release the slot,
+ * so that it could be dropped by the signal sender. So for
+ * normal backends, the transaction should be aborted, just
+ * like for other recovery conflicts. But if it's walsender on
+ * standby, then it has to be killed so as to release an
+ * acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
/*
* If we aren't in a transaction any longer then ignore.
@@ -2920,7 +2942,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* Intentional fall through to session cancel */
/* FALLTHROUGH */
-
case PROCSIG_RECOVERY_CONFLICT_DATABASE:
RecoveryConflictPending = true;
ProcDiePending = true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..7dfbef7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
dbentry->n_conflict_tablespace +
dbentry->n_conflict_lock +
dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
dbentry->n_conflict_bufferpin +
dbentry->n_conflict_startup_deadlock);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..e7439c1 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern WalLevel GetActiveWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..4fe8684 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -604,6 +604,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_conflict_tablespace;
PgStat_Counter n_conflict_lock;
PgStat_Counter n_conflict_snapshot;
+ PgStat_Counter n_conflict_logicalslot;
PgStat_Counter n_conflict_bufferpin;
PgStat_Counter n_conflict_startup_deadlock;
PgStat_Counter n_temp_files;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..73b954e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a..956d3c2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -39,6 +39,7 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
PROCSIG_RECOVERY_CONFLICT_LOCK,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 6019f37..719837d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2000,6 +2000,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..304f32a
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,395 @@
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 51;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_phys_mins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('decoding_standby');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin, "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+is($new_logical_xmin, '', "logical xmin null");
+isnt($new_logical_catalog_xmin, '', "logical slot catalog_xmin not null");
+cmp_ok($new_logical_catalog_xmin, ">", $logical_catalog_xmin, "logical slot catalog_xmin advanced after get_changes");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent
+
+my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+isnt($new_physical_xmin, '', "physical xmin not null");
+# hot standby feedback should advance phys catalog_xmin now that the standby's
+# slot doesn't hold it down as far.
+isnt($new_physical_catalog_xmin, '', "physical catalog_xmin not null");
+cmp_ok($new_physical_catalog_xmin, ">", $physical_catalog_xmin, "physical catalog_xmin advanced");
+
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin, 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin, 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+########################################################################
+# Recovery conflict: conflicting replication slot should get dropped
+########################################################################
+
+# One way to reproduce a recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on the standby.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+isnt($ret, 0, 'usage of slot failed as expected');
+like($stderr, qr/does not exist/, 'slot not found as expected');
+
+# Re-create the slot now that we know it is dropped
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+# Set hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. Both should be non-NULL since hs_feedback is on and
+# there is a logical slot present on standby.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NOT NULL");
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_phys_mins($node_master, 'decoding_standby',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery: drop database drops idle slots
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB on the upstream if they're on the right DB, or not dropped if on
+# another DB.
+
+is($node_replica->create_logical_slot_on_standby($node_master, 'dodropslot', 'testdb'),
+ 0, 'created dodropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, 'logical', 'slot dodropslot on standby created');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'slot otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot')->{'slot_type'}, '', 'slot dodropslot on standby dropped');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby not dropped');
+
+
+##################################################
+# Recovery: drop database drops in-use slots
+##################################################
+
+# This time, have the slot in-use on the downstream DB when we drop it.
+print "Testing dropdb when downstream slot is in-use";
+$node_master->psql('postgres', q[CREATE DATABASE testdb2]);
+
+print "creating slot dodropslot2";
+$node_replica->command_ok(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-P', 'test_decoding', '-S', 'dodropslot2', '--create-slot'],
+ 'pg_recvlogical created slot dodropslot2');
+is($node_replica->slot('dodropslot2')->{'slot_type'}, 'logical', 'slot dodropslot2 on standby created');
+
+# make sure the slot is in use
+print "starting pg_recvlogical";
+$handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb2'), '-S', 'dodropslot2', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+sleep(1);
+
+is($node_replica->slot('dodropslot2')->{'active'}, 't', 'slot on standby is active')
+ or BAIL_OUT("slot not active on standby, cannot continue. pg_recvlogical exited with '$stdout', '$stderr'");
+
+# Master doesn't know the replica's slot is busy so dropdb should succeed
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb2]);
+ok(1, 'dropdb finished');
+
+while ($node_replica->slot('dodropslot2')->{'active_pid'})
+{
+ sleep(1);
+ print "waiting for walsender to exit";
+}
+
+print "walsender exited, waiting for pg_recvlogical to exit";
+
+# our client should've terminated in response to the walsender error
+eval {
+ $handle->finish;
+};
+$return = $?;
+if ($return) {
+ is($return, 256, "pg_recvlogical terminated by server");
+ like($stderr, qr/terminating connection due to conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/User was connected to a database that must be dropped./, 'recvlogical recovery conflict db');
+}
+
+is($node_replica->slot('dodropslot2')->{'active_pid'}, '', 'walsender backend exited');
+
+# The slot should be dropped by recovery now
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres', q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb2')]), 'f',
+ 'database dropped on standby');
+
+is($node_replica->slot('dodropslot2')->{'slot_type'}, '', 'slot on standby dropped');
--
2.1.4
On Fri, Jun 21, 2019 at 11:50 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
This definitely needs to be expanded, and follow the message style
guideline. This message, with the v8 patch, looks like this:
ereport(LOG,
(errmsg("Dropping conflicting slot %s", NameStr(slotname)),
errdetail("%s", reason)));
where reason is a char string.
That does not follow the message style guideline.
https://www.postgresql.org/docs/12/error-style-guide.html
From the grammar and punctuation section:
"Primary error messages: Do not capitalize the first letter. Do not
end a message with a period. Do not even think about ending a message
with an exclamation point.
Detail and hint messages: Use complete sentences, and end each with a
period. Capitalize the first word of sentences. Put two spaces after
the period if another sentence follows (for English text; might be
inappropriate in other languages)."
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 25 Jun 2019 at 19:14, Robert Haas <robertmhaas@gmail.com> wrote:
Thanks. In the updated patch, I have changed the message style. Now it
looks like this:
primary message : dropped conflicting slot slot_name
error detail : Slot conflicted with xid horizon which was being
increased to 9012 (slot xmin: 1234, slot catalog_xmin: 5678).
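For reference, the corresponding code in the attached v11 patch (in
ResolveRecoveryConflictWithLogicalSlots() in slot.c) emits it as:
ereport(LOG,
(errmsg("dropped conflicting slot %s", NameStr(slotname)),
errdetail("%s", conflict_reason)));
where conflict_reason holds the detail sentence built from the slot's
xmin/catalog_xmin values.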
--------------------
Also, in the updated patch (v11), I have added some scenarios that
verify that the slot is dropped when either the master's wal_level is
insufficient or the slot is conflicting. I have also organized the
test file a bit.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v11.patch
From aa3004a70e1ab2ee304367b29dde1549326354f1 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Mon, 1 Jul 2019 10:49:50 +0530
Subject: [PATCH] Logical decoding on standby - v11
Author : Andres Freund.
Besides the above main changes, the patch includes the following:
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/018_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. A few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because replay pointer
points to 1 + end of last record replayed, which means it can
coincide with first byte of a new WAL block, i.e. inside block
header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
Changes in v8 :
Fix incorrect flush ptr on standby (reported by Tushar Ahuja).
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record.
The symptom reported by Tushar Ahuja was that pg_recvlogical appears
to hang when there are lots of inserts.
Fix: Use GetStandbyFlushRecPtr() if am_cascading_walsender
Changes in v9 :
While dropping a conflicting logical slot, if a backend has acquired it, send
it a recovery conflict signal. See the new function ReplicationSlotDropConflicting().
Also, miscellaneous review comments addressed, but not all of them yet.
Changes in v10 :
Adjust restart_lsn if it's a Replay Pointer.
This was earlier done in DecodingContextFindStartpoint() but now it
is done in ReplicationSlotReserveWal(), when restart_lsn is initialized.
Changes in v11 :
Added some test scenarios to test drop-slot conflicts. Organized the
test file a bit.
Also improved the conflict error message.
---
src/backend/access/gist/gistxlog.c | 6 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 4 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 2 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 22 ++
src/backend/postmaster/pgstat.c | 4 +
src/backend/replication/logical/decode.c | 14 +-
src/backend/replication/logical/logical.c | 33 +-
src/backend/replication/slot.c | 233 +++++++++++-
src/backend/replication/walsender.c | 8 +-
src/backend/storage/ipc/procarray.c | 4 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/ipc/standby.c | 7 +-
src/backend/tcop/postgres.c | 23 +-
src/backend/utils/adt/pgstatfuncs.c | 1 +
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/pgstat.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 1 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 420 +++++++++++++++++++++
36 files changed, 830 insertions(+), 58 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..385ea1f 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7240,6 +7242,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7270,7 +7273,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7280,6 +7283,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7700,7 +7704,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7736,7 +7741,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7832,7 +7838,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7969,7 +7977,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 0357030..6b641c9 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1140,6 +1142,8 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable =
+ RelationIsAccessibleInLogicalDecoding(heapRel);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 6532a25..b874bda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..eaaf631 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e08320e..7417bcf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4926,6 +4926,15 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data. Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have either disallowed creation of logical slots or dropped
+ * existing ones.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("Logical decoding on standby requires wal_level >= logical on master."));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28..797ea0c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4728,6 +4728,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_conflict_tablespace = 0;
dbentry->n_conflict_lock = 0;
dbentry->n_conflict_snapshot = 0;
+ dbentry->n_conflict_logicalslot = 0;
dbentry->n_conflict_bufferpin = 0;
dbentry->n_conflict_startup_deadlock = 0;
dbentry->n_temp_files = 0;
@@ -6352,6 +6353,9 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
dbentry->n_conflict_snapshot++;
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ dbentry->n_conflict_logicalslot++;
+ break;
case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
dbentry->n_conflict_bufferpin++;
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /* Cannot proceed if master itself does not have logical data */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index bbd38c0..4169828 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,23 +94,22 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
- /* ----
- * TODO: We got to change that someday soon...
- *
- * There's basically three things missing to allow this:
- * 1) We need to be able to correctly and quickly identify the timeline a
- * LSN belongs to
- * 2) We need to force hot_standby_feedback to be enabled at all times so
- * the primary cannot remove rows we need.
- * 3) support dropping replication slots referring to a database, in
- * dbase_redo. There can't be any active ones due to HS recovery
- * conflicts, so that should be relatively easy.
- * ----
- */
if (RecoveryInProgress())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
}
/*
@@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 55c306e..47c7dd8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
#include "pgstat.h"
#include "replication/slot.h"
#include "storage/fd.h"
+#include "storage/lock.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
@@ -101,6 +102,7 @@ int max_replication_slots = 0; /* the maximum number of replication
static void ReplicationSlotDropAcquired(void);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
+static void ReplicationSlotDropConflicting(ReplicationSlot *slot);
/* internal persistency functions */
static void RestoreSlotFromDisk(const char *name);
@@ -638,6 +640,64 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
}
/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+{
+ pid_t active_pid;
+ PGPROC *proc;
+ VirtualTransactionId vxid;
+
+ ConditionVariablePrepareToSleep(&slot->active_cv);
+ while (1)
+ {
+ SpinLockAcquire(&slot->mutex);
+ active_pid = slot->active_pid;
+ if (active_pid == 0)
+ active_pid = slot->active_pid = MyProcPid;
+ SpinLockRelease(&slot->mutex);
+
+ /* Drop the acquired slot, unless it is acquired by another backend */
+ if (active_pid == MyProcPid)
+ {
+ elog(DEBUG1, "acquired conflicting slot, now dropping it");
+ ReplicationSlotDropPtr(slot);
+ break;
+ }
+
+ /* Send the other backend a recovery conflict signal */
+
+ SetInvalidVirtualTransactionId(vxid);
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ proc = BackendPidGetProcWithLock(active_pid);
+ if (proc)
+ GET_VXID_FROM_PGPROC(vxid, *proc);
+ LWLockRelease(ProcArrayLock);
+
+ /*
+ * If coincidently that process finished, some other backend may
+ * If, coincidentally, that process finished, some other backend may
+ * Note: Even if vxid.localTransactionId is invalid, we need to cancel
+ * that backend, because there is no other way to make it release the
+ * slot. So don't bother to validate vxid.localTransactionId.
+ */
+ if (vxid.backendId == InvalidBackendId)
+ continue;
+
+ elog(DEBUG1, "cancelling pid %d (backendId: %d) for releasing slot",
+ active_pid, vxid.backendId);
+
+ CancelVirtualTransaction(vxid, PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+ ConditionVariableSleep(&slot->active_cv,
+ WAIT_EVENT_REPLICATION_SLOT_DROP);
+ }
+
+ ConditionVariableCancelSleep();
+}
+
+/*
* Serialize the currently acquired slot's state from memory to disk, thereby
* guaranteeing the current state will survive a crash.
*/
@@ -1016,37 +1076,56 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xacts record to be
+ * logged independently on the primary, so that a snapshot can be built
+ * using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+ if (SlotIsPhysical(slot))
+ restart_lsn = GetRedoRecPtr();
+ else if (RecoveryInProgress())
+ {
+ restart_lsn = GetXLogReplayRecPtr(NULL);
+ /*
+ * Replay pointer may point one past the end of the record. If that
+ * is a XLOG page boundary, it will not be a valid LSN for the
+ * start of a record, so bump it up past the page header.
+ */
+ if (!XRecOffIsValid(restart_lsn))
+ {
+ if (restart_lsn % XLOG_BLCKSZ != 0)
+ elog(ERROR, "invalid replay pointer");
+ /* For the first page of a segment file, it's a long header */
+ if (XLogSegmentOffset(restart_lsn, wal_segment_size) == 0)
+ restart_lsn += SizeOfXLogLongPHD;
+ else
+ restart_lsn += SizeOfXLogShortPHD;
+ }
+ }
+ else
+ restart_lsn = GetXLogInsertRecPtr();
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1144,122 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with logical slots.
+ *
+ * When xid is valid, it means that rows older than xid might have been
+ * removed. Therefore we need to drop slots that depend on seeing those rows.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master wal_level is set back to replica, so existing logical slots need to
+ * be dropped. Also, when xid is invalid, a common 'conflict_reason' is
+ * provided for the error detail; otherwise it is NULL, in which case it is
+ * constructed out of the xid value.
+ */
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotCtlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* We are only dealing with *logical* slot conflicts. */
+ if (!SlotIsLogical(s))
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str, conflict_xmins;
+ char *conflict_sentence =
+ gettext_noop("Slot conflicted with xid horizon which was being increased to");
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "Slot conflicted with xid horizon which was being increased
+ * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)."
+ */
+ initStringInfo(&conflict_xmins);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ appendStringInfo(&conflict_xmins, "slot xmin: %u", slot_xmin);
+ }
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %u",
+ conflict_xmins.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_xmins.len > 0)
+ {
+ initStringInfo(&conflict_str);
+ appendStringInfo(&conflict_str, "%s %u (%s).",
+ conflict_sentence, xid, conflict_xmins.data);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ /* ReplicationSlotDropPtr() would acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
+
+ ReplicationSlotDropConflicting(s);
+
+ ereport(LOG,
+ (errmsg("dropped conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", conflict_reason)));
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 92fa86f..4ce7096 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2814,6 +2814,7 @@ XLogSendLogical(void)
{
XLogRecord *record;
char *errm;
+ XLogRecPtr flushPtr;
/*
* Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,10 +2831,11 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
+ flushPtr = (am_cascading_walsender ?
+ GetStandbyFlushRecPtr() : GetFlushRecPtr());
+
if (record != NULL)
{
- /* XXX: Note that logical decoding cannot be used while in recovery */
- XLogRecPtr flushPtr = GetFlushRecPtr();
/*
* Note the lack of any call to LagTrackerWrite() which is handled by
@@ -2857,7 +2859,7 @@ XLogSendLogical(void)
* If the record we just wanted read is at or beyond the flushed
* point, then we're caught up.
*/
- if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
{
WalSndCaughtUp = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 18a0f62..ec696f4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2669,6 +2669,10 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
GET_VXID_FROM_PGPROC(procvxid, *proc);
+ /*
+ * Note: vxid.localTransactionId can be invalid, which means the
+ * request is to signal the pid that is not running a transaction.
+ */
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c..645f320 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -286,6 +286,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+ if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
+ RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..7cfb6d5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithLogicalSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..c23d361 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2393,6 +2393,9 @@ errdetail_recovery_conflict(void)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
errdetail("User query might have needed to see row versions that must be removed.");
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ errdetail("User was using the logical slot that must be dropped.");
+ break;
case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
errdetail("User transaction caused buffer deadlock with recovery.");
break;
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
case PROCSIG_RECOVERY_CONFLICT_LOCK:
case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * requirement is for the signal receiver to release the slot,
+ * so that it can be dropped by the signal sender. So for
+ * normal backends, the transaction should be aborted, just
+ * like for other recovery conflicts. But if it's a walsender on a
+ * standby, then it has to be killed so as to release an
+ * acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
/*
* If we aren't in a transaction any longer then ignore.
@@ -2920,7 +2942,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* Intentional fall through to session cancel */
/* FALLTHROUGH */
-
case PROCSIG_RECOVERY_CONFLICT_DATABASE:
RecoveryConflictPending = true;
ProcDiePending = true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..7dfbef7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
dbentry->n_conflict_tablespace +
dbentry->n_conflict_lock +
dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
dbentry->n_conflict_bufferpin +
dbentry->n_conflict_startup_deadlock);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = heap_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ heap_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 237f4e0..e7439c1 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern WalLevel GetActiveWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..4fe8684 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -604,6 +604,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_conflict_tablespace;
PgStat_Counter n_conflict_lock;
PgStat_Counter n_conflict_snapshot;
+ PgStat_Counter n_conflict_logicalslot;
PgStat_Counter n_conflict_bufferpin;
PgStat_Counter n_conflict_startup_deadlock;
PgStat_Counter n_temp_files;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..73b954e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a..956d3c2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -39,6 +39,7 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
PROCSIG_RECOVERY_CONFLICT_LOCK,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d7f33ab..8c90fd7 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 6019f37..719837d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2000,6 +2000,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create a logical replication slot on the given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..fd77e19
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,420 @@
+# Demonstrate that logical decoding can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 58;
+use RecursiveCopy;
+use File::Copy;
+use Time::HiRes qw(usleep);
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+my $node_master = get_new_node('master');
+my $node_replica = get_new_node('replica');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_xmins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('master_physical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+sub create_logical_slots
+{
+ is($node_replica->create_logical_slot_on_standby($node_master, 'dropslot', 'testdb'),
+ 0, 'created dropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+ is($node_replica->slot('dropslot')->{'slot_type'}, 'logical', 'dropslot on standby created');
+ is($node_replica->create_logical_slot_on_standby($node_master, 'activeslot', 'testdb'),
+ 0, 'created activeslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+ is($node_replica->slot('activeslot')->{'slot_type'}, 'logical', 'activeslot on standby created');
+
+ return 0;
+}
+
+sub make_slot_active
+{
+ # make sure activeslot is in use
+ print "starting pg_recvlogical";
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-S', 'activeslot', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+
+ while (!$node_replica->slot('activeslot')->{'active_pid'})
+ {
+ usleep(100_000);
+ print "waiting for slot to become active\n";
+ }
+ return 0;
+}
+
+sub check_slots_dropped
+{
+ is($node_replica->slot('dropslot')->{'slot_type'}, '', 'dropslot on standby dropped');
+ is($node_replica->slot('activeslot')->{'slot_type'}, '', 'activeslot on standby dropped');
+
+ # our client should've terminated in response to the walsender error
+ eval {
+ $handle->finish;
+ };
+ $return = $?;
+ cmp_ok($return, "!=", 0, "pg_recvlogical exited non-zero ");
+ if ($return) {
+ like($stderr, qr/conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/must be dropped/, 'recvlogical error detail');
+ }
+
+ return 0;
+}
+
+# Initialize master node
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('master_physical');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=master_physical');
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+# Initialize slave node
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'master_physical']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin,
+ "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb',
+ qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# logical slot catalog_xmin on slave should advance after
+# pg_logical_slot_get_changes
+($new_logical_xmin, $new_logical_catalog_xmin) =
+ wait_for_xmins($node_replica, 'standby_logical',
+ "catalog_xmin::varchar::int > ${logical_catalog_xmin}");
+is($new_logical_xmin, '', "logical xmin null");
+
+# hot standby feedback should advance master's phys catalog_xmin now that the
+# standby's slot doesn't hold it down as far.
+my ($new_physical_xmin, $new_physical_catalog_xmin) =
+ wait_for_xmins($node_master, 'master_physical',
+ "catalog_xmin::varchar::int > ${physical_catalog_xmin}");
+isnt($new_physical_xmin, '', "physical xmin not null");
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin,
+ 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin,
+ 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery conflict: Drop conflicting slots, including in-use slots
+# Scenario 1 : hot_standby_feedback off
+##################################################
+
+create_logical_slots();
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback off by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+
+make_slot_active();
+
+# This should trigger the conflict
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+check_slots_dropped();
+
+# Turn hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery conflict: Drop conflicting slots, including in-use slots
+# Scenario 2 : incorrect wal_level at master
+##################################################
+
+create_logical_slots();
+
+make_slot_active();
+
+# Make master wal_level replica. This will trigger slot conflict.
+$node_master->append_conf('postgresql.conf',q[
+wal_level = 'replica'
+]);
+$node_master->restart;
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+check_slots_dropped();
+
+# Restore master wal_level
+$node_master->append_conf('postgresql.conf',q[
+wal_level = 'logical'
+]);
+$node_master->restart;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+##################################################
+# Recovery: drop database drops slots, including active slots.
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB.
+create_logical_slots();
+
+make_slot_active();
+
+# Create a slot on a database that would not be dropped. This slot should not
+# get dropped.
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres',
+ q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+check_slots_dropped();
+
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical',
+ 'otherslot on standby not dropped');
+
+# Cleanup : manually drop the slot that was not dropped.
+$node_replica->psql('postgres', q[SELECT pg_drop_replication_slot('otherslot')]);
--
2.1.4
On 07/01/2019 11:04 AM, Amit Khandekar wrote:
Also, in the updated patch (v11), I have added some scenarios that
verify that slot is dropped when either master wal_level is
insufficient, or when slot is conflicting. Also organized the test
file a bit.
One scenario where the replication slot is removed even after fixing the
problem (which the error message suggested to do).
Please refer to the scenario below:
Master cluster-
postgresql.conf file
wal_level=logical
hot_standby_feedback = on
port=5432
Standby cluster-
postgresql.conf file
wal_level=logical
hot_standby_feedback = on
port=5433
Both master/slave clusters are up and running and are in sync with each other
Create a logical replication slot on SLAVE ( SELECT * from
pg_create_logical_replication_slot('m', 'test_decoding'); )
change wal_level='hot_standby' on Master postgresql.conf file / restart
the server
Run get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: logical decoding on standby requires wal_level >= logical on master
Correct it in the master's postgresql.conf file, i.e. set wal_level='logical'
again / restart the server
and again fire get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: replication slot "m" does not exist
This looks a little weird, as the slot got dropped/removed internally. I
guess it should become invalid rather than being removed automatically.
Let users delete the slot themselves rather than have it removed
automatically as a surprise.
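In case it helps to replay this, here is a rough TAP-style sketch of the
same scenario using the PostgresNode helpers from the proposed test file.
It is only illustrative: the node names, the backup name, the slot 'm' and
the use of 'replica' in place of 'hot_standby' are made up here, and it
assumes the standby patch is applied.

# A minimal sketch, assuming the patch is applied; names are illustrative.
use strict;
use warnings;
use PostgresNode;

my $master = get_new_node('master');
$master->init(allows_streaming => 1);
$master->append_conf('postgresql.conf', q[wal_level = logical]);
$master->start;
$master->backup('bkp');

my $standby = get_new_node('standby');
$standby->init_from_backup($master, 'bkp', has_streaming => 1);
$standby->append_conf('postgresql.conf', q[hot_standby_feedback = on]);
$standby->start;

# Create the slot on the standby; with the patch this waits for a
# running-xacts record from the master, so e.g. issue a CHECKPOINT there:
#   SELECT * FROM pg_create_logical_replication_slot('m', 'test_decoding');

# Lower the master below logical and restart it.
$master->append_conf('postgresql.conf', q[wal_level = replica]);
$master->restart;

# First attempt: fails with "logical decoding on standby requires
# wal_level >= logical on master"; replaying the parameter change has
# meanwhile already dropped the slot on the standby.
$standby->psql('postgres',
    q[SELECT * FROM pg_logical_slot_get_changes('m', NULL, NULL)]);

# Put wal_level = logical back on the master, restart, and retry: the
# error is now: replication slot "m" does not exist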
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Thu, 4 Jul 2019 at 15:52, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 07/01/2019 11:04 AM, Amit Khandekar wrote:
Also, in the updated patch (v11), I have added some scenarios that
verify that slot is dropped when either master wal_level is
insufficient, or when slot is conflicting. Also organized the test
file a bit.
One scenario where replication slot removed even after fixing the problem (which Error message suggested to do)
Which specific problem are you referring to? Removing a conflicting
slot is itself part of the fix for the conflicting-slot problem.
Please refer this below scenario
Master cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5432
Standby cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5433
both Master/Slave cluster are up and running and are in SYNC with each other
Create a logical replication slot on SLAVE ( SELECT * from pg_create_logical_replication_slot('m', 'test_decoding'); )
change wal_level='hot_standby' on Master postgresql.conf file / restart the server
Run get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: logical decoding on standby requires wal_level >= logical on master
Correct it on Master postgresql.conf file ,i.e set wal_level='logical' again / restart the server
and again fire get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: replication slot "m" does not exist
This looks little weird as slot got dropped/removed internally . i guess it should get invalid rather than removed automatically.
Lets user's delete the slot themself rather than automatically removed as a surprise.
It was earlier discussed about what action should be taken when we
find conflicting slots. Out of the options, one was to drop the slot,
and we chose that because that was simple. See this :
/messages/by-id/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de
By the way, you are getting the "logical decoding on standby requires
wal_level >= logical on master" error while using the slot, which is
because we reject the command even before checking the existence of
the slot. Actually the slot is already dropped due to master
wal_level. Then when you correct the master wal_level, the command is
not rejected, and proceeds to use the slot and then it is found that
the slot does not exist.
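Incidentally, one way to see that ordering from the outside (just a
sketch; $node_replica and the slot name 'm' come from the scenario above,
not from the patch) is to look at pg_replication_slots on the standby
right after the primary's wal_level change has been replayed, before
calling pg_logical_slot_get_changes() at all:

# Sketch only: shows the slot is already gone at WAL replay time, so the
# later "does not exist" error just reports an already-completed drop.
my $slot_exists = $node_replica->safe_psql('postgres',
    q[SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'm')]);
print "slot still present after replaying the wal_level change? $slot_exists\n";
# Expected with the patch: 'f'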
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Thu, 4 Jul 2019 at 17:21, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Thu, 4 Jul 2019 at 15:52, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 07/01/2019 11:04 AM, Amit Khandekar wrote:
Also, in the updated patch (v11), I have added some scenarios that
verify that slot is dropped when either master wal_level is
insufficient, or when slot is conflicting. Also organized the test
file a bit.
One scenario where replication slot removed even after fixing the problem (which Error message suggested to do)
Which specific problem are you referring to ? Removing a conflicting
slot, itself is the part of the fix for the conflicting slot problem.
Please refer this below scenario
Master cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5432
Standby cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5433
both Master/Slave cluster are up and running and are in SYNC with each other
Create a logical replication slot on SLAVE ( SELECT * from pg_create_logical_replication_slot('m', 'test_decoding'); )
change wal_level='hot_standby' on Master postgresql.conf file / restart the server
Run get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: logical decoding on standby requires wal_level >= logical on master
Correct it on Master postgresql.conf file ,i.e set wal_level='logical' again / restart the server
and again fire get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR: replication slot "m" does not exist
This looks little weird as slot got dropped/removed internally . i guess it should get invalid rather than removed automatically.
Lets user's delete the slot themself rather than automatically removed as a surprise.
It was earlier discussed about what action should be taken when we
find conflicting slots. Out of the options, one was to drop the slot,
and we chose that because that was simple. See this :
/messages/by-id/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de
Sorry, the above link is not the one I wanted to refer to. Correct one is this :
/messages/by-id/20181214005521.jaty2d24lz4nroil@alap3.anarazel.de
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,
Thanks for the new version! Looks like we're making progress towards
something committable here.
I think it'd be good to split the patch into a few pieces. I'd maybe do
that like:
1) WAL format changes (plus required other changes)
2) Recovery conflicts with slots
3) logical decoding on standby
4) tests
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
 */
 /* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
Hm. I think we otherwise only ever use
RelationIsAccessibleInLogicalDecoding() on tables, not on indexes. And
while I think this would mostly work for builtin catalog tables, it
won't work for "user catalog tables" as RelationIsUsedAsCatalogTable()
won't perform any useful checks for indexes.
So I think we either need to look up the table, or pass it down.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 * see comments for vacuum_log_cleanup_info().
 */
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
 xl_heap_cleanup_info xlrec;
 XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
 xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
 /* Caller should not call me on a non-WAL-logged relation */
 Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
It'd probably be a good idea to add a comment to
RelationIsUsedAsCatalogTable() that it better never invoke anything
performing catalog accesses. Otherwise there's quite the danger of
recursion (some operation doing RelationIsAccessibleInLogicalDecoding(),
which then accesses the catalog, which in turn could again need to
perform said operation, and so on in a loop).
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
 spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
We should document that it is safe to do catalog accesses here, because
spgist is never used to back catalogs. Otherwise there would be an
endless recursion danger here.
Did you check how hard it would be to just pass down the heap relation?
+/*
+ * Get the wal_level from the control file.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
What does "Active" mean here? I assume it's supposed to indicate that it
could be different than what's configured in postgresql.conf, for a
replica? If so, that should be mentioned.
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
 memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data.
nitpick: s/master/the primary/ (mostly adding the "the", but I
personally also prefer primary over master)
s/logical data/a WAL level sufficient for logical decoding/
Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have either disallowed creation of logical slots or dropped
+ * existing ones.
s/Don't bother/No need/
s/slots/potentially conflicting logically slots/
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("Logical decoding on standby requires wal_level >= logical on master."));
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 * can restart from there.
 */
 break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+ /* Cannot proceed if master itself does not have logical data */
This needs an explanation as to how this is reachable...
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
Hm, this strikes me as a not quite good enough error message (same in
other copies of the message). Perhaps something roughly like "could not
continue with logical decoding, the primary's wal level is now too low
(%u)"?
 if (RecoveryInProgress())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
 }
/*
@@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
 MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
This should reference the above explanation.
+/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ /*
+ * Build the conflict_str which will look like :
+ * "Slot conflicted with xid horizon which was being increased
+ * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)."
+ */
+ initStringInfo(&conflict_xmins);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ appendStringInfo(&conflict_xmins, "slot xmin: %d", slot_xmin);
+ }
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %d",
+ conflict_xmins.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_xmins.len > 0)
+ {
+ initStringInfo(&conflict_str);
+ appendStringInfo(&conflict_str, "%s %d (%s).",
+ conflict_sentence, xid, conflict_xmins.data);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
I think this is going to be a nightmare for translators, no? I'm not
clear as to why any of this is needed?
+ /* ReplicationSlotDropPtr() would acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
"would acquire"? I think it *does* acquire, right?
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 case PROCSIG_RECOVERY_CONFLICT_LOCK:
 case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
 case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * requirement is for the signal receiver to release the slot,
+ * so that it could be dropped by the signal sender. So for
+ * normal backends, the transaction should be aborted, just
+ * like for other recovery conflicts. But if it's walsender on
+ * standby, then it has to be killed so as to release an
+ * acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
Huh, I'm not following as to why that's needed for walsenders?
@@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
 dbentry->n_conflict_tablespace +
 dbentry->n_conflict_lock +
 dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
 dbentry->n_conflict_bufferpin +
 dbentry->n_conflict_startup_deadlock);
I think this probably needs adjustments in a few more places,
e.g. monitoring.sgml...
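Once monitoring.sgml and the view definitions are adjusted, something
along these lines should show the new counter ticking after a slot
conflict (a sketch only; it assumes the patch's confl_logicalslot column
in pg_stat_database_conflicts and a standby node handle named
$node_replica from the TAP test):

# Sketch: read the per-database recovery conflict counters on the standby
# after provoking a slot conflict; confl_logicalslot is the new column.
my $conflicts = $node_replica->safe_psql('postgres',
    q[SELECT datname, confl_snapshot, confl_logicalslot
      FROM pg_stat_database_conflicts WHERE datname = 'testdb']);
print "conflict counters: $conflicts\n";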
Thanks!
Andres Freund
On Tue, Jul 9, 2019 at 11:14 PM Andres Freund <andres@anarazel.de> wrote:
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
Hm, this strikes me as a not quite good enough error message (same in
other copies of the message). Perhaps something roughly like "could not
continue with logical decoding, the primary's wal level is now too low
(%u)"?
For what it's worth, I dislike that wording on grammatical grounds --
it sounds like two complete sentences joined by a comma, which is poor
style -- and think Amit's wording is probably fine. We could fix the
grammatical issue by replacing the comma in your version with the word
"because," but that seems unnecessarily wordy to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 10 Jul 2019 at 08:44, Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the new version! Looks like we're making progress towards
something committable here.
I think it'd be good to split the patch into a few pieces. I'd maybe do
that like:
1) WAL format changes (plus required other changes)
2) Recovery conflicts with slots
3) logical decoding on standby
4) tests
All right. Will do that in the next patch set. For now, I have quickly
done the below changes in a single patch again (attached), in order to
get early comments if any.
@@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
 */
 /* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
 xlrec_reuse.latestRemovedXid = latestRemovedXid;
Hm. I think we otherwise only ever use
RelationIsAccessibleInLogicalDecoding() on tables, not on indexes. And
while I think this would mostly work for builtin catalog tables, it
won't work for "user catalog tables" as RelationIsUsedAsCatalogTable()
won't perform any useful checks for indexes.
So I think we either need to look up the table, or pass it down.
Done. Passed down the heap rel.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
 * see comments for vacuum_log_cleanup_info().
 */
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
 xl_heap_cleanup_info xlrec;
 XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
 xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
 /* Caller should not call me on a non-WAL-logged relation */
 Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
It'd probably be a good idea to add a comment to
RelationIsUsedAsCatalogTable() that it better never invoke anything
performing catalog accesses. Otherwise there's quite the danger with
recursion (some operation doing RelationIsAccessibleInLogicalDecoding(),
that then accessing the catalog, which in turn could again need to
perform said operation, loop).
Added comments in RelationIsUsedAsCatalogTable() as well as
RelationIsAccessibleInLogicalDecoding() :
* RelationIsAccessibleInLogicalDecoding
* True if we need to log enough information to have access via
* decoding snapshot.
* This definition should not invoke anything that performs catalog
* access. Otherwise, e.g. logging a WAL entry for catalog relation may
* invoke this function, which will in turn do catalog access, which may
* in turn cause another similar WAL entry to be logged, leading to
* infinite recursion.
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
 spgxlogVacuumRedirect xlrec;
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
 xlrec.nToPlaceholder = 0;
 xlrec.newestRedirectXid = InvalidTransactionId;
We should document that it is safe to do catalog accesses here, because
spgist is never used to back catalogs. Otherwise there would be an
endless recursion danger here.
Comments added.
Did you check how hard it would be to just pass down the heap relation?
It does look hard. Check my comments in an earlier reply, that I have
pasted below :
This one seems harder, but I'm not actually sure why we make it so
hard. It seems like we just ought to add the table to IndexVacuumInfo.
This means we have to add heapRel assignment wherever we initialize
IndexVacuumInfo structure, namely in lazy_vacuum_index(),
lazy_cleanup_index(), validate_index(), analyze_rel(), and make sure
these functions have a heap rel handle. Do you think we should do this
as part of this patch ?
+/*
+ * Get the wal_level from the control file.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
What does "Active" mean here? I assume it's supposed to indicate that it
could be different than what's configured in postgresql.conf, for a
replica? If so, that should be mentioned.
Done. Here are the new comments :
* Get the wal_level from the control file. For a standby, this value should be
* considered as its active wal_level, because it may be different from what
* was originally configured on standby.
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
 memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and master does not have
+ * logical data.
nitpick: s/master/the primary/ (mostly adding the "the", but I
personally also prefer primary over master)
s/logical data/a WAL level sufficient for logical decoding/
Don't bother to search for the slots if standby is
+ * running with wal_level lower than logical, because in that case,
+ * we would have either disallowed creation of logical slots or dropped
+ * existing ones.
s/Don't bother/No need/
s/slots/potentially conflicting logically slots/
Done.
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("Logical decoding on standby requires wal_level >= logical on master."));
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..c1bd028 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 * can restart from there.
 */
 break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+ /* Cannot proceed if master itself does not have logical data */
This needs an explanation as to how this is reachable...
Done. Here are the comments :
* If wal_level on primary is reduced to less than logical, then we
* want to prevent existing logical slots from being used.
* Existing logical slot on standby gets dropped when this WAL
* record is replayed; and further, slot creation fails when the
* wal level is not sufficient; but all these operations are not
* synchronized, so a logical slot may creep in while the wal_level
* is being reduced. Hence this extra check.
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
Hm, this strikes me as a not quite good enough error message (same in
other copies of the message). Perhaps something roughly like "could not
continue with logical decoding, the primary's wal level is now too low
(%u)"?
Haven't changed this. There is another reply from Robert. I think what
you want to emphasize is that we can't *continue*. I am not sure why
the user can't infer that "logical decoding could not continue" when
we say "logical decoding requires wal_level >= ....".
 if (RecoveryInProgress())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
 }
/*
@@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin,
 LogicalDecodingContext *ctx;
 MemoryContext old_context;
+ CheckLogicalDecodingRequirements();
+
This should reference the above explanation.
Done.
+/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ /*
+ * Build the conflict_str which will look like :
+ * "Slot conflicted with xid horizon which was being increased
+ * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)."
+ */
+ initStringInfo(&conflict_xmins);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ appendStringInfo(&conflict_xmins, "slot xmin: %d", slot_xmin);
+ }
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %d",
+ conflict_xmins.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_xmins.len > 0)
+ {
+ initStringInfo(&conflict_str);
+ appendStringInfo(&conflict_str, "%s %d (%s).",
+ conflict_sentence, xid, conflict_xmins.data);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
I think this is going to be a nightmare for translators, no?
For translators, I think the .po files will have the required text,
because I have used gettext_noop() for both conflict_sentence and the
passed in conflict_reason parameter. And the "dropped conflicting
slot." is passed to ereport() as usual. The rest portion of errdetail
is not language specific. E.g. "slot" remains "slot".
I'm not clear as to why any of this is needed?
The conflict can happen for either xmin or catalog_xmin or both, right?
The purpose of the above is to show only the conflicting one(s) of the
two.
+ /* ReplicationSlotDropPtr() would acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
"would acquire"? I think it *does* acquire, right?
Yes, Changed to "will".
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 case PROCSIG_RECOVERY_CONFLICT_LOCK:
 case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
 case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * requirement is for the signal receiver to release the slot,
+ * so that it could be dropped by the signal sender. So for
+ * normal backends, the transaction should be aborted, just
+ * like for other recovery conflicts. But if it's walsender on
+ * standby, then it has to be killed so as to release an
+ * acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
Huh, I'm not following as to why that's needed for walsenders?
For normal backends, we ignore this signal if we aren't in a
transaction (block). But for a walsender there is no transaction, and
yet we cannot ignore the signal, because the walsender can keep a
logical slot acquired when it was spawned by "pg_recvlogical --start".
So the only way we can make it release the acquired slot is to kill it.
@@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
 dbentry->n_conflict_tablespace +
 dbentry->n_conflict_lock +
 dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
 dbentry->n_conflict_bufferpin +
 dbentry->n_conflict_startup_deadlock);
I think this probably needs adjustments in a few more places,
e.g. monitoring.sgml...
Oops, yeah, to search for similar additions, I had looked for
"conflict_snapshot" using cscope. I should have done the same using
"git grep".
Done now.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logical-decoding-on-standby_v12.patch (application/octet-stream)
From 3dbe81d332f7145fd356957f9b4609e8d2e97b24 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amit.khandekar@enterprisedb.com>
Date: Wed, 10 Jul 2019 16:55:19 +0530
Subject: [PATCH] Logical decoding on standby - v12
Author : Andres Freund.
Besides the above main changes, patch includes following :
1. Handle slot conflict recovery by dropping the conflicting slots.
-Amit Khandekar.
2. test/recovery/t/016_logical_decoding_on_replica.pl added.
Original author : Craig Ringer. few changes/additions from Amit Khandekar.
3. Handle slot conflicts when master wal_level becomes less than logical.
Changes in v6 patch :
While creating the slot, lastReplayedEndRecPtr is used to set the
restart_lsn, but its position is later adjusted in
DecodingContextFindStartpoint() in case it does not point to a
valid record location. This can happen because replay pointer
points to 1 + end of last record replayed, which means it can
coincide with first byte of a new WAL block, i.e. inside block
header.
Also, modified the test to handle the requirement that the
logical slot creation on standby requires a checkpoint
(or any other transaction commit) to be given from master. For
that, in src/test/perl/PostgresNode.pm, added a new function
create_logical_slot_on_standby() which does the required steps.
Changes in v7 patch :
Merge the two conflict messages for xmin and catalog_xmin into
a single one.
Changes in v8 :
Fix incorrect flush ptr on standby (reported by Tushar Ahuja).
In XLogSendLogical(), GetFlushRecPtr() was used to get the flushed
point. On standby, GetFlushRecPtr() does not give a valid value, so it
was wrongly determined that the sent record is beyond flush point, as
a result of which, WalSndCaughtUp was set to true, causing
WalSndLoop() to sleep for some duration after every record.
This was reported by Tushar Ahuja, where pg_recvlogical seems like it
is hanging when there are loads of insert.
Fix: Use GetStandbyFlushRecPtr() if am_cascading_walsender
Changes in v9 :
While dropping a conflicting logical slot, if a backend has acquired it, send
it a conflict recovery signal. Check new function ReplicationSlotDropConflicting().
Also, miscellaneous review comments addressed, but not all of them yet.
Changes in v10 :
Adjust restart_lsn if it's a Replay Pointer.
This was earlier done in DecodingContextFindStartpoint() but now it
is done in in ReplicationSlotReserveWal(), when restart_lsn is initialized.
Changes in v11 :
Added some test scenarios to test drop-slot conflicts. Organized the
test file a bit.
Also improved the conflict error message.
Changes in v12 :
Review comments addressed.
---
doc/src/sgml/monitoring.sgml | 6 +
src/backend/access/gist/gist.c | 2 +-
src/backend/access/gist/gistbuild.c | 2 +-
src/backend/access/gist/gistutil.c | 4 +-
src/backend/access/gist/gistxlog.c | 9 +-
src/backend/access/hash/hash_xlog.c | 3 +-
src/backend/access/hash/hashinsert.c | 2 +
src/backend/access/heap/heapam.c | 23 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtpage.c | 4 +
src/backend/access/nbtree/nbtxlog.c | 4 +-
src/backend/access/spgist/spgvacuum.c | 8 +
src/backend/access/spgist/spgxlog.c | 1 +
src/backend/access/transam/xlog.c | 25 ++
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/pgstat.c | 4 +
src/backend/replication/logical/decode.c | 22 +-
src/backend/replication/logical/logical.c | 37 +-
src/backend/replication/slot.c | 233 +++++++++++-
src/backend/replication/walsender.c | 8 +-
src/backend/storage/ipc/procarray.c | 4 +
src/backend/storage/ipc/procsignal.c | 3 +
src/backend/storage/ipc/standby.c | 7 +-
src/backend/tcop/postgres.c | 23 +-
src/backend/utils/adt/pgstatfuncs.c | 16 +
src/backend/utils/cache/lsyscache.c | 16 +
src/include/access/gist_private.h | 6 +-
src/include/access/gistxlog.h | 3 +-
src/include/access/hash_xlog.h | 1 +
src/include/access/heapam_xlog.h | 8 +-
src/include/access/nbtxlog.h | 2 +
src/include/access/spgxlog.h | 1 +
src/include/access/xlog.h | 1 +
src/include/catalog/pg_proc.dat | 5 +
src/include/pgstat.h | 1 +
src/include/replication/slot.h | 2 +
src/include/storage/procsignal.h | 1 +
src/include/storage/standby.h | 2 +-
src/include/utils/lsyscache.h | 1 +
src/include/utils/rel.h | 9 +
src/test/perl/PostgresNode.pm | 27 ++
.../recovery/t/018_logical_decoding_on_replica.pl | 420 +++++++++++++++++++++
src/test/regress/expected/rules.out | 1 +
44 files changed, 896 insertions(+), 66 deletions(-)
create mode 100644 src/test/recovery/t/018_logical_decoding_on_replica.pl
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..42bfe82 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2678,6 +2678,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
old snapshots</entry>
</row>
<row>
+ <entry><structfield>confl_logicalslot</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of queries in this database that have been canceled due to
+ logical slots</entry>
+ </row>
+ <row>
<entry><structfield>confl_bufferpin</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of queries in this database that have been canceled due to
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 470b121..af1bd13 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -339,7 +339,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
for (; ptr; ptr = ptr->next)
{
/* Allocate new page */
- ptr->buffer = gistNewBuffer(rel);
+ ptr->buffer = gistNewBuffer(heapRel, rel);
GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
ptr->page = BufferGetPage(ptr->buffer);
ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index ecef0ff..b5f59a1 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -171,7 +171,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
buildstate.giststate->tempCxt = createTempGistContext();
/* initialize the root page */
- buffer = gistNewBuffer(index);
+ buffer = gistNewBuffer(heap, index);
Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
page = BufferGetPage(buffer);
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 49df056..1fcc7cb 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -807,7 +807,7 @@ gistcheckpage(Relation rel, Buffer buf)
* Caller is responsible for initializing the page by calling GISTInitBuffer
*/
Buffer
-gistNewBuffer(Relation r)
+gistNewBuffer(Relation heapRel, Relation r)
{
Buffer buffer;
bool needLock;
@@ -851,7 +851,7 @@ gistNewBuffer(Relation r)
* page's deleteXid.
*/
if (XLogStandbyInfoActive() && RelationNeedsWAL(r))
- gistXLogPageReuse(r, blkno, GistPageGetDeleteXid(page));
+ gistXLogPageReuse(heapRel, r, blkno, GistPageGetDeleteXid(page));
return buffer;
}
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 503db34..1f40f98 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -195,7 +195,8 @@ gistRedoDeleteRecord(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
@@ -397,7 +398,7 @@ gistRedoPageReuse(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
- xlrec->node);
+ xlrec->onCatalogTable, xlrec->node);
}
}
@@ -578,7 +579,8 @@ gistXLogPageDelete(Buffer buffer, TransactionId xid,
* Write XLOG record about reuse of a deleted page.
*/
void
-gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
+gistXLogPageReuse(Relation heapRel, Relation rel,
+ BlockNumber blkno, TransactionId latestRemovedXid)
{
gistxlogPageReuse xlrec_reuse;
@@ -589,6 +591,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heapRel);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d7b7098..00c3e0f 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1002,7 +1002,8 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
RelFileNode rnode;
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid,
+ xldata->onCatalogTable, rnode);
}
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true, &buffer);
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 5321762..e28465a 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -17,6 +17,7 @@
#include "access/hash.h"
#include "access/hash_xlog.h"
+#include "catalog/catalog.h"
#include "miscadmin.h"
#include "utils/rel.h"
#include "storage/lwlock.h"
@@ -398,6 +399,7 @@ _hash_vacuum_one_page(Relation rel, Relation hrel, Buffer metabuf, Buffer buf)
xl_hash_vacuum_one_page xlrec;
XLogRecPtr recptr;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(hrel);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.ntuples = ndeletable;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d768b9b..10b7857 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
* see comments for vacuum_log_cleanup_info().
*/
XLogRecPtr
-log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
+log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
{
xl_heap_cleanup_info xlrec;
XLogRecPtr recptr;
- xlrec.node = rnode;
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
+ xlrec.node = rel->rd_node;
xlrec.latestRemovedXid = latestRemovedXid;
XLogBeginInsert();
@@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
/* Caller should not call me on a non-WAL-logged relation */
Assert(RelationNeedsWAL(reln));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
xlrec.ndead = ndead;
@@ -7240,6 +7242,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
/* nor when there are no tuples to freeze */
Assert(ntuples > 0);
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
xlrec.cutoff_xid = cutoff_xid;
xlrec.ntuples = ntuples;
@@ -7270,7 +7273,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* heap_buffer, if necessary.
*/
XLogRecPtr
-log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
TransactionId cutoff_xid, uint8 vmflags)
{
xl_heap_visible xlrec;
@@ -7280,6 +7283,7 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
+ xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
xlrec.cutoff_xid = cutoff_xid;
xlrec.flags = vmflags;
XLogBeginInsert();
@@ -7700,7 +7704,8 @@ heap_xlog_cleanup_info(XLogReaderState *record)
xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, xlrec->node);
/*
* Actual operation is a no-op. Record type exists to provide a means for
@@ -7736,7 +7741,8 @@ heap_xlog_clean(XLogReaderState *record)
* latestRemovedXid is invalid, skip conflict processing.
*/
if (InHotStandby && TransactionIdIsValid(xlrec->latestRemovedXid))
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
/*
* If we have a full-page image, restore it (using a cleanup lock) and
@@ -7832,7 +7838,9 @@ heap_xlog_visible(XLogReaderState *record)
* rather than killing the transaction outright.
*/
if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->cutoff_xid,
+ xlrec->onCatalogTable,
+ rnode);
/*
* Read the heap page, if it still exists. If the heap file has dropped or
@@ -7969,7 +7977,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
TransactionIdRetreat(latestRemovedXid);
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..bf34d3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -473,7 +473,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* No need to write the record at all unless it contains a valid value
*/
if (TransactionIdIsValid(vacrelstats->latestRemovedXid))
- (void) log_heap_cleanup_info(rel->rd_node, vacrelstats->latestRemovedXid);
+ (void) log_heap_cleanup_info(rel, vacrelstats->latestRemovedXid);
}
/*
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06..c5fdd64 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -281,7 +281,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel->rd_node, heapBuf, vmBuf,
+ recptr = log_heap_visible(rel, heapBuf, vmBuf,
cutoff_xid, flags);
/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 50455db..65c0f50 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -31,6 +31,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "utils/lsyscache.h"
#include "utils/snapmgr.h"
static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
@@ -771,6 +772,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
*/
/* XLOG stuff */
+ xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
xlrec_reuse.node = rel->rd_node;
xlrec_reuse.block = blkno;
xlrec_reuse.latestRemovedXid = latestRemovedXid;
@@ -1138,6 +1140,8 @@ _bt_delitems_delete(Relation rel, Buffer buf,
XLogRecPtr recptr;
xl_btree_delete xlrec_delete;
+ xlrec_delete.onCatalogTable =
+ RelationIsAccessibleInLogicalDecoding(heapRel);
xlrec_delete.latestRemovedXid = latestRemovedXid;
xlrec_delete.nitems = nitems;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 3147ea4..869dfda 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -526,7 +526,8 @@ btree_xlog_delete(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, rnode);
+ ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable, rnode);
}
/*
@@ -810,6 +811,7 @@ btree_xlog_reuse_page(XLogReaderState *record)
if (InHotStandby)
{
ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
+ xlrec->onCatalogTable,
xlrec->node);
}
}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2b1662a..28dee96 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -27,6 +27,7 @@
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
/* Entry in pending-list of TIDs we need to revisit */
@@ -502,6 +503,13 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
OffsetNumber itemnos[MaxIndexTuplesPerPage];
spgxlogVacuumRedirect xlrec;
+ /*
+ * There is no chance of endless recursion here even though we are doing
+ * catalog accesses, because spgist is never used for catalogs. See the
+ * comments in RelationIsAccessibleInLogicalDecoding().
+ */
+ xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
+
xlrec.nToPlaceholder = 0;
xlrec.newestRedirectXid = InvalidTransactionId;
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index ebe6ae8..800609c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -881,6 +881,7 @@ spgRedoVacuumRedirect(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &node, NULL, NULL);
ResolveRecoveryConflictWithSnapshot(xldata->newestRedirectXid,
+ xldata->onCatalogTable,
node);
}
}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b6c9353..2f60967 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4927,6 +4927,17 @@ LocalProcessControlFile(bool reset)
}
/*
+ * Get the wal_level from the control file. For a standby, this value should
+ * be considered its active wal_level, because it may differ from what was
+ * originally configured on the standby.
+ */
+WalLevel
+GetActiveWalLevel(void)
+{
+ return ControlFile->wal_level;
+}
+
+/*
* Initialization of shared memory for XLOG
*/
Size
@@ -9856,6 +9867,20 @@ xlog_redo(XLogReaderState *record)
/* Update our copy of the parameters in pg_control */
memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
+ /*
+ * Drop logical slots if we are in hot standby and the primary does not
+ * have a WAL level sufficient for logical decoding. No need to search
+ * for potentially conflicting logical slots if the standby is running
+ * with wal_level lower than logical, because in that case, we would
+ * have either disallowed creation of logical slots or dropped existing
+ * ones.
+ */
+ if (InRecovery && InHotStandby &&
+ xlrec.wal_level < WAL_LEVEL_LOGICAL &&
+ wal_level >= WAL_LEVEL_LOGICAL)
+ ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
+ gettext_noop("Logical decoding on standby requires wal_level >= logical on master."));
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
ControlFile->MaxConnections = xlrec.MaxConnections;
ControlFile->max_worker_processes = xlrec.max_worker_processes;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..f3fad98 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -893,6 +893,7 @@ CREATE VIEW pg_stat_database_conflicts AS
pg_stat_get_db_conflict_tablespace(D.oid) AS confl_tablespace,
pg_stat_get_db_conflict_lock(D.oid) AS confl_lock,
pg_stat_get_db_conflict_snapshot(D.oid) AS confl_snapshot,
+ pg_stat_get_db_conflict_logicalslot(D.oid) AS confl_logicalslot,
pg_stat_get_db_conflict_bufferpin(D.oid) AS confl_bufferpin,
pg_stat_get_db_conflict_startup_deadlock(D.oid) AS confl_deadlock
FROM pg_database D;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28..797ea0c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4728,6 +4728,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_conflict_tablespace = 0;
dbentry->n_conflict_lock = 0;
dbentry->n_conflict_snapshot = 0;
+ dbentry->n_conflict_logicalslot = 0;
dbentry->n_conflict_bufferpin = 0;
dbentry->n_conflict_startup_deadlock = 0;
dbentry->n_temp_files = 0;
@@ -6352,6 +6353,9 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
dbentry->n_conflict_snapshot++;
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ dbentry->n_conflict_logicalslot++;
+ break;
case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
dbentry->n_conflict_bufferpin++;
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 151c3ef..abfa8e4 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -190,11 +190,31 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* can restart from there.
*/
break;
+ case XLOG_PARAMETER_CHANGE:
+ {
+ xl_parameter_change *xlrec =
+ (xl_parameter_change *) XLogRecGetData(buf->record);
+
+ /*
+ * If wal_level on primary is reduced to less than logical, then we
+ * want to prevent existing logical slots from being used.
+ * Existing logical slots on standby get dropped when this WAL
+ * record is replayed; and further, slot creation fails when the
+ * wal level is not sufficient; but all these operations are not
+ * synchronized, so a logical slot may creep in while the wal_level
+ * is being reduced. Hence this extra check.
+ */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ break;
+ }
case XLOG_NOOP:
case XLOG_NEXTOID:
case XLOG_SWITCH:
case XLOG_BACKUP_END:
- case XLOG_PARAMETER_CHANGE:
case XLOG_RESTORE_POINT:
case XLOG_FPW_CHANGE:
case XLOG_FPI_FOR_HINT:
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 9853be6..54d0424 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -94,23 +94,22 @@ CheckLogicalDecodingRequirements(void)
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical decoding requires a database connection")));
- /* ----
- * TODO: We got to change that someday soon...
- *
- * There's basically three things missing to allow this:
- * 1) We need to be able to correctly and quickly identify the timeline a
- * LSN belongs to
- * 2) We need to force hot_standby_feedback to be enabled at all times so
- * the primary cannot remove rows we need.
- * 3) support dropping replication slots referring to a database, in
- * dbase_redo. There can't be any active ones due to HS recovery
- * conflicts, so that should be relatively easy.
- * ----
- */
if (RecoveryInProgress())
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("logical decoding cannot be used while in recovery")));
+ {
+ /*
+ * This check may have race conditions, but whenever
+ * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
+ * verify that there are no existing logical replication slots. And to
+ * avoid races around creating a new slot,
+ * CheckLogicalDecodingRequirements() is called once before creating
+ * the slot, and once when logical decoding is initially starting up.
+ */
+ if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));
+ }
}
/*
@@ -241,6 +240,12 @@ CreateInitDecodingContext(char *plugin,
LogicalDecodingContext *ctx;
MemoryContext old_context;
+ /*
+ * On a standby, this check is also required while creating the slot. See
+ * the comments in CheckLogicalDecodingRequirements().
+ */
+ CheckLogicalDecodingRequirements();
+
/* shorter lines... */
slot = MyReplicationSlot;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 62342a6..76d7277 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -46,6 +46,7 @@
#include "pgstat.h"
#include "replication/slot.h"
#include "storage/fd.h"
+#include "storage/lock.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/builtins.h"
@@ -101,6 +102,7 @@ int max_replication_slots = 0; /* the maximum number of replication
static void ReplicationSlotDropAcquired(void);
static void ReplicationSlotDropPtr(ReplicationSlot *slot);
+static void ReplicationSlotDropConflicting(ReplicationSlot *slot);
/* internal persistency functions */
static void RestoreSlotFromDisk(const char *name);
@@ -638,6 +640,64 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
}
/*
+ * Permanently drop a conflicting replication slot. If it's already active by
+ * another backend, send it a recovery conflict signal, and then try again.
+ */
+static void
+ReplicationSlotDropConflicting(ReplicationSlot *slot)
+{
+ pid_t active_pid;
+ PGPROC *proc;
+ VirtualTransactionId vxid;
+
+ ConditionVariablePrepareToSleep(&slot->active_cv);
+ while (1)
+ {
+ SpinLockAcquire(&slot->mutex);
+ active_pid = slot->active_pid;
+ if (active_pid == 0)
+ active_pid = slot->active_pid = MyProcPid;
+ SpinLockRelease(&slot->mutex);
+
+ /* Drop the acquired slot, unless it is acquired by another backend */
+ if (active_pid == MyProcPid)
+ {
+ elog(DEBUG1, "acquired conflicting slot, now dropping it");
+ ReplicationSlotDropPtr(slot);
+ break;
+ }
+
+ /* Send the other backend a recovery conflict signal */
+
+ SetInvalidVirtualTransactionId(vxid);
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ proc = BackendPidGetProcWithLock(active_pid);
+ if (proc)
+ GET_VXID_FROM_PGPROC(vxid, *proc);
+ LWLockRelease(ProcArrayLock);
+
+ /*
+ * If that process coincidentally finished, some other backend may
+ * acquire the slot again. So start over.
+ * Note: Even if vxid.localTransactionId is invalid, we need to cancel
+ * that backend, because there is no other way to make it release the
+ * slot. So don't bother to validate vxid.localTransactionId.
+ */
+ if (vxid.backendId == InvalidBackendId)
+ continue;
+
+ elog(DEBUG1, "cancelling pid %d (backendId: %d) for releasing slot",
+ active_pid, vxid.backendId);
+
+ CancelVirtualTransaction(vxid, PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+ ConditionVariableSleep(&slot->active_cv,
+ WAIT_EVENT_REPLICATION_SLOT_DROP);
+ }
+
+ ConditionVariableCancelSleep();
+}
+
+/*
* Serialize the currently acquired slot's state from memory to disk, thereby
* guaranteeing the current state will survive a crash.
*/
@@ -1016,37 +1076,56 @@ ReplicationSlotReserveWal(void)
/*
* For logical slots log a standby snapshot and start logical decoding
* at exactly that position. That allows the slot to start up more
- * quickly.
+ * quickly. But on a standby we cannot do WAL writes, so just use the
+ * replay pointer; effectively, an attempt to create a logical slot on
+ * standby will cause it to wait for an xl_running_xacts record to be
+ * logged independently on the primary, so that a snapshot can be built
+ * using the record.
*
- * That's not needed (or indeed helpful) for physical slots as they'll
- * start replay at the last logged checkpoint anyway. Instead return
- * the location of the last redo LSN. While that slightly increases
- * the chance that we have to retry, it's where a base backup has to
- * start replay at.
+ * None of this is needed (or indeed helpful) for physical slots as
+ * they'll start replay at the last logged checkpoint anyway. Instead
+ * return the location of the last redo LSN. While that slightly
+ * increases the chance that we have to retry, it's where a base backup
+ * has to start replay at.
*/
+ if (SlotIsPhysical(slot))
+ restart_lsn = GetRedoRecPtr();
+ else if (RecoveryInProgress())
+ {
+ restart_lsn = GetXLogReplayRecPtr(NULL);
+ /*
+ * Replay pointer may point one past the end of the record. If that
+ * is an XLOG page boundary, it will not be a valid LSN for the
+ * start of a record, so bump it up past the page header.
+ */
+ if (!XRecOffIsValid(restart_lsn))
+ {
+ if (restart_lsn % XLOG_BLCKSZ != 0)
+ elog(ERROR, "invalid replay pointer");
+ /* For the first page of a segment file, it's a long header */
+ if (XLogSegmentOffset(restart_lsn, wal_segment_size) == 0)
+ restart_lsn += SizeOfXLogLongPHD;
+ else
+ restart_lsn += SizeOfXLogShortPHD;
+ }
+ }
+ else
+ restart_lsn = GetXLogInsertRecPtr();
+
+ SpinLockAcquire(&slot->mutex);
+ slot->data.restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
+
if (!RecoveryInProgress() && SlotIsLogical(slot))
{
XLogRecPtr flushptr;
- /* start at current insert position */
- restart_lsn = GetXLogInsertRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
-
/* make sure we have enough information to start */
flushptr = LogStandbySnapshot();
/* and make sure it's fsynced to disk */
XLogFlush(flushptr);
}
- else
- {
- restart_lsn = GetRedoRecPtr();
- SpinLockAcquire(&slot->mutex);
- slot->data.restart_lsn = restart_lsn;
- SpinLockRelease(&slot->mutex);
- }
/* prevent WAL removal as fast as possible */
ReplicationSlotsComputeRequiredLSN();
@@ -1065,6 +1144,122 @@ ReplicationSlotReserveWal(void)
}
/*
+ * Resolve recovery conflicts with logical slots.
+ *
+ * When xid is valid, it means that rows older than xid might have been
+ * removed. Therefore we need to drop slots that depend on seeing those rows.
+ * When xid is invalid, drop all logical slots. This is required when the
+ * master's wal_level is set back to replica, in which case all existing
+ * logical slots need to be dropped. In that case the caller passes a common
+ * 'conflict_reason' to use in the error detail; otherwise conflict_reason is
+ * NULL and the detail is constructed from the xid value.
+ */
+void
+ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
+ char *conflict_reason)
+{
+ int i;
+ bool found_conflict = false;
+
+ if (max_replication_slots <= 0)
+ return;
+
+restart:
+ if (found_conflict)
+ {
+ CHECK_FOR_INTERRUPTS();
+ found_conflict = false;
+ }
+
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+ for (i = 0; i < max_replication_slots; i++)
+ {
+ ReplicationSlot *s;
+
+ s = &ReplicationSlotCtl->replication_slots[i];
+
+ /* cannot change while ReplicationSlotControlLock is held */
+ if (!s->in_use)
+ continue;
+
+ /* We are only dealing with *logical* slot conflicts. */
+ if (!SlotIsLogical(s))
+ continue;
+
+ /* Invalid xid means caller is asking to drop all logical slots */
+ if (!TransactionIdIsValid(xid))
+ found_conflict = true;
+ else
+ {
+ TransactionId slot_xmin;
+ TransactionId slot_catalog_xmin;
+ StringInfoData conflict_str, conflict_xmins;
+ char *conflict_sentence =
+ gettext_noop("Slot conflicted with xid horizon which was being increased to");
+
+ /* not our database, skip */
+ if (s->data.database != InvalidOid && s->data.database != dboid)
+ continue;
+
+ SpinLockAcquire(&s->mutex);
+ slot_xmin = s->data.xmin;
+ slot_catalog_xmin = s->data.catalog_xmin;
+ SpinLockRelease(&s->mutex);
+
+ /*
+ * Build the conflict_str which will look like :
+ * "Slot conflicted with xid horizon which was being increased
+ * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)."
+ */
+ initStringInfo(&conflict_xmins);
+ if (TransactionIdIsValid(slot_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_xmin, xid))
+ {
+ appendStringInfo(&conflict_xmins, "slot xmin: %d", slot_xmin);
+ }
+ if (TransactionIdIsValid(slot_catalog_xmin) &&
+ TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+ appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %d",
+ conflict_xmins.len > 0 ? ", " : "",
+ slot_catalog_xmin);
+
+ if (conflict_xmins.len > 0)
+ {
+ initStringInfo(&conflict_str);
+ appendStringInfo(&conflict_str, "%s %u (%s).",
+ conflict_sentence, xid, conflict_xmins.data);
+ found_conflict = true;
+ conflict_reason = conflict_str.data;
+ }
+ }
+
+ if (found_conflict)
+ {
+ NameData slotname;
+
+ SpinLockAcquire(&s->mutex);
+ slotname = s->data.name;
+ SpinLockRelease(&s->mutex);
+
+ /* ReplicationSlotDropConflicting() will acquire the lock below */
+ LWLockRelease(ReplicationSlotControlLock);
+
+ ReplicationSlotDropConflicting(s);
+
+ ereport(LOG,
+ (errmsg("dropped conflicting slot %s", NameStr(slotname)),
+ errdetail("%s", conflict_reason)));
+
+ /* We released the lock above; so re-scan the slots. */
+ goto restart;
+ }
+ }
+
+ LWLockRelease(ReplicationSlotControlLock);
+}
+
+
+/*
* Flush all replication slots to disk.
*
* This needn't actually be part of a checkpoint, but it's a convenient
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e7a59b0..a45098c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2814,6 +2814,7 @@ XLogSendLogical(void)
{
XLogRecord *record;
char *errm;
+ XLogRecPtr flushPtr;
/*
* Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2830,10 +2831,11 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
+ flushPtr = (am_cascading_walsender ?
+ GetStandbyFlushRecPtr() : GetFlushRecPtr());
+
if (record != NULL)
{
- /* XXX: Note that logical decoding cannot be used while in recovery */
- XLogRecPtr flushPtr = GetFlushRecPtr();
/*
* Note the lack of any call to LagTrackerWrite() which is handled by
@@ -2857,7 +2859,7 @@ XLogSendLogical(void)
* If the record we just wanted read is at or beyond the flushed
* point, then we're caught up.
*/
- if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
{
WalSndCaughtUp = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ea02973..09c827b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2670,6 +2670,10 @@ CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
GET_VXID_FROM_PGPROC(procvxid, *proc);
+ /*
+ * Note: vxid.localTransactionId can be invalid, which means the
+ * request is to signal the pid that is not running a transaction.
+ */
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c..645f320 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -286,6 +286,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+ if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
+ RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 25b7e31..7cfb6d5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -23,6 +23,7 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/slot.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
@@ -291,7 +292,8 @@ ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
}
void
-ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode node)
+ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
+ bool onCatalogTable, RelFileNode node)
{
VirtualTransactionId *backends;
@@ -312,6 +314,9 @@ ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid, RelFileNode
ResolveRecoveryConflictWithVirtualXIDs(backends,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+
+ if (onCatalogTable)
+ ResolveRecoveryConflictWithLogicalSlots(node.dbNode, latestRemovedXid, NULL);
}
void
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1..c23d361 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2393,6 +2393,9 @@ errdetail_recovery_conflict(void)
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
errdetail("User query might have needed to see row versions that must be removed.");
break;
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ errdetail("User was using the logical slot that must be dropped.");
+ break;
case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
errdetail("User transaction caused buffer deadlock with recovery.");
break;
@@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
case PROCSIG_RECOVERY_CONFLICT_LOCK:
case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+ case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+ /*
+ * For conflicts that require a logical slot to be dropped, the
+ * signal receiver needs to release the slot so that it can be
+ * dropped by the signal sender. So for normal backends, the
+ * transaction should be aborted, just like for other recovery
+ * conflicts. But if it's a walsender on a standby, it has to be
+ * killed so that it releases the acquired logical slot.
+ */
+ if (am_cascading_walsender &&
+ reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
+ MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
+ {
+ RecoveryConflictPending = true;
+ QueryCancelPending = true;
+ InterruptPending = true;
+ break;
+ }
/*
* If we aren't in a transaction any longer then ignore.
@@ -2920,7 +2942,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* Intentional fall through to session cancel */
/* FALLTHROUGH */
-
case PROCSIG_RECOVERY_CONFLICT_DATABASE:
RecoveryConflictPending = true;
ProcDiePending = true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..547f9ab 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1456,6 +1456,21 @@ pg_stat_get_db_conflict_snapshot(PG_FUNCTION_ARGS)
}
Datum
+pg_stat_get_db_conflict_logicalslot(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_conflict_logicalslot);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
pg_stat_get_db_conflict_bufferpin(PG_FUNCTION_ARGS)
{
Oid dbid = PG_GETARG_OID(0);
@@ -1499,6 +1514,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
dbentry->n_conflict_tablespace +
dbentry->n_conflict_lock +
dbentry->n_conflict_snapshot +
+ dbentry->n_conflict_logicalslot +
dbentry->n_conflict_bufferpin +
dbentry->n_conflict_startup_deadlock);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index c13c08a..bd35bc1 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -18,7 +18,9 @@
#include "access/hash.h"
#include "access/htup_details.h"
#include "access/nbtree.h"
+#include "access/table.h"
#include "bootstrap/bootstrap.h"
+#include "catalog/catalog.h"
#include "catalog/namespace.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
@@ -1893,6 +1895,20 @@ get_rel_persistence(Oid relid)
return result;
}
+bool
+get_rel_logical_catalog(Oid relid)
+{
+ bool res;
+ Relation rel;
+
+ /* assume previously locked */
+ rel = table_open(relid, NoLock);
+ res = RelationIsAccessibleInLogicalDecoding(rel);
+ table_close(rel, NoLock);
+
+ return res;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index f80694b..f772488 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -429,8 +429,8 @@ extern XLogRecPtr gistXLogPageDelete(Buffer buffer,
TransactionId xid, Buffer parentBuffer,
OffsetNumber downlinkOffset);
-extern void gistXLogPageReuse(Relation rel, BlockNumber blkno,
- TransactionId latestRemovedXid);
+extern void gistXLogPageReuse(Relation heapRel, Relation rel,
+ BlockNumber blkno, TransactionId latestRemovedXid);
extern XLogRecPtr gistXLogUpdate(Buffer buffer,
OffsetNumber *todelete, int ntodelete,
@@ -468,7 +468,7 @@ extern bool gistproperty(Oid index_oid, int attno,
extern bool gistfitpage(IndexTuple *itvec, int len);
extern bool gistnospace(Page page, IndexTuple *itvec, int len, OffsetNumber todelete, Size freespace);
extern void gistcheckpage(Relation rel, Buffer buf);
-extern Buffer gistNewBuffer(Relation r);
+extern Buffer gistNewBuffer(Relation heapRel, Relation r);
extern bool gistPageRecyclable(Page page);
extern void gistfillbuffer(Page page, IndexTuple *itup, int len,
OffsetNumber off);
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 969a537..59246c3 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -48,9 +48,9 @@ typedef struct gistxlogPageUpdate
*/
typedef struct gistxlogDelete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 ntodelete; /* number of deleted offsets */
-
/*
* In payload of blk 0 : todelete OffsetNumbers
*/
@@ -96,6 +96,7 @@ typedef struct gistxlogPageDelete
*/
typedef struct gistxlogPageReuse
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/hash_xlog.h b/src/include/access/hash_xlog.h
index 53b682c..fd70b55 100644
--- a/src/include/access/hash_xlog.h
+++ b/src/include/access/hash_xlog.h
@@ -263,6 +263,7 @@ typedef struct xl_hash_init_bitmap_page
*/
typedef struct xl_hash_vacuum_one_page
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int ntuples;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f6cdca8..a1d1f11 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -237,6 +237,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_clean
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
uint16 nredirected;
uint16 ndead;
@@ -252,6 +253,7 @@ typedef struct xl_heap_clean
*/
typedef struct xl_heap_cleanup_info
{
+ bool onCatalogTable;
RelFileNode node;
TransactionId latestRemovedXid;
} xl_heap_cleanup_info;
@@ -332,6 +334,7 @@ typedef struct xl_heap_freeze_tuple
*/
typedef struct xl_heap_freeze_page
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint16 ntuples;
} xl_heap_freeze_page;
@@ -346,6 +349,7 @@ typedef struct xl_heap_freeze_page
*/
typedef struct xl_heap_visible
{
+ bool onCatalogTable;
TransactionId cutoff_xid;
uint8 flags;
} xl_heap_visible;
@@ -395,7 +399,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_cleanup_info(RelFileNode rnode,
+extern XLogRecPtr log_heap_cleanup_info(Relation rel,
TransactionId latestRemovedXid);
extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
OffsetNumber *redirected, int nredirected,
@@ -414,7 +418,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
bool *totally_frozen);
extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
xl_heap_freeze_tuple *xlrec_tp);
-extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
Buffer vm_buffer, TransactionId cutoff_xid, uint8 flags);
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 9beccc8..f64a33c 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -126,6 +126,7 @@ typedef struct xl_btree_split
*/
typedef struct xl_btree_delete
{
+ bool onCatalogTable;
TransactionId latestRemovedXid;
int nitems;
@@ -139,6 +140,7 @@ typedef struct xl_btree_delete
*/
typedef struct xl_btree_reuse_page
{
+ bool onCatalogTable;
RelFileNode node;
BlockNumber block;
TransactionId latestRemovedXid;
diff --git a/src/include/access/spgxlog.h b/src/include/access/spgxlog.h
index 073f740..d3dad69 100644
--- a/src/include/access/spgxlog.h
+++ b/src/include/access/spgxlog.h
@@ -237,6 +237,7 @@ typedef struct spgxlogVacuumRoot
typedef struct spgxlogVacuumRedirect
{
+ bool onCatalogTable;
uint16 nToPlaceholder; /* number of redirects to make placeholders */
OffsetNumber firstPlaceholder; /* first placeholder tuple to remove */
TransactionId newestRedirectXid; /* newest XID of removed redirects */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252..72c8d33 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -299,6 +299,7 @@ extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
extern void BootStrapXLOG(void);
extern void LocalProcessControlFile(bool reset);
+extern WalLevel GetActiveWalLevel(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
extern void InitXLOGAccess(void);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 604470c..81bbfcb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5259,6 +5259,11 @@
proname => 'pg_stat_get_db_conflict_snapshot', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_conflict_snapshot' },
+{ oid => '3432',
+ descr => 'statistics: recovery conflicts in database caused by logical replication slot',
+ proname => 'pg_stat_get_db_conflict_logicalslot', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_conflict_logicalslot' },
{ oid => '3068',
descr => 'statistics: recovery conflicts in database caused by shared buffer pin',
proname => 'pg_stat_get_db_conflict_bufferpin', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..4fe8684 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -604,6 +604,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_conflict_tablespace;
PgStat_Counter n_conflict_lock;
PgStat_Counter n_conflict_snapshot;
+ PgStat_Counter n_conflict_logicalslot;
PgStat_Counter n_conflict_bufferpin;
PgStat_Counter n_conflict_startup_deadlock;
PgStat_Counter n_temp_files;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 8fbddea..73b954e 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -205,4 +205,6 @@ extern void CheckPointReplicationSlots(void);
extern void CheckSlotRequirements(void);
+extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason);
+
#endif /* SLOT_H */
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a..956d3c2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -39,6 +39,7 @@ typedef enum
PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
PROCSIG_RECOVERY_CONFLICT_LOCK,
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index a3f8f82..6dedebc 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -28,7 +28,7 @@ extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
extern void ResolveRecoveryConflictWithSnapshot(TransactionId latestRemovedXid,
- RelFileNode node);
+ bool onCatalogTable, RelFileNode node);
extern void ResolveRecoveryConflictWithTablespace(Oid tsid);
extern void ResolveRecoveryConflictWithDatabase(Oid dbid);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index c8df5bf..579d9ff 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -131,6 +131,7 @@ extern char get_rel_relkind(Oid relid);
extern bool get_rel_relispartition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool get_rel_logical_catalog(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d35b4a5..2243236 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/xlog.h"
+#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
@@ -309,6 +310,9 @@ typedef struct StdRdOptions
* RelationIsUsedAsCatalogTable
* Returns whether the relation should be treated as a catalog table
* from the pov of logical decoding. Note multiple eval of argument!
+ * This definition should not invoke anything that performs catalog
+ * access; otherwise it may cause infinite recursion. Check the comments
+ * in RelationIsAccessibleInLogicalDecoding() for details.
*/
#define RelationIsUsedAsCatalogTable(relation) \
((relation)->rd_options && \
@@ -566,6 +570,11 @@ typedef struct ViewOptions
* RelationIsAccessibleInLogicalDecoding
* True if we need to log enough information to have access via
* decoding snapshot.
+ * This definition should not invoke anything that performs catalog
+ * access. Otherwise, e.g. logging a WAL entry for a catalog relation may
+ * invoke this function, which will in turn do catalog access, which may
+ * in turn cause another similar WAL entry to be logged, leading to
+ * infinite recursion.
*/
#define RelationIsAccessibleInLogicalDecoding(relation) \
(XLogLogicalInfoActive() && \
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 6019f37..719837d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2000,6 +2000,33 @@ sub pg_recvlogical_upto
=pod
+=item $node->create_logical_slot_on_standby(self, master, slot_name, dbname)
+
+Create logical replication slot on given standby
+
+=cut
+
+sub create_logical_slot_on_standby
+{
+ my ($self, $master, $slot_name, $dbname) = @_;
+ my ($stdout, $stderr);
+
+ my $handle;
+
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $self->connstr($dbname), '-P', 'test_decoding', '-S', $slot_name, '--create-slot'], '>', \$stdout, '2>', \$stderr);
+ sleep(1);
+
+ # Slot creation on standby waits for an xl_running_xacts record. So arrange
+ # for it.
+ $master->safe_psql('postgres', 'CHECKPOINT');
+
+ $handle->finish();
+
+ return 0;
+}
+
+=pod
+
=back
=cut
diff --git a/src/test/recovery/t/018_logical_decoding_on_replica.pl b/src/test/recovery/t/018_logical_decoding_on_replica.pl
new file mode 100644
index 0000000..fd77e19
--- /dev/null
+++ b/src/test/recovery/t/018_logical_decoding_on_replica.pl
@@ -0,0 +1,420 @@
+# Test logical decoding on a standby: create logical slots on a standby,
+# decode changes streamed from the primary, and verify that conflicting
+# slots are dropped on recovery conflicts.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 58;
+use RecursiveCopy;
+use File::Copy;
+use Time::HiRes qw(usleep);
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+my $node_master = get_new_node('master');
+my $node_replica = get_new_node('replica');
+
+# Fetch xmin columns from slot's pg_replication_slots row, after waiting for
+# given boolean condition to be true to ensure we've reached a quiescent state
+sub wait_for_xmins
+{
+ my ($node, $slotname, $check_expr) = @_;
+
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT $check_expr
+ FROM pg_catalog.pg_replication_slots
+ WHERE slot_name = '$slotname';
+ ]) or die "Timed out waiting for slot xmins to advance";
+
+ my $slotinfo = $node->slot($slotname);
+ return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+sub print_phys_xmin
+{
+ my $slot = $node_master->slot('master_physical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+sub print_logical_xmin
+{
+ my $slot = $node_replica->slot('standby_logical');
+ return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+sub create_logical_slots
+{
+ is($node_replica->create_logical_slot_on_standby($node_master, 'dropslot', 'testdb'),
+ 0, 'created dropslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+ is($node_replica->slot('dropslot')->{'slot_type'}, 'logical', 'dropslot on standby created');
+ is($node_replica->create_logical_slot_on_standby($node_master, 'activeslot', 'testdb'),
+ 0, 'created activeslot on testdb')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+ is($node_replica->slot('activeslot')->{'slot_type'}, 'logical', 'activeslot on standby created');
+
+ return 0;
+}
+
+sub make_slot_active
+{
+ # make sure activeslot is in use
+ print "starting pg_recvlogical";
+ $handle = IPC::Run::start(['pg_recvlogical', '-d', $node_replica->connstr('testdb'), '-S', 'activeslot', '-f', '-', '--no-loop', '--start'], '>', \$stdout, '2>', \$stderr);
+
+ while (!$node_replica->slot('activeslot')->{'active_pid'})
+ {
+ usleep(100_000);
+ print "waiting for slot to become active\n";
+ }
+ return 0;
+}
+
+sub check_slots_dropped
+{
+ is($node_replica->slot('dropslot')->{'slot_type'}, '', 'dropslot on standby dropped');
+ is($node_replica->slot('activeslot')->{'slot_type'}, '', 'activeslot on standby dropped');
+
+ # our client should've terminated in response to the walsender error
+ eval {
+ $handle->finish;
+ };
+ $return = $?;
+ cmp_ok($return, "!=", 0, "pg_recvlogical exited non-zero ");
+ if ($return) {
+ like($stderr, qr/conflict with recovery/, 'recvlogical recovery conflict');
+ like($stderr, qr/must be dropped/, 'recvlogical error detail');
+ }
+
+ return 0;
+}
+
+# Initialize master node
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('master_physical');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=master_physical');
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+# Initialize slave node
+$node_replica->init_from_backup(
+ $node_master, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+ q[primary_slot_name = 'master_physical']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+# Create new slots on the replica, ignoring the ones on the master completely.
+#
+# This must succeed since we know we have a catalog_xmin reservation. We
+# might've already sent hot standby feedback to advance our physical slot's
+# catalog_xmin but not received the corresponding xlog for the catalog xmin
+# advance, in which case we'll create a slot that isn't usable. The calling
+# application can prevent this by creating a temporary slot on the master to
+# lock in its catalog_xmin. For a truly race-free solution we'd need
+# master-to-standby hot_standby_feedback replies.
+#
+# In this case it won't race because there's no concurrent activity on the
+# master.
+#
+is($node_replica->create_logical_slot_on_standby($node_master, 'standby_logical', 'testdb'),
+ 0, 'logical slot creation on standby succeeded')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+($xmin, $catalog_xmin) = print_logical_xmin();
+is($xmin, '', "logical xmin null");
+isnt($catalog_xmin, '', "logical catalog_xmin not null");
+
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+$node_master->safe_psql('testdb', q[INSERT INTO test_table(blah) values ('itworks')]);
+$node_master->safe_psql('testdb', 'DROP TABLE test_table');
+$node_master->safe_psql('testdb', 'VACUUM');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+($xmin, $catalog_xmin) = print_phys_xmin();
+isnt($xmin, '', "physical xmin not null");
+isnt($catalog_xmin, '', "physical catalog_xmin not null");
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# Should show the inserts even when the table is dropped on master
+($ret, $stdout, $stderr) = $node_replica->psql('testdb', qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($stderr, '', 'stderr is empty');
+is($ret, 0, 'replay from slot succeeded')
+ or BAIL_OUT('cannot continue if slot replay fails');
+is($stdout, q{BEGIN
+table public.test_table: INSERT: id[integer]:1 blah[text]:'itworks'
+COMMIT}, 'replay results match');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+my ($physical_xmin, $physical_catalog_xmin) = print_phys_xmin();
+isnt($physical_xmin, '', "physical xmin not null");
+isnt($physical_catalog_xmin, '', "physical catalog_xmin not null");
+
+my ($logical_xmin, $logical_catalog_xmin) = print_logical_xmin();
+is($logical_xmin, '', "logical xmin null");
+isnt($logical_catalog_xmin, '', "logical catalog_xmin not null");
+
+# Ok, do a pile of tx's and make sure xmin advances.
+# Ideally we'd just hold catalog_xmin, but since hs_feedback currently uses the slot,
+# we hold down xmin.
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_1();]);
+$node_master->safe_psql('testdb', 'CREATE TABLE test_table(id serial primary key, blah text)');
+for my $i (0 .. 2000)
+{
+ $node_master->safe_psql('testdb', qq[INSERT INTO test_table(blah) VALUES ('entry $i')]);
+}
+$node_master->safe_psql('testdb', qq[CREATE TABLE catalog_increase_2();]);
+$node_master->safe_psql('testdb', 'VACUUM');
+
+my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+cmp_ok($new_logical_catalog_xmin, "==", $logical_catalog_xmin,
+ "logical slot catalog_xmin hasn't advanced before get_changes");
+
+($ret, $stdout, $stderr) = $node_replica->psql('testdb',
+ qq[SELECT data FROM pg_logical_slot_get_changes('standby_logical', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-timestamp', '0')]);
+is($ret, 0, 'replay of big series succeeded');
+isnt($stdout, '', 'replayed some rows');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# logical slot catalog_xmin on slave should advance after
+# pg_logical_slot_get_changes
+($new_logical_xmin, $new_logical_catalog_xmin) =
+ wait_for_xmins($node_replica, 'standby_logical',
+ "catalog_xmin::varchar::int > ${logical_catalog_xmin}");
+is($new_logical_xmin, '', "logical xmin null");
+
+# hot standby feedback should advance master's phys catalog_xmin now that the
+# standby's slot doesn't hold it down as far.
+my ($new_physical_xmin, $new_physical_catalog_xmin) =
+ wait_for_xmins($node_master, 'master_physical',
+ "catalog_xmin::varchar::int > ${physical_catalog_xmin}");
+isnt($new_physical_xmin, '', "physical xmin not null");
+cmp_ok($new_physical_catalog_xmin, "<=", $new_logical_catalog_xmin,
+ 'upstream physical slot catalog_xmin not past downstream catalog_xmin with hs_feedback on');
+
+#########################################################
+# Upstream oldestXid retention
+#########################################################
+
+sub test_oldest_xid_retention()
+{
+ # First burn some xids on the master in another DB, so we push the master's
+ # nextXid ahead.
+ foreach my $i (1 .. 100)
+ {
+ $node_master->safe_psql('postgres', 'SELECT txid_current()');
+ }
+
+ # Force vacuum freeze on the master and ensure its oldestXmin doesn't advance
+ # past our needed xmin. The only way we have visibility into that is to force
+ # a checkpoint.
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = true WHERE datname = 'template0'");
+ foreach my $dbname ('template1', 'postgres', 'testdb', 'template0')
+ {
+ $node_master->safe_psql($dbname, 'VACUUM FREEZE');
+ }
+ sleep(1);
+ $node_master->safe_psql('postgres', 'CHECKPOINT');
+ IPC::Run::run(['pg_controldata', $node_master->data_dir()], '>', \$stdout)
+ or die "pg_controldata failed with $?";
+ my @checkpoint = split('\n', $stdout);
+ my ($oldestXid, $nextXid) = ('', '');
+ foreach my $line (@checkpoint)
+ {
+ if ($line =~ qr/^Latest checkpoint's NextXID:\s+\d+:(\d+)/)
+ {
+ $nextXid = $1;
+ }
+ if ($line =~ qr/^Latest checkpoint's oldestXID:\s+(\d+)/)
+ {
+ $oldestXid = $1;
+ }
+ }
+ die 'no oldestXID found in checkpoint' unless $oldestXid;
+
+ my ($new_physical_xmin, $new_physical_catalog_xmin) = print_phys_xmin();
+ my ($new_logical_xmin, $new_logical_catalog_xmin) = print_logical_xmin();
+
+ print "upstream oldestXid $oldestXid, nextXid $nextXid, phys slot catalog_xmin $new_physical_catalog_xmin, downstream catalog_xmin $new_logical_catalog_xmin";
+
+ $node_master->safe_psql('postgres', "UPDATE pg_database SET datallowconn = false WHERE datname = 'template0'");
+
+ return ($oldestXid);
+}
+
+my ($oldestXid) = test_oldest_xid_retention();
+
+cmp_ok($oldestXid, "<=", $new_logical_catalog_xmin,
+ 'upstream oldestXid not past downstream catalog_xmin with hs_feedback on');
+
+##################################################
+# Drop slot
+##################################################
+#
+is($node_replica->safe_psql('postgres', 'SHOW hot_standby_feedback'), 'on', 'hs_feedback is on');
+
+# Make sure slots on replicas are droppable, and properly clear the upstream's xmin
+$node_replica->psql('testdb', q[SELECT pg_drop_replication_slot('standby_logical')]);
+
+is($node_replica->slot('standby_logical')->{'slot_type'}, '', 'slot on standby dropped manually');
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. catalog_xmin should become NULL because we dropped
+# the logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery conflict: Drop conflicting slots, including in-use slots
+# Scenario 1 : hot_standby_feedback off
+##################################################
+
+create_logical_slots();
+
+# One way to reproduce recovery conflict is to run VACUUM FULL with
+# hot_standby_feedback turned off on slave.
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = off
+]);
+$node_replica->restart;
+# ensure walreceiver feedback off by waiting for expected xmin and
+# catalog_xmin on master. Both should be NULL since hs_feedback is off
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NULL AND catalog_xmin IS NULL");
+
+make_slot_active();
+
+# This should trigger the conflict
+$node_master->safe_psql('testdb', 'VACUUM FULL');
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+check_slots_dropped();
+
+# Turn hot_standby_feedback back on
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+
+# ensure walreceiver feedback sent by waiting for expected xmin and
+# catalog_xmin on master. With hot_standby_feedback on, xmin should advance,
+# but catalog_xmin should still remain NULL since there is no logical slot.
+($xmin, $catalog_xmin) = wait_for_xmins($node_master, 'master_physical',
+ "xmin IS NOT NULL AND catalog_xmin IS NULL");
+
+##################################################
+# Recovery conflict: Drop conflicting slots, including in-use slots
+# Scenario 2 : incorrect wal_level at master
+##################################################
+
+create_logical_slots();
+
+make_slot_active();
+
+# Make master wal_level replica. This will trigger slot conflict.
+$node_master->append_conf('postgresql.conf',q[
+wal_level = 'replica'
+]);
+$node_master->restart;
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+check_slots_dropped();
+
+# Restore master wal_level
+$node_master->append_conf('postgresql.conf',q[
+wal_level = 'logical'
+]);
+$node_master->restart;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+##################################################
+# Recovery: drop database drops slots, including active slots.
+##################################################
+
+# Create a couple of slots on the DB to ensure they are dropped when we drop
+# the DB.
+create_logical_slots();
+
+make_slot_active();
+
+# Create a slot on a database that would not be dropped. This slot should not
+# get dropped.
+is($node_replica->create_logical_slot_on_standby($node_master, 'otherslot', 'postgres'),
+ 0, 'created otherslot on postgres')
+ or BAIL_OUT('cannot continue if slot creation fails, see logs');
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical', 'otherslot on standby created');
+
+# dropdb on the master to verify slots are dropped on standby
+$node_master->safe_psql('postgres', q[DROP DATABASE testdb]);
+
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+is($node_replica->safe_psql('postgres',
+ q[SELECT EXISTS(SELECT 1 FROM pg_database WHERE datname = 'testdb')]), 'f',
+ 'database dropped on standby');
+
+check_slots_dropped();
+
+is($node_replica->slot('otherslot')->{'slot_type'}, 'logical',
+ 'otherslot on standby not dropped');
+
+# Cleanup : manually drop the slot that was not dropped.
+$node_replica->psql('postgres', q[SELECT pg_drop_replication_slot('otherslot')]);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..1a049a4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1838,6 +1838,7 @@ pg_stat_database_conflicts| SELECT d.oid AS datid,
pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace,
pg_stat_get_db_conflict_lock(d.oid) AS confl_lock,
pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot,
+ pg_stat_get_db_conflict_logicalslot(d.oid) AS confl_logicalslot,
pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin,
pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock
FROM pg_database d;
--
2.1.4
On 07/10/2019 05:12 PM, Amit Khandekar wrote:
All right. Will do that in the next patch set. For now, I have quickly
done the below changes in a single patch again (attached), in order to
get early comments if any.
Thanks Amit for your patch. I am able to see one issue on the standby server
(where the logical replication slot was created):
a) the size of the pg_wal folder is NOT decreasing even after calling the
get_changes function
b) pg_wal files are not being recycled; every time, new files are created
after calling the get_changes function
Here are the detailed steps -
create a directory with the name 'archive_dir' under /tmp (mkdir
/tmp/archive_dir)
SR setup -
Master
.)Perform initdb (./initdb -D master --wal-segsize=2)
.)Open postgresql.conf file and add these below parameters at the end
of file
wal_level='logical'
min_wal_size=4MB
max_wal_size=4MB
hot_standby_feedback = on
archive_mode=on
archive_command='cp %p /tmp/archive_dir/%f'
.)Start the server ( ./pg_ctl -D master/ start -l logsM -c )
.)Connect to psql , create physical slot
->SELECT * FROM
pg_create_physical_replication_slot('decoding_standby');
Standby -
.)Perform pg_basebackup ( ./pg_basebackup -D standby/
--slot=decoding_standby -R -v)
.)Open postgresql.conf file of standby and add these 2 parameters - at
the end of file
port=5555
primary_slot_name = 'decoding_standby'
.)Start the Standby server ( ./pg_ctl -D standby/ start -l logsS -c )
.)Connect to psql terminal and create logical replication slot
->SELECT * from pg_create_logical_replication_slot('standby',
'test_decoding');
MISC steps -
.)Connect to master and create table/insert rows ( create table t(n
int); insert into t values (1); )
.)Connect to standby and fire get_changes function ( select * from
pg_logical_slot_get_changes('standby',null,null); )
.)Run pgbench ( ./pgbench -i -s 10 postgres)
.)Check the pg_wal directory size of STANDBY
[centos@mail-arts bin]$ du -sch standby/pg_wal/
127M standby/pg_wal/
127M total
[centos@mail-arts bin]$
.)Connect to standby and fire get_changes function ( select * from
pg_logical_slot_get_changes('standby',null,null); )
.)Check the pg_wal directory size of STANDBY
[centos@mail-arts bin]$ du -sch standby/pg_wal/
127M standby/pg_wal/
127M total
[centos@mail-arts bin]$
.)Restart both master and standby ( ./pg_ctl -D master restart -l logsM
-c) and (./pg_ctl -D standby restart -l logsS -c )
.)Check the pg_wal directory size of STANDBY
[centos@mail-arts bin]$ du -sch standby/pg_wal/
127M standby/pg_wal/
127M total
[centos@mail-arts bin]$
and if we look at the pg_wal files, they keep growing and are not being reused.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Hi,
On 2019-07-12 14:53:21 +0530, tushar wrote:
On 07/10/2019 05:12 PM, Amit Khandekar wrote:
All right. Will do that in the next patch set. For now, I have quickly
done the below changes in a single patch again (attached), in order to
get early comments if any.
Thanks Amit for your patch. I am able to see an issue on the standby server
(where the logical replication slot was created):
a) the size of the pg_wal folder is NOT decreasing even after firing the get_changes
function
Even after calling pg_logical_slot_get_changes() multiple times? What
does
SELECT * FROM pg_replication_slots; before and after multiple calls return?
Does manually forcing a checkpoint with CHECKPOINT; first on the primary
and then the standby "fix" the issue?
b) pg_wal files are not being recycled, and new files are created every time
after firing the get_changes function
I'm not sure what you mean by this. Are you saying that
pg_logical_slot_get_changes() causes WAL to be written?
Greetings,
Andres Freund
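For illustration, here is a rough TAP-style sketch of the checks Andres suggests; it is only a sketch, assuming an already-running $node_primary and $node_standby (PostgresNode objects, names not from the thread), the 'standby' slot from Tushar's steps, and the usual Test::More test context:
# Sketch only: inspect the slot before and after consuming changes and
# forcing checkpoints; restart_lsn is what pins WAL on the standby.
my $before = $node_standby->safe_psql('postgres',
	'SELECT slot_name, restart_lsn, catalog_xmin FROM pg_replication_slots');
for my $i (1 .. 3)
{
	$node_standby->safe_psql('postgres',
		q[SELECT count(*) FROM pg_logical_slot_get_changes('standby', NULL, NULL)]);
}
# Old WAL segments are only recycled/removed at a checkpoint (a restartpoint
# on the standby), so force one on the primary and then on the standby.
$node_primary->safe_psql('postgres', 'CHECKPOINT');
$node_standby->safe_psql('postgres', 'CHECKPOINT');
my $after = $node_standby->safe_psql('postgres',
	'SELECT slot_name, restart_lsn, catalog_xmin FROM pg_replication_slots');
diag("slots before:\n$before\nslots after:\n$after");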
On Tue, 16 Jul 2019 at 22:56, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-07-12 14:53:21 +0530, tushar wrote:
On 07/10/2019 05:12 PM, Amit Khandekar wrote:
All right. Will do that in the next patch set. For now, I have quickly
done the below changes in a single patch again (attached), in order to
get early comments if any.
Thanks Amit for your patch. I am able to see an issue on the standby server
(where the logical replication slot was created):
a) the size of the pg_wal folder is NOT decreasing even after firing the
get_changes function
Even after calling pg_logical_slot_get_changes() multiple times? What
does
SELECT * FROM pg_replication_slots; before and after multiple calls return?
Does manually forcing a checkpoint with CHECKPOINT; first on the primary
and then the standby "fix" the issue?
I independently tried to reproduce this issue on my machine yesterday.
I observed that:
Sometimes, the files get cleaned up after two or more calls to
pg_logical_slot_get_changes().
Sometimes, I have to restart the server to see the pg_wal files cleaned up.
This happens more or less the same even for a logical slot on the *primary*.
Will investigate further with Tushar.
b) pg_wal files are not being recycled, and new files are created every time
after firing the get_changes function
I'm not sure what you mean by this. Are you saying that
pg_logical_slot_get_changes() causes WAL to be written?
Greetings,
Andres Freund
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 07/16/2019 10:56 PM, Andres Freund wrote:
Even after calling pg_logical_slot_get_changes() multiple times? What
does
SELECT * FROM pg_replication_slots; before and after multiple calls return?
Does manually forcing a checkpoint with CHECKPOINT; first on the primary
and then the standby "fix" the issue?
Yes, eventually it gets cleaned up - after firing the get_changes function
multiple times, or a checkpoint, or even both.
We are able to see this same behavior on MASTER - with or without the patch.
But is this an old (existing) issue?
b) pg_wal files are not being recycled, and new files are created every time
after firing the get_changes function
I'm not sure what you mean by this. Are you saying that
pg_logical_slot_get_changes() causes WAL to be written?
No, when I said new WAL files are created, I meant after each pgbench
run, NOT after executing get_changes.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Wed, 10 Jul 2019 at 17:12, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Wed, 10 Jul 2019 at 08:44, Andres Freund <andres@anarazel.de> wrote:
Hi,
Thanks for the new version! Looks like we're making progress towards
something committable here.
I think it'd be good to split the patch into a few pieces. I'd maybe do
that like:
1) WAL format changes (plus required other changes)
2) Recovery conflicts with slots
3) logical decoding on standby
4) tests
All right. Will do that in the next patch set. For now, I have quickly
done the below changes in a single patch again (attached), in order to
get early comments if any.
Attached are the split patches. Included is an additional patch that
has doc changes. Here is what I have added in the docs. Pasting it
here so that all can easily spot how it is supposed to behave, and to
confirm that we are all on the same page :
"A logical replication slot can also be created on a hot standby. To
prevent VACUUM from removing required rows from the system catalogs,
hot_standby_feedback should be set on the standby. In spite of that,
if any required rows get removed on standby, the slot gets dropped.
Existing logical slots on standby also get dropped if wal_level on
primary is reduced to less than 'logical'.
For a logical slot to be created, it builds a historic snapshot, for
which information of all the currently running transactions is
essential. On primary, this information is available, but on standby,
this information has to be obtained from primary. So, slot creation
may wait for some activity to happen on the primary. If the primary is
idle, creating a logical slot on standby may take a noticeable time."
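For concreteness, a minimal TAP-style sketch of the behaviour described in the proposed doc text above, assuming the patch set from this thread is applied; node, backup and slot names here are illustrative only and not taken from the patches:
use strict;
use warnings;
use PostgresNode;
# Primary with logical WAL level.
my $node_primary = get_new_node('primary');
$node_primary->init(allows_streaming => 1);
$node_primary->append_conf('postgresql.conf', "wal_level = 'logical'");
$node_primary->start;
# Standby from a base backup, with hot_standby_feedback on so that VACUUM on
# the primary does not remove catalog rows the slot still needs.
$node_primary->backup('bkp');
my $node_standby = get_new_node('standby');
$node_standby->init_from_backup($node_primary, 'bkp', has_streaming => 1);
$node_standby->append_conf('postgresql.conf', 'hot_standby_feedback = on');
$node_standby->start;
# Generate some activity on the primary (a checkpoint logs a running-xacts
# record); without it, slot creation below may wait while the primary is idle.
$node_primary->safe_psql('postgres', 'CHECKPOINT');
# Create the logical slot on the standby.
$node_standby->safe_psql('postgres',
	q[SELECT pg_create_logical_replication_slot('standby_slot', 'test_decoding')]);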
Attachments:
On 2019-Jul-19, Amit Khandekar wrote:
Attached are the split patches. Included is an additional patch that
has doc changes. Here is what I have added in the docs. Pasting it
here so that all can easily spot how it is supposed to behave, and to
confirm that we are all on the same page :
... Apparently, this patch was not added to the commitfest for some
reason; and another patch that *is* in the commitfest has been said to
depend on this one (Petr's https://commitfest.postgresql.org/24/1961/
which hasn't been updated in quite a while and has received no feedback
at all, not even from the listed reviewer Shaun Thomas). To make
matters worse, Amit's patchset no longer applies.
What I would like to do is add a link to this thread to CF's 1961 entry
and then update all these patches, in order to get things moving.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 3 Sep 2019 at 23:10, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-19, Amit Khandekar wrote:
Attached are the split patches. Included is an additional patch that
has doc changes. Here is what I have added in the docs. Pasting it
here so that all can easily spot how it is supposed to behave, and to
confirm that we are all on the same page :
... Apparently, this patch was not added to the commitfest for some
reason; and another patch that *is* in the commitfest has been said to
depend on this one (Petr's https://commitfest.postgresql.org/24/1961/
which hasn't been updated in quite a while and has received no feedback
at all, not even from the listed reviewer Shaun Thomas). To make
matters worse, Amit's patchset no longer applies.
What I would like to do is add a link to this thread to CF's 1961 entry
and then update all these patches, in order to get things moving.
Hi Alvaro,
Thanks for notifying about this. Will work this week on rebasing this
patchset and putting it into the 2019-11 commit fest.
On Mon, 9 Sep 2019 at 16:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On Tue, 3 Sep 2019 at 23:10, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-19, Amit Khandekar wrote:
Attached are the split patches. Included is an additional patch that
has doc changes. Here is what I have added in the docs. Pasting it
here so that all can easily spot how it is supposed to behave, and to
confirm that we are all on the same page :
... Apparently, this patch was not added to the commitfest for some
reason; and another patch that *is* in the commitfest has been said to
depend on this one (Petr's https://commitfest.postgresql.org/24/1961/
which hasn't been updated in quite a while and has received no feedback
at all, not even from the listed reviewer Shaun Thomas). To make
matters worse, Amit's patchset no longer applies.
What I would like to do is add a link to this thread to CF's 1961 entry
and then update all these patches, in order to get things moving.
Hi Alvaro,
Thanks for notifying about this. Will work this week on rebasing this
patchset and putting it into the 2019-11 commit fest.
Rebased patch set attached.
Added in the Nov commitfest : https://commitfest.postgresql.org/25/2283/
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
On Fri, Sep 13, 2019 at 7:20 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Thanks for notifying about this. Will work this week on rebasing this
patchset and putting it into the 2019-11 commit fest.
Rebased patch set attached.
Added in the Nov commitfest : https://commitfest.postgresql.org/25/2283/
I took a bit of a look at
0004-New-TAP-test-for-logical-decoding-on-standby.patch and saw some
things I don't like in terms of general code quality:
- Not many comments. I think each set of tests should have a block
comment at the top explaining clearly what it's trying to test.
- print_phys_xmin and print_logical_xmin don't print anything.
- They are also identical to each other except that they each operate
on a different hard-coded slot name.
- They are also identical to wait_for_xmins except that they don't wait.
- create_logical_slots creates two slots whose names are hard-coded
using code that is cut-and-pasted.
- The same code is also cut-and-pasted into two other places in the file.
- Why does that cut-and-pasted code use BAIL_OUT(), which aborts the
entire test run, instead of die, which just aborts the current test
file?
- cmp_ok() message in check_slots_dropped() has trailing whitespace.
- make_slot_active() and check_slots_dropped(), at least, use global
variables; is that really necessary?
- In particular, $return is used only in one function and doesn't need
to survive across calls; why is it not a local variable?
- Depending on whether $return ends up true or false, the number of
executed tests will differ; so besides any actual test failures,
you'll get complaints about not executing exactly 58 tests.
- $backup_name only ever has one value, but for some reason the
variable is created at the top of the test file and then initialized
later. Just do my $backup_name = 'b1' near where it's first used, or
ditch the variable and write 'b1' in each of the three places it's
used.
- Some of the calls to wait_for_xmins() save the return values into
local variables but then do nothing with those values before they are
overwritten. Either it's wrong that we're saving them into local
variables, or it's wrong that we're not doing anything with them.
- test_oldest_xid_retention() is called only once; it basically acts
as a wrapper for one group of tests. You could argue against that
approach, but I actually think it's a nice style which makes the code
more self-documenting. However, it's not used consistently; all the
other groups of tests are written directly as toplevel code.
- The code in that function verifies that oldestXid is found in
pg_controldata's output, but does not check the same for NextXID.
- Is there a reason the code in that function prints debugging output?
Seems like a leftover.
- I think it might be an idea to move the tests for recovery
conflict/slot drop to a separate test file, so that we have one file
for the xmin-related testing and another for the recovery conflict
testing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 18 Sep 2019 at 19:34, Robert Haas <robertmhaas@gmail.com> wrote:
I took a bit of a look at
0004-New-TAP-test-for-logical-decoding-on-standby.patch and saw some
things I don't like in terms of general code quality:
- Not many comments. I think each set of tests should have a block
comment at the top explaining clearly what it's trying to test.
Done at initial couple of test groups so that the groups would be
spotted clearly. Please check.
- print_phys_xmin and print_logical_xmin don't print anything.
- They are also identical to each other except that they each operate
on a different hard-coded slot name.
- They are also identical to wait_for_xmins except that they don't wait.
Re-worked this part of the code. Now a single function
get_slot_xmins(slot_name) is used to return the slot's xmins. It
figures out from the slot name whether the slot belongs to the master or
the standby. Also, avoided the hardcoded 'master_physical' and
'standby_logical' names.
Removed 'node' parameter of wait_for_xmins(), since now we can figure
out node name from slot name.
- create_logical_slots creates two slots whose names are hard-coded
using code that is cut-and-pasted.
- The same code is also cut-and-pasted into two other places in the file.
Didn't remove the hardcoding for slot names, because it's not
convenient to return those from create_logical_slots() and use them in
check_slots_dropped(). But I have addressed the cut-and-pasted code in
create_logical_slots() and the other two places in the file: some of
that repeated code is now consolidated in create_logical_slots() itself.
- Why does that cut-and-pasted code use BAIL_OUT(), which aborts the
entire test run, instead of die, which just aborts the current test
file?
Oops. Didn't realize that it bails out from the complete test run.
Replaced it with die().
- cmp_ok() message in check_slots_dropped() has trailing whitespace.
Removed it.
- make_slot_active() and check_slots_dropped(), at least, use global
variables; is that really necessary?
I guess you are referring to $handle. Now made make_slot_active()
return this handle, which is then passed to check_slots_dropped().
Retained the node_replica global variable rather than passing it as a
function parameter, because these functions always use node_replica,
and never node_master.
- In particular, $return is used only in one function and doesn't need
to survive across calls; why is it not a local variable?
- Depending on whether $return ends up true or false, the number of
executed tests will differ; so besides any actual test failures,
you'll get complaints about not executing exactly 58 tests.
Right. Made it local.
- $backup_name only ever has one value, but for some reason the
variable is created at the top of the test file and then initialized
later. Just do my $backup_name = 'b1' near where it's first used, or
ditch the variable and write 'b1' in each of the three places it's
used.
Declared $backup_name near its first usage.
- Some of the calls to wait_for_xmins() save the return values into
local variables but then do nothing with those values before they are
overwritten. Either it's wrong that we're saving them into local
variables, or it's wrong that we're not doing anything with them.
Yeah, in many places it was redundant to save them into variables, so
I removed the return-value assignments at those places.
- test_oldest_xid_retention() is called only once; it basically acts
as a wrapper for one group of tests. You could argue against that
approach, but I actually think it's a nice style which makes the code
more self-documenting. However, it's not used consistently; all the
other groups of tests are written directly as toplevel code.
Removed the function and kept its code at the top level. I think the
test group header comments are sufficient for documenting each group
of tests, so there is no need to make a separate function for
each group.
- The code in that function verifies that oldestXid is found in
pg_controldata's output, but does not check the same for NextXID.
Actually, there is no need to check NextXID. We want to check just
oldestXid. Removed its usage.
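For illustration, here is a hedged sketch of how such an oldestXid check might look; the exact helper used in the patch may differ, and $node_master is assumed to be the usual PostgresNode object:
# Sketch: read oldestXID from pg_controldata output for the master's data dir.
my $datadir = $node_master->data_dir;
my $controldata = `pg_controldata "$datadir"`;
my ($oldest_xid) = $controldata =~ /^Latest checkpoint's oldestXID:\s*(\d+)/m;
die "oldestXID not found in pg_controldata output" unless defined $oldest_xid;
# ... $oldest_xid can then be compared against the slot's catalog_xmin ...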
- Is there a reason the code in that function prints debugging output?
Seems like a leftover.
Yeah, right. Removed them.
- I think it might be an idea to move the tests for recovery
conflict/slot drop to a separate test file, so that we have one file
for the xmin-related testing and another for the recovery conflict
testing.
Actually, in some of the conflict-recovery test cases I am still using
wait_for_xmins() so that we can check the xmin values again after we
drop the slots. So xmin-related testing is embedded in these recovery
tests as well. We could move the wait_for_xmins() function to some
common file and then split this file, but then effectively
some of the xmin testing would go into the recovery-related test file,
which did not sound sensible to me. What do you say?
Attached patch series has the test changes addressed.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
On Thu, Sep 26, 2019 at 5:14 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Actually in some of the conflict-recovery testcases, I am still using
wait_for_xmins() so that we could test the xmin values back after we
drop the slots. So xmin-related testing is embedded in these recovery
tests as well. We can move the wait_for_xmins() function to some
common file and then do the split of this file, but then effectively
some of the xmin-testing would go into the recovery-related test file,
which did not sound sensible to me. What do you say ?
I agree we don't want code duplication, but I think we could reduce
the code duplication to a pretty small amount with a few cleanups.
I don't think wait_for_xmins() looks very well-designed. It goes to the
trouble of returning a value, but only 2 of the 6 call sites pay
attention to the returned value. I think we should change the
function so that it doesn't return anything and have the callers that
want a return value call get_slot_xmins() after wait_for_xmins().
And then I think we should turn around and get rid of get_slot_xmins()
altogether. Instead of:
my ($xmin, $catalog_xmin) = get_slot_xmins($master_slot);
is($xmin, '', "xmin null");
is($catalog_xmin, '', "catalog_xmin null");
We can write:
my $slot = $node_master->slot($master_slot);
is($slot->{'xmin'}, '', "xmin null");
is($slot->{'catalog_xmin'}, '', "catalog xmin null");
...which is not really any longer or harder to read, but does
eliminate the need for one function definition.
Then I think we should change wait_for_xmins so that it takes three
arguments rather than two: $node, $slotname, $check_expr. With that
and the previous change, we can get rid of get_node_from_slotname().
At that point, the body of wait_for_xmins() would consist of a single
call to $node->poll_query_until() or die(), which doesn't seem like
too much code to duplicate into a new file.
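For concreteness, a three-argument wait_for_xmins() along those lines might look roughly like this sketch (an illustration only, not the exact code from the patch set):
sub wait_for_xmins
{
	my ($node, $slotname, $check_expr) = @_;

	# poll_query_until() waits for the query to return 't' (true).
	$node->poll_query_until(
		'postgres', qq[
		SELECT $check_expr
		FROM pg_catalog.pg_replication_slots
		WHERE slot_name = '$slotname';
	]) or die "timed out waiting for slot xmins to satisfy: $check_expr";
}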
Looking at it a bit more, though, I wonder why the recovery
conflict scenario is even using wait_for_xmins(). It's hard-coded to
check the state of the master_physical slot, which isn't otherwise
manipulated by the recovery conflict tests. What's the point of
testing that a slot which had xmin and catalog_xmin NULL before the
test started (line 414) and which we haven't changed since still has
those values at two different points during the test (lines 432, 452)?
Perhaps I'm missing something here, but it seems like this is just an
inadvertent entangling of these scenarios with the previous scenarios,
rather than anything that necessarily needs to be connected together.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 27 Sep 2019 at 01:57, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 26, 2019 at 5:14 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Actually in some of the conflict-recovery testcases, I am still using
wait_for_xmins() so that we could test the xmin values back after we
drop the slots. So xmin-related testing is embedded in these recovery
tests as well. We can move the wait_for_xmins() function to some
common file and then do the split of this file, but then effectively
some of the xmin-testing would go into the recovery-related test file,
which did not sound sensible to me. What do you say?
I agree we don't want code duplication, but I think we could reduce
the code duplication to a pretty small amount with a few cleanups.
I don't think wait_for_xmins() looks very well-designed. It goes to the
trouble of returning a value, but only 2 of the 6 call sites pay
attention to the returned value. I think we should change the
function so that it doesn't return anything and have the callers that
want a return value call get_slot_xmins() after wait_for_xmins().
Yeah, that can be done.
And then I think we should turn around and get rid of get_slot_xmins()
altogether. Instead of:
my ($xmin, $catalog_xmin) = get_slot_xmins($master_slot);
is($xmin, '', "xmin null");
is($catalog_xmin, '', "catalog_xmin null");
We can write:
my $slot = $node_master->slot($master_slot);
is($slot->{'xmin'}, '', "xmin null");
is($slot->{'catalog_xmin'}, '', "catalog xmin null");
...which is not really any longer or harder to read, but does
eliminate the need for one function definition.
Agreed.
Then I think we should change wait_for_xmins so that it takes three
arguments rather than two: $node, $slotname, $check_expr. With that
and the previous change, we can get rid of get_node_from_slotname().
At that point, the body of wait_for_xmins() would consist of a single
call to $node->poll_query_until() or die(), which doesn't seem like
too much code to duplicate into a new file.
Earlier it used to have 3 params, the same ones you mentioned. I
removed $node for caller convenience.
Looking at it a bit more, though, I wonder why the recovery
conflict scenario is even using wait_for_xmins(). It's hard-coded to
check the state of the master_physical slot, which isn't otherwise
manipulated by the recovery conflict tests. What's the point of
testing that a slot which had xmin and catalog_xmin NULL before the
test started (line 414) and which we haven't changed since still has
those values at two different points during the test (lines 432, 452)?
Perhaps I'm missing something here, but it seems like this is just an
inadvertent entangling of these scenarios with the previous scenarios,
rather than anything that necessarily needs to be connected together.
In the "Drop slot" test scenario, we are testing that after we
manually drop the slot on standby, the master catalog_xmin should be
back to NULL. Hence, the call to wait_for_xmins().
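As an illustration of that "Drop slot" check, the wait might look roughly like this sketch (slot and node names are placeholders, not necessarily those used in the test):
# Drop the logical slot on the standby, then wait until the physical slot on
# the master reports a NULL catalog_xmin again.
$node_standby->safe_psql('postgres',
	q[SELECT pg_drop_replication_slot('standby_logical')]);

$node_master->poll_query_until(
	'postgres', q[
	SELECT catalog_xmin IS NULL
	FROM pg_catalog.pg_replication_slots
	WHERE slot_name = 'master_physical';
]) or die "master_physical slot's catalog_xmin never became NULL";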
And in the "Scenario 1 : hot_standby_feedback off", wait_for_xmins()
is called the first time only as a mechanism to ensure that
"hot_standby_feedback = off" has taken effect. At the end of this
test, wait_for_xmins() again is called only to ensure that
hot_standby_feedback = on has taken effect.
Preferably I want wait_for_xmins() to get rid of the $node parameter,
because we can deduce it using slot name. But that requires having
get_node_from_slotname(). Your suggestion was to remove
get_node_from_slotname() and add back the $node param so as to reduce
duplicate code. Instead, how about keeping wait_for_xmins() in
PostgresNode.pm? This way, we won't have duplication, and also we
can get rid of param $node. This is just my preference; if you are
quite inclined to not have get_node_from_slotname(), I will go with
your suggestion.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Sep 27, 2019 at 12:41 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Preferably I want wait_for_xmins() to get rid of the $node parameter,
because we can deduce it using slot name. But that requires having
get_node_from_slotname(). Your suggestion was to remove
get_node_from_slotname() and add back the $node param so as to reduce
duplicate code. Instead, how about keeping wait_for_xmins() in the
PostgresNode.pm() ? This way, we won't have duplication, and also we
can get rid of param $node. This is just my preference; if you are
quite inclined to not have get_node_from_slotname(), I will go with
your suggestion.
I'd be inclined not to have it. I think having a lookup function to
go from slot name -> node is strange; it doesn't really simplify
things that much for the caller, and it makes the logic harder to
follow. It would break outright if you had the same slot name on
multiple nodes, which is a perfectly reasonable scenario.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 27 Sep 2019 at 23:21, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 27, 2019 at 12:41 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Preferably I want wait_for_xmins() to get rid of the $node parameter,
because we can deduce it using slot name. But that requires having
get_node_from_slotname(). Your suggestion was to remove
get_node_from_slotname() and add back the $node param so as to reduce
duplicate code. Instead, how about keeping wait_for_xmins() in the
PostgresNode.pm() ? This way, we won't have duplication, and also we
can get rid of param $node. This is just my preference; if you are
quite inclined to not have get_node_from_slotname(), I will go with
your suggestion.
I'd be inclined not to have it. I think having a lookup function to
go from slot name -> node is strange; it doesn't really simplify
things that much for the caller, and it makes the logic harder to
follow. It would break outright if you had the same slot name on
multiple nodes, which is a perfectly reasonable scenario.
Alright. Attached is the updated patch that splits the file into two
files, one that does only xmin related testing, and the other test
file that tests conflict recovery scenarios, and also one scenario
where drop-database drops the slots on the database on standby.
Removed get_slot_xmins() and get_node_from_slotname().
Renamed 'replica' to 'standby'.
Used node->backup() function instead of pg_basebackup command.
Renamed $master_slot to $master_slotname, similarly for $standby_slot.
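Regarding the node->backup() change above, the usual PostgresNode pattern looks roughly like this sketch (placeholder names, not the exact test code):
# Take a base backup through PostgresNode instead of invoking pg_basebackup
# directly, and initialize the standby from it with streaming enabled.
my $backup_name = 'b1';
$node_master->backup($backup_name);

my $node_standby = get_new_node('standby');
$node_standby->init_from_backup($node_master, $backup_name,
	has_streaming => 1);
$node_standby->append_conf('postgresql.conf', 'hot_standby_feedback = on');
$node_standby->start;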
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
logicaldecodng_standby_v3.tar.gz (application/x-gzip)