Logical decoding on standby
Hi all
I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys. Since it's a series of 20
smallish patches at the moment I won't attach it. You can find the
current version, at time of writing, at the tag
dev/logical-decoding-on-standby-pg10-v1 in the github repo
2ndQuadrant/postgres. Whatever I'm working on (subject to rebase,
breakage, etc.) lives in the branch dev/logical-decoding-on-standby-pg10 .
Quickstart
===
Compile and install like usual; make sure to install test_decoding
too. To see the functionality in action, configure with
--enable-tap-tests and:
make -C src/test/recovery check
To try manually, initdb a master, set pg_hba.conf to 'trust' on all
replication connections, append to postgresql.conf:
wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
hot_standby_feedback = on
then start the master. Now
psql -d 'master_connstr' -c "SELECT
pg_create_physical_replication_slot('standby1');"
and
pg_basebackup -d 'master_connstr' -X stream -R --slot=standby1
and start the replica.
You can now use pg_recvlogical to create a slot on the replica and
decode changes from it, e.g.
pg_recvlogical -d 'replica_connstr' -S test -P test_decoding --create-slot
pg_recvlogical -d 'replica_connstr' -S 'test' -f - --start
and you'll (hopefully) see subsequent changes you make on the master.
If not, tell me.
Patch series contents
===
This patch series incorporates the following changes:
* Timeline following for logical slots, so they can start decoding on
the correct timeline and follow timeline switches (with tests);
originally [3]
* Add --endpos to pg_recvlogical, with tests; originally [3]
* Splitting of xmin and catalog_xmin on hot standby feedback, so
logical slots on a replica only hold down catalog_xmin on the master,
not the xmin for user tables (with tests). Minimises upstream bloat;
originally [4]
* Suppress export of snapshot when starting logical decoding on
replica. Since we cannot allocate an xid, we cannot export snapshots
on standby. Decoding clients can still do their initial setup via a
slot on the master then switch over, do it via physical copy, etc.
* Require hot_standby_feedback to be enabled when starting logical
decoding on a standby.
* Drop any replication slots from a database when redo'ing database
drop, so we don't leave dangling slots on the replica (with tests).
* Make the walsender respect SIGUSR1 and exit via
RecoveryConflictInterrupt() when it gets
PROCSIG_RECOVERY_CONFLICT_DATABASE (with tests); see [6]
* PostgresNode.pm enhancements for the tests
* New test coverage for logical decoding on standby
Remaining issues
===
* The method used to make the walsender respect conflict with recovery
interrupts may not be entirely safe, see walsender
procsignal_sigusr1_handler thread [5];
* We probably terminate walsenders running inside an output plugin
with a virtual xact whose xmin is below the upstream's global xmin,
even though its catalog xmin is fine, in
ResolveRecoveryConflictWithSnapshot(...). Haven't been able to test
this. Need to only terminate them when the conflict affects relations
accessible in logical decoding, which likely needs the upstream to
send more info in WAL;
* logical decoding timeline following needs tests for cascading
physical replication where an intermediate node is promoted per
timeline following thread [3];
* walsender may need to maintain ThisTimeLineID in more places per
decoding timeline following v10 thread [3];
* it may be desirable to refactor the walsender to deliver cleaner
logical decoding timeline following per the decoding timeline
following v10 thread [3]
also:
* Nothing stops the user from disabling hot_standby_feedback on the
standby or dropping and re-creating the physical slot on the master,
causing needed catalog tuples to get vacuumed away. Since it's not
going to be safe to check slot shmem state from the
hot_standby_feedback verify hook, and we let hot_standby_feedback
change at runtime, this is going to be hard to fix comprehensively, so
we need to cope with what happens when feedback fails; but:
* We don't yet detect when upstream's catalog_xmin increases past our
needed catalog_xmin and needed catalog tuples are vacuumed away by the
upstream. So we don't invalidate the slot or terminate any active
decoding sessions using the slot. Active decoding sessions often won't
have a vtxid to use with ResolveRecoveryConflictWithVirtualXIDs(),
transaction cancel is not going to be sufficient, and in any case it'll
cancel too aggressively, since it doesn't know it's safe to apply
changes that affect only (non-user-catalog) heap tables without
conflicting with decoding sessions.
... so this is definitely NOT ready for commit. It does, however, make
logical decoding work on standby.
Next steps
===
Since it doesn't look practical to ensure there's never been a gap in
hot standby feedback or detect such a gap directly, I'm currently
looking at ways to reliably detect when the upstream has removed
tuples we need and error out. That means we need a way to tell when
upstream's catalog_xmin has advanced, which we don't currently have
from xlogs. Checkpoint's oldestXID is insufficient, since an advance
could've happened since the last checkpoint.
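As a rough illustration of the kind of check a standby would need once it can see the upstream's advancing catalog_xmin, here is a hedged sketch; xid_precedes mimics PostgreSQL's wraparound-aware TransactionIdPrecedes() (without its special cases for permanent xids), and can_start_decoding is a purely hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "older than" comparison: normal xids form a
 * circle, so "older" is defined by signed distance, not magnitude. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

/* A standby could refuse to start decoding when the slot's
 * catalog_xmin is already older than the newest known valid
 * catalog_xmin, i.e. needed catalog tuples may already be gone. */
static bool
can_start_decoding(TransactionId slot_catalog_xmin,
                   TransactionId oldest_valid_catalog_xmin)
{
    return !xid_precedes(slot_catalog_xmin, oldest_valid_catalog_xmin);
}
```

Note that the comparison must be circular; a plain `<` breaks once xids wrap past 2^32.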
Related threads
===
This series supersedes:
* Timeline following for logical slots
[1]: /messages/by-id/CAMsr+YH-C1-X_+s=2nzAPnR0wwqJa-rUmVHSYyZaNSn93MUBMQ@mail.gmail.com
* WIP: Failover Slots
[2]: /messages/by-id/CAMsr+YFqtf6ecDVmJSLpC_G8T6KoNpKZZ_XgksODwPN+f=evqg@mail.gmail.com
and incorporates the patches in:
* Logical decoding timeline following take II
[3]: /messages/by-id/CAMsr+YEQB3DbxmCUTTTX4RZy8J2uGrmb5+_ar_joFZNXa81Fug@mail.gmail.com
* Send catalog_xmin separately in hot standby feedback
[4]: /messages/by-id/CAMsr+YFi-LV7S8ehnwUiZnb=1h_14PwQ25d-vyUNq-f5S5r=Zg@mail.gmail.com
* Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from
walsender?
[5]: /messages/by-id/CAMsr+YFb3R-t5O0jPGvz9_nsAt2GwwZiLSnYu3=X6mR9RnrbEw@mail.gmail.com
Also relevant:
* Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() in walsender
[6]: /messages/by-id/CAMsr+YFb3R-t5O0jPGvz9_nsAt2GwwZiLSnYu3=X6mR9RnrbEw@mail.gmail.com
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2016-11-21 16:17:58 +0800, Craig Ringer wrote:
I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys.
Please attach. Otherwise in a year or two it'll be impossible to look
this up.
Andres
On 22 November 2016 at 03:14, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-11-21 16:17:58 +0800, Craig Ringer wrote:
I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys.

Please attach. Otherwise in a year or two it'll be impossible to look
this up.
Fair enough. Attached. Easy to apply with "git am".
I'm currently looking at making detection of replay conflict with a
slot work by separating the current catalog_xmin into two effective
parts - the catalog_xmin currently needed by any known slots
(ProcArray->replication_slot_catalog_xmin, as now), and the oldest
actually valid catalog_xmin where we know we haven't removed anything
yet.
That'll be recorded in a new CheckPoint.oldestCatalogXid field and in
ShmemVariableCache (i.e. VariableCacheData.oldestCatalogXid).
Vacuum will be responsible for advancing
VariableCacheData.oldestCatalogXid by writing an expanded
xl_heap_cleanup_info record with a new latestRemovedCatalogXid field
and then advancing the value in the ShmemVariableCache. Vacuum will
only remove rows of catalog or user-catalog tables that are older than
VariableCacheData.oldestCatalogXid.
This allows recovery on a standby to tell, based on the last
checkpoint + any xl_heap_cleanup_info records used to maintain
ShmemVariableCache, whether the upstream has removed catalog or
user-catalog records it needs. We can signal walsenders with running
xacts to terminate if their xmin passes the threshold, and when they
start a new xact they can check to see if they're still valid and bail
out.
xl_heap_cleanup_info isn't emitted much, but if adding a field there
is a problem we can instead add an additional xlog buffer that's only
appended when wal_level = logical.
Doing things this way avoids:
* the need for the standby to be able to tell at redo time whether a
RelFileNode is for a catalog or user-catalog relation without access
to relcache; or
* the need to add info on whether a catalog or user-catalog is being
affected to any xlog record that can cause a snapshot conflict on
standby; or
* a completely reliable way to ensure hot_standby_feedback can never
cease to affect the master even if the user does something dumb
at the cost of sometimes somewhat delaying removal of catalog or
user-catalog tuples when wal_level >= hot_standby, a new CheckPoint
field, and a new field in xl_heap_cleanup_info.
The above is not incorporated in the attached patch series, see the
prior post for status of the attached patches.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
* 0005-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patch
* 0006-Expand-streaming-replication-tests-to-cover-hot-stan.patch
* 0007-Send-catalog_xmin-in-hot-standby-feedback-protocol.patch
* 0008-Make-walsender-respect-catalog_xmin-in-hot-standby-f.patch
* 0009-Allow-GetOldestXmin-.-to-optionally-disregard-the-ca.patch
* 0001-Add-an-optional-endpos-LSN-argument-to-pg_recvlogica.patch
* 0002-Add-a-pg_recvlogical-wrapper-to-PostgresNode.patch
* 0003-Follow-timeline-switches-in-logical-decoding.patch
* 0004-PostgresNode-methods-to-wait-for-node-catchup.patch
* 0010-Send-catalog_xmin-separately-in-hot_standby_feedback.patch
* 0011-Update-comment-on-issues-with-logical-decoding-on-st.patch
* 0012-Don-t-attempt-to-export-a-snapshot-from-CREATE_REPLI.patch
* 0013-ERROR-if-timeline-is-zero-in-walsender.patch
* 0014-Permit-logical-decoding-on-standby-with-a-warning.patch
* 0015-Tests-for-logical-decoding-on-standby.patch
* 0016-Drop-logical-replication-slots-when-redoing-database.patch
* 0017-Allow-walsender-to-exit-on-conflict-with-recovery.patch
* 0018-Tests-for-db-drop-during-decoding-on-standby.patch
On 22 November 2016 at 10:20, Craig Ringer <craig@2ndquadrant.com> wrote:
I'm currently looking at making detection of replay conflict with a
slot work by separating the current catalog_xmin into two effective
parts - the catalog_xmin currently needed by any known slots
(ProcArray->replication_slot_catalog_xmin, as now), and the oldest
actually valid catalog_xmin where we know we haven't removed anything
yet.
OK, more detailed plan.
The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs. We don't
emit a WAL record when we advance oldestXid in
SetTransactionIdLimit(), and doing so is useless because vacuum will
have already removed needed tuples from needed catalogs before calling
SetTransactionIdLimit() from vac_truncate_clog(). We know that if
oldestXid is n, the true valid catalog_xmin where no needed tuples
have been removed must be >= n. But we need to know the lower bound of
valid catalog_xmin, which oldestXid doesn't give us.
So right now a standby has no way to reliably know if the catalog_xmin
requirement for a given replication slot can be satisfied. A standby
can't tell based on a xl_heap_cleanup_info record, xl_heap_clean
record, etc whether the affected table is a catalog or not, and
shouldn't generate conflicts for non-catalogs since otherwise it'll be
constantly clobbering walsenders.
A 2-phase advance of the global catalog_xmin would mean that
GetOldestXmin() would return a value from ShmemVariableCache, not the
oldest catalog xmin from ProcArray like it does now. (auto)vacuum
would then be responsible for:
* Reading the oldest catalog_xmin from procarray
* If it has advanced vs what's present in ShmemVariableCache, writing
a new xlog record type recording an advance of oldest catalog xmin
* advancing ShmemVariableCache's oldest catalog xmin
and would do so before it called GetOldestXmin via
vacuum_set_xid_limits() to determine what it can remove.
GetOldestXmin would return the ProcArray's copy of the oldest
catalog_xmin when in recovery, so we report it via hot_standby_feedback
to the upstream, it's recorded on our physical slot, and in turn
causes vacuum to advance the master's effective oldest catalog_xmin
for vacuum.
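The vacuum-side ordering described above could be sketched roughly as follows; all names here are hypothetical stand-ins (CatalogXminState is not a real PostgreSQL type, and the xlog write is simulated by an out parameter):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Illustrative shared state standing in for ProcArray and
 * ShmemVariableCache; the field names mirror the plan in the text. */
typedef struct CatalogXminState
{
    TransactionId replication_slot_catalog_xmin; /* oldest needed by slots */
    TransactionId oldestCatalogXmin;             /* oldest actually valid */
} CatalogXminState;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Sketch of the ordering vacuum would follow: first log the advance
 * (here just recorded via an out parameter standing in for an xlog
 * record), then advance the shared-memory copy, and only then use the
 * new value as the removal horizon.  Returns the horizon vacuum may
 * use; wal_logged_value is left untouched when nothing advances. */
static TransactionId
advance_oldest_catalog_xmin(CatalogXminState *state,
                            TransactionId *wal_logged_value)
{
    if (xid_precedes(state->oldestCatalogXmin,
                     state->replication_slot_catalog_xmin))
    {
        /* 1. emit xlog record carrying the new value (simulated) */
        *wal_logged_value = state->replication_slot_catalog_xmin;
        /* 2. advance the shared memory copy */
        state->oldestCatalogXmin = state->replication_slot_catalog_xmin;
    }
    /* 3. vacuum removes only catalog tuples older than this */
    return state->oldestCatalogXmin;
}
```

The important property is that the WAL record is written before the horizon moves, so a standby replaying the record always learns of the advance before any removal that depends on it.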
On the standby we'd replay the catalog_xmin advance record, advance
the standby's ShmemVariableCache's oldest catalog xmin, and check to
see if any replication slots, active or not, have a catalog_xmin <
than the new threshold. If none do, there's no conflict and we're
fine. If any do, we wait
max_standby_streaming_delay/max_standby_archive_delay as appropriate,
then generate recovery conflicts against all backends that have an
active replication slot based on the replication slot state in shmem.
Those backends - walsender or normal decoding backend - would promptly
die. New decoding sessions will check the ShmemVariableCache and
refuse to start if their catalog_xmin is < the threshold. Since we
advance it before generating recovery conflicts there's no race with
clients trying to reconnect after their backend is killed with a
conflict.
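The redo-side slot scan described above could be sketched as below; StandbySlot and mark_conflicting_slots are invented names for illustration, and xid_precedes stands in for TransactionIdPrecedes():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct StandbySlot
{
    TransactionId catalog_xmin;
    bool          conflicted;   /* marked for recovery-conflict cancel */
} StandbySlot;

static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* After replaying a catalog_xmin advance record, any slot (active or
 * not) whose catalog_xmin is behind the new threshold conflicts with
 * recovery.  Returns the number of slots marked as conflicting. */
static int
mark_conflicting_slots(StandbySlot *slots, size_t nslots,
                       TransactionId new_threshold)
{
    int nconflicts = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (xid_precedes(slots[i].catalog_xmin, new_threshold))
        {
            slots[i].conflicted = true;
            nconflicts++;
        }
    }
    return nconflicts;
}
```

In the real system the marked slots would then be handled via the usual recovery-conflict machinery after the max_standby_*_delay grace period, as described above.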
If we wanted to get fancy we could set the latches of walsender
backends at risk of conflicting and they could check
ShmemVariableCache's oldest valid catalog xmin, so they could send
immediate keepalives with reply_requested set and hopefully get flush
confirmation from the client and advance their catalog_xmin before we
terminate them as conflicting with recovery. But that can IMO be done
later/separately.
Going to prototype this.
Alternate approach:
---------------
The oldest xid in heap_xlog_cleanup_info is relation-specific, but the
standby has no way to know if it's a catalog relation or not during
redo and know whether to kill slots and decoding sessions based on its
latestRemovedXid. Same for xl_heap_clean and the other records that
can cause snapshot conflicts (xl_heap_visible, xl_heap_freeze_page,
xl_btree_delete, xl_btree_reuse_page, spgxlogVacuumRedirect).
Instead of adding a 2-phase advance of the global catalog_xmin, we can
instead add a rider to each of these records that identifies whether
it's a catalog table or not. This would only be emitted when wal_level
= logical, but it *would* increase WAL sizes a bit when logical
decoding is enabled even if it's not going to be used on a standby.
The rider would be a simple:
typedef struct xl_rel_catalog_info
{
    bool rel_accessible_from_logical_decoding;
} xl_rel_catalog_info;
or similar. During redo we call a new
ResolveRecoveryConflictWithLogicalSlot function from each of those
records' redo routines that does what I outlined above.
This way we add more info to more xlog records, and the upstream has to
use RelationIsAccessibleInLogicalDecoding() to set up the records when
writing the xlogs. In exchange, we don't have to add a new field to
CheckPoint or ShmemVariableCache or add a new xlog record type. It
seems the worse option to me.
(BTW, as comments on GetOldestSafeDecodingTransactionId() note, we
can't rely on KnownAssignedXidsGetOldestXmin() since it can be
incomplete at least on standby.)
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 22 November 2016 at 10:20, Craig Ringer <craig@2ndquadrant.com> wrote:
I'm currently looking at making detection of replay conflict with a
slot work by separating the current catalog_xmin into two effective
parts - the catalog_xmin currently needed by any known slots
(ProcArray->replication_slot_catalog_xmin, as now), and the oldest
actually valid catalog_xmin where we know we haven't removed anything
yet.

OK, more detailed plan.
The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs.
Really?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 23 November 2016 at 03:55, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 22 November 2016 at 10:20, Craig Ringer <craig@2ndquadrant.com> wrote:
I'm currently looking at making detection of replay conflict with a
slot work by separating the current catalog_xmin into two effective
parts - the catalog_xmin currently needed by any known slots
(ProcArray->replication_slot_catalog_xmin, as now), and the oldest
actually valid catalog_xmin where we know we haven't removed anything
yet.OK, more detailed plan.
The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs.

Really?
(Note the double negative above).
Yes, necessarily so. You can't look up xids older than the clog
truncation threshold at oldestXid, per our discussion on txid_status()
and traceable commit. But the tuples from that xact aren't guaranteed
to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
which calls GetOldestXmin(...); that in turn scans ProcArray to find
the oldest xid any running xact cares about. It might bump it down
further if there's a replication slot requirement or based on
vacuum_defer_cleanup_age, but it doesn't care in the slightest about
oldestXid.

oldestXid cannot advance until vacuum has removed all tuples for that
xid and advanced the database's datfrozenxid. But a given oldestXid
says nothing about which tuples, catalog or otherwise, still exist and
are accessible.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs.

Really?
(Note the double negative above).
Yes, necessarily so. You can't look up xids older than the clog
truncation threshold at oldestXid, per our discussion on txid_status()
and traceable commit. But the tuples from that xact aren't guaranteed
to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
which calls GetOldestXmin(...); that in turn scans ProcArray to find
the oldest xid any running xact cares about. It might bump it down
further if there's a replication slot requirement or based on
vacuum_defer_cleanup_age, but it doesn't care in the slightest about
oldestXid.

oldestXid cannot advance until vacuum has removed all tuples for that
xid and advanced the database's datfrozenxid. But a given oldestXid
says nothing about which tuples, catalog or otherwise, still exist and
are accessible.
Right. Sorry, my mistake.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 26 Nov. 2016 23:40, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:
The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs.

Really?
(Note the double negative above).
Yes, necessarily so. You can't look up xids older than the clog
truncation threshold at oldestXid, per our discussion on txid_status()
and traceable commit. But the tuples from that xact aren't guaranteed
to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
which calls GetOldestXmin(...); that in turn scans ProcArray to find
the oldest xid any running xact cares about. It might bump it down
further if there's a replication slot requirement or based on
vacuum_defer_cleanup_age, but it doesn't care in the slightest about
oldestXid.

oldestXid cannot advance until vacuum has removed all tuples for that
xid and advanced the database's datfrozenxid. But a given oldestXid
says nothing about which tuples, catalog or otherwise, still exist and
are accessible.

Right. Sorry, my mistake.
Phew. Had me worried there.
Thanks for looking over it. Prototype looks promising so far.
Hi,
I did look at the code a bit. The first 6 patches seem reasonable.
I don't understand why some patches are separate tbh (like 7-10, or 11).
About the 0009:
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 9985e3e..4fa3ad4 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -538,7 +538,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, true);
+		OldestXmin = GetOldestXmin(NULL, true, false);
 	}

 	rel = relation_open(relid, AccessShareLock);
@@ -660,7 +660,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 			 * a buffer lock.  And this shouldn't happen often, so it's
 			 * worth being careful so as to avoid false positives.
 			 */
-			RecomputedOldestXmin = GetOldestXmin(NULL, true);
+			RecomputedOldestXmin = GetOldestXmin(NULL, true, false);

 			if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
 				record_corrupt_item(items, &tuple.t_self);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index f524fc4..5b33c97 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
 	TransactionId OldestXmin;
 	uint64		misc_count = 0;

-	OldestXmin = GetOldestXmin(rel, true);
+	OldestXmin = GetOldestXmin(rel, true, false);
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);

 	nblocks = RelationGetNumberOfBlocks(rel);
This does not seem correct; you are passing false as a pointer parameter.
0012:
I think there should be parameter saying if snapshot should be exported
or not and if user asks for it on standby it should fail.
0014 makes 0011 even more pointless.
Not going into deeper detail as this is still very WIP. I do agree with
the general design though.
This also replaces the previous timeline following and decoding
threads/CF entries so maybe those should be closed in CF?
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
 	TransactionId OldestXmin;
 	uint64		misc_count = 0;

-	OldestXmin = GetOldestXmin(rel, true);
+	OldestXmin = GetOldestXmin(rel, true, false);
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);

 	nblocks = RelationGetNumberOfBlocks(rel);
This does not seem correct; you are passing false as a pointer parameter.
Thanks. That's an oversight from the GetOldestXmin interface change
per your prior feedback. C doesn't care since null is 0 and false is
0, and I missed it when transforming the patch.
0012:
I think there should be parameter saying if snapshot should be exported
or not and if user asks for it on standby it should fail.
Sounds reasonable. That also means clients can suppress snapshot export
on the master, which as we recently learned can be desirable sometimes.
0014 makes 0011 even more pointless.
Yeah, as I said, it's a bit WIP still and needs some rebasing and rearrangement.
This also replaces the previous timeline following and decoding
threads/CF entries so maybe those should be closed in CF?
I wasn't sure what to do about that, since it's all a set of related
functionality. I think it's going to get more traction as a single
"logical decoding on standby" feature though, since the other parts are
hard to test and use in isolation. So yeah, probably, I'll do so
unless someone objects.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21 November 2016 at 16:17, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all
I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys.
Hi all
I've attached a significantly revised patch, which now incorporates
safeguards to ensure that we prevent decoding if the master has not
retained needed catalogs, and cancel decoding sessions that are holding
up apply because they need too-old catalogs.
The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field. Vacuum notices if
procArray->replication_slot_catalog_xmin has advanced past
ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
record with the new value before it copies it to oldestCatalogXmin.
This means that a standby can now reliably tell when catalogs are
about to be removed or become candidates for removal, so it can pause
redo until logical decoding sessions on the standby advance far enough
that their catalog_xmin passes that point. It also means that if our
hot_standby_feedback somehow fails to lock in the catalogs our slots
need on a standby, we can cancel sessions with a conflict with
recovery.
If wal_level is < logical this won't do anything, since
replication_slot_catalog_xmin and oldestCatalogXmin will both always
be 0.
Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
new replication_slot_catalog_xmin, this won't impact catalog bloat.
Ideally this mechanism won't generally actually be needed, since
hot_standby_feedback stops the master from removing needed catalogs,
and we make an effort to ensure that the standby has
hot_standby_feedback enabled and is using a replication slot. We
cannot prevent the user from dropping and re-creating the physical
slot on the upstream, though, and it doesn't look simple to stop them
turning off hot_standby_feedback or turning off use of a physical slot
after creating logical slots, either. So we try to stop users shooting
themselves in the foot, but if they do it anyway we notice and cope
gracefully. Logging catalog_xmin also helps slots created on standbys
know where to start, and makes sure we can deal gracefully with a race
between hs_feedback and slot creation on a standby.
There can be a significant delay for slot creation on standby since we
have to wait until there's a new xl_running_xacts record logged. I'd
like to extend the hot_standby_feedback protocol a little to address
that and some other issues, but that's a separate step.
I haven't addressed Petr's point yet, that "there should be parameter
saying if snapshot should be exported
or not and if user asks for it on standby it should fail". Otherwise I
think it's looking fairly solid.
Due to the amount of churn I landed up flattening the patchset. It
probably makes sense to split it up, likely into the sequence of
changes listed in the commit message. I'll wait for a general opinion
on the validity of this approach first.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
* 0001-PostgresNode-methods-to-wait-for-node-catchup.patch
* 0002-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patch
* 0003-Add-an-optional-endpos-LSN-argument-to-pg_recvlogica.patch
* 0004-Add-a-pg_recvlogical-wrapper-to-PostgresNode.patch
* 0005-Follow-timeline-switches-in-logical-decoding.patch
* 0006-Expand-streaming-replication-tests-to-cover-hot-stan.patch
* 0007-Don-t-attempt-to-export-a-snapshot-from-CREATE_REPLI.patch
* 0008-ERROR-if-timeline-is-zero-in-walsender.patch
* 0009-Logical-decoding-on-standby.patch
On 07/12/16 07:05, Craig Ringer wrote:
On 21 November 2016 at 16:17, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all
I've prepared a working initial, somewhat raw implementation for
logical decoding on physical standbys.

Hi all
I've attached a significantly revised patch, which now incorporates
safeguards to ensure that we prevent decoding if the master has not
retained needed catalogs and cancel decoding sessions that are holding
up apply because they need too-old catalogs.

The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field. Vacuum notices if
procArray->replication_slot_catalog_xmin has advanced past
ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
record with the new value before it copies it to oldestCatalogXmin.
This means that a standby can now reliably tell when catalogs are
about to be removed or become candidates for removal, so it can pause
redo until logical decoding sessions on the standby advance far enough
that their catalog_xmin passes that point. It also means that if our
hot_standby_feedback somehow fails to lock in the catalogs our slots
need on a standby, we can cancel sessions with a conflict with
recovery.

If wal_level is < logical this won't do anything, since
replication_slot_catalog_xmin and oldestCatalogXmin will both always
be 0.

Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
new replication_slot_catalog_xmin, this won't impact catalog bloat.

Ideally this mechanism won't generally actually be needed, since
hot_standby_feedback stops the master from removing needed catalogs,
and we make an effort to ensure that the standby has
hot_standby_feedback enabled and is using a replication slot. We
cannot prevent the user from dropping and re-creating the physical
slot on the upstream, though, and it doesn't look simple to stop them
turning off hot_standby_feedback or turning off use of a physical slot
after creating logical slots, either. So we try to stop users shooting
themselves in the foot, but if they do it anyway we notice and cope
gracefully. Logging catalog_xmin also helps slots created on standbys
know where to start, and makes sure we can deal gracefully with a race
between hs_feedback and slot creation on a standby.
Hi,
If this mechanism would not be needed most of the time, wouldn't it be
better to not have it and just have a way to ask physical slot about
what's the current reserved catalog_xmin (in most cases the standby
should actually know what it is since it's sending the hs_feedback, but
first slot created on such standby may need to check). WRT preventing
hs_feedback going off, we can refuse to start with hs_feedback off when
there are logical slots detected. We can also refuse to connect to the
master without physical slot if there are logical slots detected. I
don't see problem with either of those.
You may ask what if user drops the slot and recreates or somehow
otherwise messes up catalog_xmin on master, well, considering that under
my proposal we'd first (on connect) check the slot for catalog_xmin we'd
know about it so we'd either mark the local slots broken/drop them or
plainly refuse to connect to the master same way as if it didn't have
required WAL anymore (not sure which behavior is better). Note that user
could mess up catalog_xmin even in your design the same way, so it's not
really a regression.
In general I don't think that it's necessary to WAL log anything for
this. It will not work without slot and therefore via archiving anyway
so writing to WAL does not seem to buy us anything. There are some
interesting side effects of cascading (ie having A->B->C replication and
creating logical slot on C) but those should not be insurmountable. Plus
it might even be okay to only allow creating logical slots on standbys
connected directly to master in v1.
That's about approach, but since there are prerequisite patches in the
patchset that don't really depend on the approach I will comment about
them as well.
0001 and 0002 add testing infrastructure and look fine to me, possibly
committable.
But in 0003 I don't understand following code:
+   if (endpos != InvalidXLogRecPtr && !do_start_slot)
+   {
+       fprintf(stderr,
+               _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
+               progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }
Why is --create-slot and --endpos not allowed together?
0004 again looks good but depends on 0003.
0005 is timeline following which is IMHO ready for committer, as is 0006
and 0008 and I still maintain the opinion that these should go in soon.
0007 is unfinished as you said in your mail (missing option to specify
behavior). And the last one 0009 is the implementation discussed above,
which I think needs rework. IMHO 0007 and 0009 should be ultimately merged.
I think parts of this could be committed separately and are ready for
committer IMHO, but there is no way in CF application to mark only part
of patch-set for committer to attract the attention.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field.
[snip]

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it and just have a way to ask physical slot about
what's the current reserved catalog_xmin (in most cases the standby
should actually know what it is since it's sending the hs_feedback, but
first slot created on such standby may need to check).
Yes, and that was actually my originally preferred approach, though
the one above does offer the advantage that if something goes wrong we
can detect it and fail gracefully. Possibly not worth the complexity
though.
Your approach requires us to make very sure that hot_standby_feedback
does not get turned off by user or become ineffective once we're
replicating, though, since we won't have any way to detect when needed
tuples are removed. We'd probably just bail out with relcache/syscache
lookup errors, but I can't guarantee we wouldn't crash if we tried
logical decoding on WAL where needed catalogs have been removed.
I initially ran into trouble doing that, but now think I have a way forward.
WRT preventing
hs_feedback going off, we can refuse to start with hs_feedback off when
there are logical slots detected.
Yes. There are some ordering issues there though. We load slots quite
late in startup and they don't get tracked in checkpoints. So we might
launch the walreceiver before we load slots and notice their needed
xmin/catalog_xmin. So we need to prevent sending of
hot_standby_feedback until slots are loaded, or load slots earlier in
startup. The former sounds less intrusive and safer - probably just
add an "initialized" flag to ReplicationSlotCtlData and suppress
hs_feedback until it becomes true.
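A minimal sketch of the proposed suppression (the "initialized" flag is the one hypothesized above, not an existing field; modeled in Python purely for illustration):

```python
# Hypothetical sketch: the walreceiver suppresses hot_standby_feedback
# until replication slot state has been loaded, so feedback can never
# report an xmin/catalog_xmin that ignores not-yet-loaded slots.

class ReplicationSlotCtl:
    def __init__(self):
        self.initialized = False   # proposed flag; set once slots are loaded
        self.catalog_xmin = None

def feedback_to_send(ctl, hot_standby_feedback):
    if not hot_standby_feedback:
        return None
    if not ctl.initialized:
        return None  # don't send feedback before slot state is known
    return ctl.catalog_xmin

ctl = ReplicationSlotCtl()
assert feedback_to_send(ctl, True) is None      # slots not loaded yet
ctl.initialized = True
ctl.catalog_xmin = 4321
assert feedback_to_send(ctl, True) == 4321
```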
We'd also have to suppress the validation callback action on the
hot_standby_feedback GUC until we know replication slot state is
initialised, and perform the check during slot startup instead. The
hot_standby_feedback GUC validation check might get called before
shmem is even set up so we have to guard against attempts to access a
shmem segment that may not even exist yet.
The general idea is workable though. Refuse to start if logical slots
exist and hot_standby_feedback is off or walreceiver isn't using a
physical slot. Refuse to allow hot_standby_feedback to change
We can also refuse to connect to the
master without physical slot if there are logical slots detected. I
don't see problem with either of those.
Agreed. We must also be able to reliably enforce that the walreceiver
is using a replication slot to connect to the master and refuse to
start if it is not. The user could change recovery.conf and restart
the walreceiver while we're running, after we perform that check, so
walreceiver must also refuse to start if logical replication slots
exist but it has no primary slot name configured.
You may ask what if user drops the slot and recreates or somehow
otherwise messes up catalog_xmin on master, well, considering that under
my proposal we'd first (on connect) check the slot for catalog_xmin we'd
know about it so we'd either mark the local slots broken/drop them or
plainly refuse to connect to the master same way as if it didn't have
required WAL anymore (not sure which behavior is better). Note that user
could mess up catalog_xmin even in your design the same way, so it's not
really a regression.
Agreed. Checking catalog_xmin of the slot when we connect is
sufficient to guard against that, assuming we can trust that the
catalog_xmin is actually in effect on the master. Consider cascading
setups, where we set our catalog_xmin but it might not be "locked in"
until the middle cascaded server relays it to the master.
I have a proposed solution to that which I'll outline in a separate
patch+post; it ties in to some work on addressing the race between hot
standby feedback taking effect and queries starting on the hot
standby. It boils down to "add a hot_standby_feedback reply protocol
message".
Plus
it might even be okay to only allow creating logical slots on standbys
connected directly to master in v1.
True. I didn't consider that.
We haven't had much luck in the past with such limitations, but
personally I'd consider it a perfectly reasonable one.
But in 0003 I don't understand following code:
+   if (endpos != InvalidXLogRecPtr && !do_start_slot)
+   {
+       fprintf(stderr,
+               _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
+               progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }

Why is --create-slot and --endpos not allowed together?
What would --create-slot with --endpos do?
Ah. I had not realised that it is legal to do
pg_recvlogical -S test --create-slot --start -f - -d 'test'
i.e. "create a slot then in the same invocation begin decoding from it".
I misread and thought that --create-slot and --start were mutually exclusive.
I will address that.
0005 is timeline following which is IMHO ready for committer, as is 0006
and 0008 and I still maintain the opinion that these should go in soon.
I wonder if I should re-order 0005 and 0006 so we can commit the
hot_standby test improvements before logical decoding timeline
following.
I think parts of this could be committed separately and are ready for
committer IMHO, but there is no way in CF application to mark only part
of patch-set for committer to attract the attention.
Yeah. I raised that before and nobody was really sure what to do about
it. It's confusing to post patches on the same thread on separate CF
entries. It's also confusing to post patches on a nest of
inter-related threads to allow each thread to be tracked by a separate
CF entry.
At the moment I'm aiming to progressively get the underlying
infrastructure/test stuff in so we can focus on the core feature.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
But in 0003 I don't understand following code:
+   if (endpos != InvalidXLogRecPtr && !do_start_slot)
+   {
+       fprintf(stderr,
+               _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
+               progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }

Why is --create-slot and --endpos not allowed together?
Actually, the test is fine, the error is just misleading due to my
misunderstanding.
The fix is simply to change the error message to

    _("%s: --endpos may only be specified with --start\n"),

so I won't post a separate followup patch.
Okano Naoki tried to bring this to my attention earlier, but I didn't
understand as I hadn't yet realised you could use --create-slot
--start, they weren't mutually exclusive.
(We test to ensure --start --drop-slot isn't specified earlier so no
test for do_drop_slot is required).
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21/12/16 04:31, Craig Ringer wrote:
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
But in 0003 I don't understand following code:
+   if (endpos != InvalidXLogRecPtr && !do_start_slot)
+   {
+       fprintf(stderr,
+               _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
+               progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }

Why is --create-slot and --endpos not allowed together?
Actually, the test is fine, the error is just misleading due to my
misunderstanding.

The fix is simply to change the error message to

    _("%s: --endpos may only be specified with --start\n"),

so I won't post a separate followup patch.
Ah okay, makes sense. The --create-slot + --endpos combination should
definitely be allowed, especially now that we can extend this to
optionally use a temporary slot.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21/12/16 04:06, Craig Ringer wrote:
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field.
[snip]

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it and just have a way to ask physical slot about
what's the current reserved catalog_xmin (in most cases the standby
should actually know what it is since it's sending the hs_feedback, but
first slot created on such standby may need to check).

Yes, and that was actually my originally preferred approach, though
the one above does offer the advantage that if something goes wrong we
can detect it and fail gracefully. Possibly not worth the complexity
though.

Your approach requires us to make very sure that hot_standby_feedback
does not get turned off by user or become ineffective once we're
replicating, though, since we won't have any way to detect when needed
tuples are removed. We'd probably just bail out with relcache/syscache
lookup errors, but I can't guarantee we wouldn't crash if we tried
logical decoding on WAL where needed catalogs have been removed.

I initially ran into trouble doing that, but now think I have a way forward.
WRT preventing
hs_feedback going off, we can refuse to start with hs_feedback off when
there are logical slots detected.

Yes. There are some ordering issues there though. We load slots quite
late in startup and they don't get tracked in checkpoints. So we might
launch the walreceiver before we load slots and notice their needed
xmin/catalog_xmin. So we need to prevent sending of
hot_standby_feedback until slots are loaded, or load slots earlier in
startup. The former sounds less intrusive and safer - probably just
add an "initialized" flag to ReplicationSlotCtlData and suppress
hs_feedback until it becomes true.

We'd also have to suppress the validation callback action on the
hot_standby_feedback GUC until we know replication slot state is
initialised, and perform the check during slot startup instead. The
hot_standby_feedback GUC validation check might get called before
shmem is even set up so we have to guard against attempts to access a
shmem segment that may not even exist yet.

The general idea is workable though. Refuse to start if logical slots
exist and hot_standby_feedback is off or walreceiver isn't using a
physical slot. Refuse to allow hot_standby_feedback to change
These are all problems associated with replication slots being in shmem
if I understand correctly. I wonder, could we put just bool someplace
which is available early that says if there are any logical slots
defined? We don't actually need all the slot info, just to know if there
are some.
You may ask what if user drops the slot and recreates or somehow
otherwise messes up catalog_xmin on master, well, considering that under
my proposal we'd first (on connect) check the slot for catalog_xmin we'd
know about it so we'd either mark the local slots broken/drop them or
plainly refuse to connect to the master same way as if it didn't have
required WAL anymore (not sure which behavior is better). Note that user
could mess up catalog_xmin even in your design the same way, so it's not
really a regression.

Agreed. Checking catalog_xmin of the slot when we connect is
sufficient to guard against that, assuming we can trust that the
catalog_xmin is actually in effect on the master. Consider cascading
setups, where we set our catalog_xmin but it might not be "locked in"
until the middle cascaded server relays it to the master.

I have a proposed solution to that which I'll outline in a separate
patch+post; it ties in to some work on addressing the race between hot
standby feedback taking effect and queries starting on the hot
standby. It boils down to "add a hot_standby_feedback reply protocol
message".

Plus
it might even be okay to only allow creating logical slots on standbys
connected directly to master in v1.

True. I didn't consider that.
We haven't had much luck in the past with such limitations, but
personally I'd consider it a perfectly reasonable one.
I think it's infinitely better with that limitation than the status quo.
Especially for failover scenario (you usually won't failover to replica
down the cascade as it's always more behind). Not to mention that with
every level of cascading you get automatically more lag which means more
bloat so it might not even be all that desirable to go that route
immediately in v1 when we don't have way to control that bloat/maximum
slot lag.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 20, 2016 at 10:06 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 20 December 2016 at 15:03, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
The biggest change in this patch, and the main intrusive part, is that
procArray->replication_slot_catalog_xmin is no longer directly used by
vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
added, with a corresponding CheckPoint field.
[snip]

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it and just have a way to ask physical slot about
what's the current reserved catalog_xmin (in most cases the standby
should actually know what it is since it's sending the hs_feedback, but
first slot created on such standby may need to check).

Yes, and that was actually my originally preferred approach, though
the one above does offer the advantage that if something goes wrong we
can detect it and fail gracefully. Possibly not worth the complexity
though.

Your approach requires us to make very sure that hot_standby_feedback
does not get turned off by user or become ineffective once we're
replicating, though, since we won't have any way to detect when needed
tuples are removed. We'd probably just bail out with relcache/syscache
lookup errors, but I can't guarantee we wouldn't crash if we tried
logical decoding on WAL where needed catalogs have been removed.
I dunno, Craig, I think your approach sounds more robust. It's not
very nice to introduce a bunch of random prohibitions on what works
with what, and it doesn't sound like it's altogether watertight
anyway. Incorporating an occasional, small record into the WAL stream
to mark the advancement of the reserved catalog_xmin seems like a
cleaner and safer solution. We certainly do NOT want to find out
about corruption only because of random relcache/syscache lookup
failures, let alone crashes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 20, 2016 at 4:03 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
That's about approach, but since there are prerequisite patches in the
patchset that don't really depend on the approach I will comment about
them as well.

0001 and 0002 add testing infrastructure and look fine to me, possibly
committable.

But in 0003 I don't understand following code:

+   if (endpos != InvalidXLogRecPtr && !do_start_slot)
+   {
+       fprintf(stderr,
+               _("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
+               progname);
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }

Why is --create-slot and --endpos not allowed together?
0004 again looks good but depends on 0003.
0005 is timeline following which is IMHO ready for committer, as is 0006
and 0008 and I still maintain the opinion that these should go in soon.

0007 is unfinished as you said in your mail (missing option to specify
behavior). And the last one 0009 is the implementation discussed above,
which I think needs rework. IMHO 0007 and 0009 should be ultimately merged.

I think parts of this could be committed separately and are ready for
committer IMHO, but there is no way in CF application to mark only part
of patch-set for committer to attract the attention.
Craig has pinged me about looking at 0001, 0002, 0004 and 0006 as
those involve the TAP infrastructure.
So, for 0001:
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,6 +93,7 @@ use RecursiveCopy;
use Socket;
use Test::More;
use TestLib ();
+use pg_lsn qw(parse_lsn);
use Scalar::Util qw(blessed);
This depends on 0002, so the order should be reversed.
+sub lsn
+{
+ my $self = shift;
+ return $self->safe_psql('postgres', 'select case when
pg_is_in_recovery() then pg_last_xlog_replay_location() else
pg_current_xlog_insert_location() end as lsn;');
+}
The design of the test should be in charge of choosing which value it
wants to get, and the routine should just blindly do the work. More
flexibility is more useful to design tests. So it would be nice to
have one routine able to switch at will between 'flush', 'insert',
'write', 'receive' and 'replay' modes to get values from the
corresponding xlog functions.
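The suggested mode switch could look something like this (a sketch in Python for illustration; the real routine would live in PostgresNode.pm and call safe_psql. The function names are the 9.6-era ones used elsewhere in this thread, but the mapping itself is an assumption, not the committed API):

```python
# Sketch of the suggested one-routine-many-modes LSN helper: the test
# picks the mode, the routine just builds and runs the matching query.

LSN_FUNCS = {
    'insert':  'pg_current_xlog_insert_location()',
    'flush':   'pg_current_xlog_flush_location()',
    'write':   'pg_current_xlog_location()',
    'receive': 'pg_last_xlog_receive_location()',
    'replay':  'pg_last_xlog_replay_location()',
}

def lsn_query(mode):
    """Build the SQL a mode-aware lsn() routine would run."""
    if mode not in LSN_FUNCS:
        raise ValueError("unknown LSN mode: %s" % mode)
    return "SELECT %s" % LSN_FUNCS[mode]

assert lsn_query('replay') == 'SELECT pg_last_xlog_replay_location()'
```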
- die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+ die "error running SQL: '$$stderr'\nwhile running
'@psql_params' with sql '$sql'"
if $ret == 3;
That's separate from this patch, and definitely useful.
+ if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
+ die "valid modes are restart, confirmed_flush";
+ }
+ if (!defined($target_lsn)) {
+ $target_lsn = $self->lsn;
+ }
That's not really the style followed by the perl scripts present in
the code regarding the use of the brackets. Do we really need to care
about the object type checks by the way?
Regarding wait_for_catchup, there are two ways to do things. Either
query the standby like in the way 004_timeline_switch.pl does it or
the way this routine does. The way of this routine looks more
straight-forward IMO, and other tests should be refactored to use it.
In short I would make the target LSN a mandatory argument, and have
the caller send a standby's application_name instead of a PostgresNode
object, the current way to enforce the value of $standby_name being
really confusing.
+ my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1,
'replay' => 1 );
What's actually the point of 'sent'?
+ my @fields = ('plugin', 'slot_type', 'datoid', 'database',
'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+ my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
@fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
'$slot_name'");
+ $result = undef if $result eq '';
+ # hash slice, see http://stackoverflow.com/a/16755894/398670 .
Couldn't this portion be made more generic? Other queries could
benefit from that by having a routine that accepts as argument an
array of column names for example.
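Such a generic helper might be shaped like this (an illustrative Python sketch of the suggestion; the real code would be Perl in TestLib/PostgresNode, and `row_query`/`row_to_hash` are hypothetical names):

```python
# Sketch of the suggested generic helper: build the query from a list of
# column names and turn psql's one-row output into a name->value hash,
# mirroring the hash-slice trick in the quoted Perl.

def row_query(table, fields, slot_name):
    return ("SELECT %s FROM %s WHERE slot_name = '%s'"
            % (", ".join(fields), table, slot_name))

def row_to_hash(fields, psql_output, sep="|"):
    """Map column names onto psql's separator-joined output row."""
    if psql_output == "":
        return None  # no such row
    return dict(zip(fields, psql_output.split(sep)))

fields = ['plugin', 'slot_type', 'restart_lsn']
q = row_query('pg_catalog.pg_replication_slots', fields, 'test_slot')
assert q.startswith("SELECT plugin, slot_type, restart_lsn")
assert row_to_hash(fields, "test_decoding|logical|0/1000")['plugin'] == 'test_decoding'
```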
Now looking at 0002....
One whitespace:
$ git diff HEAD~1 --check
src/test/perl/pg_lsn.pm:139: trailing whitespace.
+=cut
pg_lsn sounds like a fine name, now we are more used to camel case for
module names. And routines are written as lower case separated by an
underscore.
+++ b/src/test/perl/t/002_pg_lsn.pl
@@ -0,0 +1,68 @@
+use strict;
+use warnings;
+use Test::More tests => 42;
+use Scalar::Util qw(blessed);
Most of the tests added don't have a description. This makes things
harder to debug when tracking an issue.
It may be good to begin using this module within the other tests in
this patch as well. Now do we actually need it? Most of the existing
tests I recall rely on the backend's operators for the pg_lsn data
type, so this is actually duplicating an existing facility. And all the
values are just passed as-is.
+++ b/src/test/perl/t/001_load.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+use Test::More tests => 5;
I can guess the meaning of this test, having a comment on top of it to
explain the purpose of the test is good practice though.
Looking at 0004...
+Disallows pg_recvlogial from internally retrying on error by passing --no-loop.
s/pg_recvlogial/pg_recvlogical
+sub pg_recvlogical_upto
+{
This looks like a good idea for your tests.
+my $endpos = $node_master->safe_psql('postgres', "SELECT location
FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL) ORDER BY
location DESC LIMIT 1;");
+diag "waiting to replay $endpos";
On the same wave as the pg_recvlogical wrapper, you may want to
consider some kind of wrapper at SQL level calling the slot functions.
And finally 0006.
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name =
$slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf',
"wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
No need to call multiple times this routine.
Increasing the test coverage is definitely worth it.
--
Michael
On 22 December 2016 at 13:43, Michael Paquier <michael.paquier@gmail.com> wrote:
So, for 0001:

--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,6 +93,7 @@ use RecursiveCopy;
 use Socket;
 use Test::More;
 use TestLib ();
+use pg_lsn qw(parse_lsn);
 use Scalar::Util qw(blessed);

This depends on 0002, so the order should be reversed.
Will do. That was silly.
I think I should probably also move the standby tests earlier, then
add a patch to update them when the results change.
+sub lsn
+{
+    my $self = shift;
+    return $self->safe_psql('postgres', 'select case when
pg_is_in_recovery() then pg_last_xlog_replay_location() else
pg_current_xlog_insert_location() end as lsn;');
+}

The design of the test should be in charge of choosing which value it
wants to get, and the routine should just blindly do the work. More
flexibility is more useful to design tests. So it would be nice to
have one routine able to switch at will between 'flush', 'insert',
'write', 'receive' and 'replay' modes to get values from the
corresponding xlog functions.
Fair enough. I can amend that.
-    die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+    die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'"
         if $ret == 3;

That's separate from this patch, and definitely useful.
Yeah. Slipped through. I don't think it really merits a separate patch
though tbh.
+    if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
+        die "valid modes are restart, confirmed_flush";
+    }
+    if (!defined($target_lsn)) {
+        $target_lsn = $self->lsn;
+    }

That's not really the style followed by the perl scripts present in
the code regarding the use of the brackets. Do we really need to care
about the object type checks by the way?
Brackets: will look / fix.
Type checks (not in quoted snippet above): that's a convenience to let
you pass a PostgresNode instance or a string name. Maybe there's a
more idiomatic Perl-y way to write it. My Perl is pretty dire.
Regarding wait_for_catchup, there are two ways to do things. Either
query the standby like in the way 004_timeline_switch.pl does it or
the way this routine does. The way of this routine looks more
straight-forward IMO, and other tests should be refactored to use it.
In short I would make the target LSN a mandatory argument, and have
the caller send a standby's application_name instead of a PostgresNode
object, the current way to enforce the value of $standby_name being
really confusing.
Hm, ok. I'll take a look. Making LSN mandatory so you have to pass
$self->lsn is ok with me.
+ my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1,
'replay' => 1 );
What's actually the point of 'sent'?
Pretty useless, but we expose it in Pg, so we might as well in the tests.
+    my @fields = ('plugin', 'slot_type', 'datoid', 'database',
'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+    my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
@fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
'$slot_name'");
+    $result = undef if $result eq '';
+    # hash slice, see http://stackoverflow.com/a/16755894/398670 .

Couldn't this portion be made more generic? Other queries could
benefit from that by having a routine that accepts as argument an
array of column names for example.
Yeah, probably. I'm not sure where it should live though - TestLib.pm ?
Not sure if there's an idiomatic way in Perl to pass a string (in this
case a query) with placeholders for interpolation of values (in this
case columns). In Python you'd pass it with pre-defined
%(placeholders)s for %-formatting.
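Roughly the Python idiom I'm alluding to, as a hypothetical sketch
(the helper name and its arguments are invented for illustration, not
part of the patch):

```python
def slot_query(fields, slot_name):
    # Build the column list first, then interpolate it into a query
    # template that uses named %(...)s placeholders.
    template = ("SELECT %(cols)s FROM pg_catalog.pg_replication_slots "
                "WHERE slot_name = '%(slot)s'")
    return template % {'cols': ', '.join(fields), 'slot': slot_name}

print(slot_query(['plugin', 'slot_type', 'restart_lsn'], 'test_slot'))
```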
Now looking at 0002....
One whitespace:
$ git diff HEAD~1 --check
src/test/perl/pg_lsn.pm:139: trailing whitespace.
+=cut
Will fix.
pg_lsn sounds like a fine name, though these days we are more used to
camel case for module names, with routines written in lower case
separated by underscores.
Unsure what the intent of this is.
+++ b/src/test/perl/t/002_pg_lsn.pl
@@ -0,0 +1,68 @@
+use strict;
+use warnings;
+use Test::More tests => 42;
+use Scalar::Util qw(blessed);

Most of the tests added don't have a description. This makes things
harder to debug when tracking an issue.

It may be good to begin using this module within the other tests in
this patch as well. Now do we actually need it? Most of the existing
tests I recall rely on the backend's operators for the pg_lsn data
type, so this is actually duplicating an existing facility. And all
the values are just passed as-is.
I added it mainly for ordered tests of whether some expected lsn had
passed/increased. But maybe it makes sense to just call into the
server and let it evaluate such tests.
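For the record, the ordered comparison pg_lsn is meant to provide
boils down to parsing the textual 'hi/lo' form into one integer; a
minimal sketch (Python for brevity, assuming the standard hex
XXXXXXXX/XXXXXXXX format):

```python
def lsn_to_int(lsn):
    # Parse PostgreSQL's textual LSN format ('hi/lo', both hex) into
    # an integer so LSNs can be compared with ordinary operators.
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

# Ordered comparisons: has one LSN passed another?
assert lsn_to_int('0/16B3748') < lsn_to_int('1/0')
assert lsn_to_int('0/2000000') >= lsn_to_int('0/1FFFFFF')
```

The alternative, as noted, is simply to let the server evaluate the
comparison with its own pg_lsn operators.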
+++ b/src/test/perl/t/001_load.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+use Test::More tests => 5;

I can guess the meaning of this test; having a comment on top of it to
explain its purpose is good practice though.
Will.
Looking at 0004...
+Disallows pg_recvlogial from internally retrying on error by passing --no-loop.
s/pg_recvlogial/pg_recvlogical
Thanks.
+sub pg_recvlogical_upto
+{

This looks like a good idea for your tests.
Yeah, and likely others too as we start doing more with logical
replication in future.
+my $endpos = $node_master->safe_psql('postgres',
+    "SELECT location FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)
+     ORDER BY location DESC LIMIT 1;");
+diag "waiting to replay $endpos";

In the same vein as the pg_recvlogical wrapper, you may want to
consider some kind of SQL-level wrapper calling the slot functions.
I'd really rather beg off that until it's needed later. The SQL
functions are simple to invoke from PostgresNode::psql in the
meantime; not so much so with pg_recvlogical.
And finally 0006.

+$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");

No need to call this routine multiple times.

Increasing the test coverage is definitely worth it.
Thanks.
I'll follow up with amendments. I've also implemented Petr's
suggestion to allow explicit omission of a snapshot on slot creation.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 22 December 2016 at 14:21, Craig Ringer <craig@2ndquadrant.com> wrote:
changes-in-0001-v2.diff shows the changes to PostgresNode.pm per
Michael's comments, and applies on top of 0001.
I also attach a patch to add a new CREATE_REPLICATION_SLOT option per
Petr's suggestion, so you can request a slot be created
WITHOUT_SNAPSHOT. This replaces the patch series's behaviour of
silently suppressing snapshot export when a slot was created on a
replica. It'll conflict (easily resolved) if applied on top of the
current series.
I have more to do before re-posting the full series, so waiting on
author at this point. The PostgresNode changes likely break later
tests, I'm just posting them so there's some progress here and so I
don't forget over the next few days' distraction.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services