pgsql: Restore replication protocol's duplicate command tags
Restore replication protocol's duplicate command tags
I removed the duplicate command tags for START_REPLICATION inadvertently
in commit 07082b08cc5d, but the replication protocol requires them. The
fact that the replication protocol was broken was not noticed because
all our test cases use an optimized code path that exits early, failing
to verify that the behavior is correct for non-optimized cases. Put
them back.
Also document this protocol quirk.
Add a test case that shows the failure. It might still succeed even
without the patch when run on a fast enough server, but it suffices to
show the bug in enough cases that it would be noticed in buildfarm.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Henry Hinze <henry.hinze@gmail.com>
Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com>
Discussion: /messages/by-id/16643-eaadeb2a1a58d28c@postgresql.org
Branch
------
REL_13_STABLE
Details
-------
https://git.postgresql.org/pg/commitdiff/72e43fc313e93c95704c574bcf98805805668063
Modified Files
--------------
doc/src/sgml/protocol.sgml | 8 +++--
src/backend/replication/logical/worker.c | 1 -
src/backend/replication/walsender.c | 3 +-
src/test/subscription/t/100_bugs.pl | 55 +++++++++++++++++++++++++++++++-
4 files changed, 61 insertions(+), 6 deletions(-)
On 2020-Oct-14, Alvaro Herrera wrote:
Add a test case that shows the failure. It might still succeed even
without the patch when run on a fast enough server, but it suffices to
show the bug in enough cases that it would be noticed in buildfarm.
Hm, this failed on sidewinder. I think the "wait for catchup" stuff in
logical replication is broken; I added a wait for sync workers to go
away after the normal wait_for_catchup, but evidently it is not
sufficient even with that.
On Thu, Oct 15, 2020 at 6:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Oct-14, Alvaro Herrera wrote:
Add a test case that shows the failure. It might still succeed even
without the patch when run on a fast enough server, but it suffices to
show the bug in enough cases that it would be noticed in buildfarm.Hm, this failed on sidewinder.
Now, curculio [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2020-10-15%2005%3A30%3A43 also seems to be failing for the same reason.
I think the "wait for catchup" stuff in
logical replication is broken; I added a wait for sync workers to go
away after the normal wait_for_catchup, but evidently it is not
sufficient even with that.
For the initial table sync, we use below in some of the tests (see
001_rep_changes):
# Also wait for initial table sync to finish
my $synced_query =
"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
or die "Timed out while waiting for subscriber to synchronize data";
Is it not possible to use the same thing in this test as well?
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2020-10-15%2005%3A30%3A43
--
With Regards,
Amit Kapila.
On 2020-Oct-15, Amit Kapila wrote:
On Thu, Oct 15, 2020 at 6:07 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Oct-14, Alvaro Herrera wrote:
Hm, this failed on sidewinder.
Now, curculio [1] also seems to be failing for the same reason.
Yeah ... and now they're both green. Anyway clearly the test is
unstable.
For the initial table sync, we use below in some of the tests (see
001_rep_changes):
Ah yeah, thanks, this should work. Pushed, we'll see how it goes.
Thanks,