Missing rows after migrating from postgres 11 to 12 with logical replication

Started by Lars Vonk over 5 years ago, 13 messages, general
#1 Lars Vonk
lars.vonk@gmail.com

Hi,

We migrated from postgres 11 to 12 using logical replication (over the local
network). Today we noticed that one table is missing 1252 rows after the
replication finished and we flipped to the new primary (we still have the
old master database, so we can recover).

We see that these rows were inserted in the table after the initial copy of
the table started. Most of the missing rows seem to come from inserts made
**during the initial copy** (1230) and the rest (22) from inserts made
**during the period the replication ran** (7 days).

After further investigation, unfortunately, more tables turn out to have
missing rows, all of them inserted after the start of the initial table copy
phase. We took a per-table approach for the replication, starting with
creating an empty publication and adding tables via

ALTER PUBLICATION pg12_migration ADD TABLE FOO;

After that we refreshed the subscription on the "new postgres 12 primary"
using

ALTER SUBSCRIPTION pg12_migration REFRESH PUBLICATION;

We only added a new table after the initial copy of the previous one was
done (i.e. its internal state was replicating).
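The per-table workflow above depends on waiting until each newly added table has finished its initial copy before adding the next one. A minimal sketch of that gate, assuming the `(table, srsubstate)` rows have already been fetched from the subscriber's `pg_subscription_rel` catalog (in PostgreSQL 12 the stored states are `i` initialize, `d` data being copied, `s` synchronized, `r` ready). The helper names and example table names are hypothetical:

```python
# Hypothetical helpers: decide whether it is safe to add the next table
# to the publication, given (table_name, srsubstate) rows read from the
# subscriber's pg_subscription_rel catalog.
# PostgreSQL 12 state codes: 'i' = initialize, 'd' = data is being copied,
# 's' = synchronized, 'r' = ready (normal replication).
from typing import Iterable, List, Tuple

READY = "r"


def tables_not_ready(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the tables whose sync state has not yet reached 'ready'."""
    return sorted(table for table, state in rows if state != READY)


def safe_to_add_next_table(rows: Iterable[Tuple[str, str]]) -> bool:
    """True only if every table in the subscription is 'ready'."""
    return not tables_not_ready(rows)


rows = [("public.orders", "r"), ("public.foo", "d")]
print(tables_not_ready(rows))        # ['public.foo']
print(safe_to_add_next_table(rows))  # False
```

Reaching `r` only means the sync for that table finished; as this thread shows, it is not by itself proof that no rows were lost.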

We never stopped the subscriptions during all this and we started with a
fresh schema.

Before we switched to the new master we did some sanity checks, like
comparing max(id) to see if the replica was up to date (including for this
table) and row counts on some smaller tables. That all checked out okay; we
never thought of missing rows somewhere in between...
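Comparing max(id) only shows that the newest rows arrived; rows lost in the middle of the key range go unnoticed, while a row count or a full primary-key comparison would catch them. A small illustration with made-up ids (not data from this thread):

```python
# Illustration with made-up ids: a max(id) check passes even though
# rows are missing in the middle of the key range.
from typing import Set


def max_id_check(source_ids: Set[int], replica_ids: Set[int]) -> bool:
    """The kind of check described above: compare only max(id)."""
    return max(source_ids) == max(replica_ids)


def missing_ids(source_ids: Set[int], replica_ids: Set[int]) -> Set[int]:
    """Full primary-key comparison: ids on the source but not on the replica."""
    return source_ids - replica_ids


source = set(range(1, 11))       # ids 1..10 on the old primary
replica = source - {4, 5, 6}     # three rows lost along the way

print(max_id_check(source, replica))         # True
print(sorted(missing_ids(source, replica)))  # [4, 5, 6]
print(len(source) == len(replica))           # False
```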

So how can this happen?

Lars

#2 Lars Vonk
lars.vonk@gmail.com
In reply to: Lars Vonk (#1)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

Hi,

Just wondering if someone knows how this could have happened? Did we miss
something when setting up the logical replication? Are there any
scenarios in which this could happen (like a database restart or anything
else)?
Or should I report this as a bug (although I can't imagine it is)?
We would really like to know how we can prevent this from happening
next time.

We still have the old primary, and a snapshot of the current primary around
the time we flipped from the old to the new. So we could do some digging into
the cause, but we don't know what to look for...

Any help or tips are appreciated.

Thanks in advance,

Lars

On Fri, Dec 18, 2020 at 4:42 PM Lars Vonk <lars.vonk@gmail.com> wrote:

#3 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#2)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/20/20 8:33 AM, Lars Vonk wrote:

Questions I have:

1) Was there activity on the 12 instance, while it was being replicated
to, that could account for the missing (deleted?) rows?

2) Are the logs still available for inspection to see if there were any
errors thrown?

3) Are there FK relationships involved?

4) How did you determine the rows were missing?

--
Adrian Klaver
adrian.klaver@aklaver.com

#4 Lars Vonk
lars.vonk@gmail.com
In reply to: Adrian Klaver (#3)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

Hi Adrian,

Thanks for taking the time to reply!

First to answer your questions:

1) Was there activity on the 12 instance, while it was being replicated
to, that could account for the missing (deleted?) rows?

No, there was no activity other than us running some queries to check how
far the replication had progressed.

2) Are the logs still available for inspection to see if there were any
errors thrown?

Yes, and we dug into those. We also found some indications that
something went wrong.

3) Are there FK relationships involved?

No

4) How did you determine the rows were missing?

We were alerted by a bug later that day and found that some rows were
missing in the new primary. We did a comparison based on primary key and found
that several tables were missing rows. Before the switch we unfortunately
only checked max(id) and did some counts on tables, and those all checked
out. We didn't do a count on all tables...
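For large tables, pulling every primary key from both servers can be expensive. One common alternative, sketched here under assumptions and not what was done in this thread, is to compare row counts per id bucket and only drill into buckets that differ. The per-bucket counts could come from a query such as `SELECT id / 10000 AS bucket, count(*) FROM foo GROUP BY 1` run on each server (bucket width and table name are illustrative); here they are plain dicts with made-up numbers:

```python
# Sketch: locate gaps by comparing per-bucket row counts from the old
# and new primary. Inputs map bucket number -> row count. The bucket
# width of 10000 is an arbitrary assumption and must match the query.
from typing import Dict, List, Tuple

BUCKET_WIDTH = 10_000


def differing_buckets(source: Dict[int, int],
                      replica: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (bucket, source_count, replica_count) for every mismatch."""
    diffs = []
    for bucket in sorted(set(source) | set(replica)):
        s, r = source.get(bucket, 0), replica.get(bucket, 0)
        if s != r:
            diffs.append((bucket, s, r))
    return diffs


def bucket_id_range(bucket: int) -> Tuple[int, int]:
    """Inclusive id range covered by a bucket (for non-negative ids)."""
    start = bucket * BUCKET_WIDTH
    return start, start + BUCKET_WIDTH - 1


src = {0: 9_999, 1: 10_000, 2: 4_321}   # made-up counts, old primary
dst = {0: 9_999, 1: 9_000, 2: 4_321}    # made-up counts, new primary
for bucket, s, r in differing_buckets(src, dst):
    lo, hi = bucket_id_range(bucket)
    print(f"ids {lo}-{hi}: {s - r} missing")  # ids 10000-19999: 1000 missing
```

Only the mismatching id ranges then need a full key comparison.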

So, back to the logs:

We dug a little deeper and did find ERROR logs around the time we ran
the initial copies. During a period of several hours that day we see a
couple of messages like:

ERROR: requested WAL segment 00000001000001F10000001D has already been removed

This message is logged a few times and then no more (perhaps it recovered
from it?)
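The segment name in that error encodes a position in the WAL: the 24 hex digits are the timeline, the high 32 bits of the LSN, and the segment number within that 4 GiB range. A sketch decoding it, assuming the default 16 MiB `wal_segment_size` (the thread does not say whether it was changed). The resulting LSN could then be compared with positions reported by the servers, e.g. `restart_lsn` in `pg_replication_slots`:

```python
# Sketch: decode a WAL segment file name into timeline and start LSN.
# Assumes the default 16 MiB wal_segment_size; with a different segment
# size the last 8 hex digits cover a different range.
from typing import Tuple

WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def decode_wal_segment(name: str) -> Tuple[int, str]:
    """Return (timeline, start LSN as 'X/Y') for a 24-hex-digit segment name."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)       # high 32 bits of the LSN
    seg_in_log = int(name[16:24], 16)  # segment number within that log id
    segs_per_log = 0x1_0000_0000 // WAL_SEGMENT_SIZE
    if seg_in_log >= segs_per_log:
        raise ValueError("segment number out of range for this segment size")
    lsn = (log_id << 32) + seg_in_log * WAL_SEGMENT_SIZE
    return timeline, f"{lsn >> 32:X}/{lsn & 0xFFFF_FFFF:X}"


print(decode_wal_segment("00000001000001F10000001D"))
# (1, '1F1/1D000000')
```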

Other than this error there are no other errors, but unfortunately we never
checked this before migrating to the new primary...
In hindsight not very smart of course, but we never thought of this because:

a) the initial copy and the catching up all seemed fine;
b) in previous attempts, when we made some errors, we noticed for instance
that the WAL files on the previous primary were kept because the new
primary had not yet processed them.
So we assumed that when all WAL files are "gone" and the max(id) checks out,
the replica is in sync and consistent with the primary;
c) our experience with hot standby replication is that whenever a WAL
segment is missing it won't skip over it but waits until you restore it. We
assumed (and still assume) that this was also the case with logical
replication;

So the questions we now have are:

1) Is it correct that a logical replication replica skips over missing
WAL files?
2) If so, how can you know that it skipped a WAL file without looking at the
log files or doing a full count?
3) Is there a fail-fast mechanism for logical replication (like with hot
standby) that stops further replication when a WAL file is missing?

Regards,
Lars

On Sun, Dec 20, 2020 at 6:58 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

#5 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#4)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/21/20 12:26 PM, Lars Vonk wrote:

We dug a little deeper and did find ERROR logs around the time we
ran the initial copies. During a period of several hours that day we see
a couple of messages like:

ERROR: requested WAL segment 00000001000001F10000001D has already been removed

What was being run when the above ERROR was triggered?

#6 Lars Vonk
lars.vonk@gmail.com
In reply to: Adrian Klaver (#5)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

What was being run when the above ERROR was triggered?

The initial copy of a table. Other than that, we ran select
pg_size_pretty(pg_relation_size('table_name')) to see the current size of
the table being copied, to get a feeling for the progress.
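Sampling `pg_relation_size()` of the table being copied, as described above, can be turned into a rough progress estimate. A sketch assuming two samples of the subscriber-side size taken at known times, with the source table's size as the target; the function name and numbers are hypothetical, and since on-disk sizes can differ between servers (bloat, fillfactor) the result is only an indication:

```python
# Rough progress/ETA estimate for an initial table copy from two size
# samples (in bytes) of the table on the subscriber, e.g. taken with
# pg_relation_size('table_name'). Assumes a steady copy rate and that
# the source table size is a fair target.
from typing import Optional, Tuple


def copy_progress(source_bytes: int,
                  sample1: Tuple[float, int],
                  sample2: Tuple[float, int]) -> Tuple[float, Optional[float]]:
    """Return (fraction done, estimated seconds remaining or None)."""
    (t1, size1), (t2, size2) = sample1, sample2
    fraction = min(size2 / source_bytes, 1.0)
    rate = (size2 - size1) / (t2 - t1) if t2 > t1 else 0.0
    if rate <= 0:
        return fraction, None  # no measurable progress between samples
    remaining = max(source_bytes - size2, 0) / rate
    return fraction, remaining


# Made-up example: 40 GiB source table; 10 GiB copied at t=0 s, 12 GiB at t=600 s
GIB = 1024 ** 3
done, eta = copy_progress(40 * GIB, (0, 10 * GIB), (600, 12 * GIB))
print(f"{done:.0%} done, ~{eta / 3600:.1f} h left")  # 30% done, ~2.3 h left
```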

And whenever we added a new table to the publication we ran ALTER
SUBSCRIPTION migration REFRESH PUBLICATION; to add any new table to the
subscription. But not around that timestamp: the most recent one was about
50 minutes before the first occurrence of that ERROR. (There were no ERRORs
after the earlier ALTER SUBSCRIPTIONs.)

But after the initial copies ended there are more ERRORs about other missing
WAL segments. Each missing WAL segment is logged as an ERROR a couple of
times and then no more. After a couple of hours no more errors are logged.

Lars

On Mon, Dec 21, 2020 at 10:23 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#6)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/21/20 2:42 PM, Lars Vonk wrote:

But after the initial copies ended there are more ERRORs about other
missing WAL segments. Each missing WAL segment is logged as an ERROR a
couple of times and then no more. After a couple of hours no more errors
are logged.

Something was looking for the WAL segment.

Did you have some other replication running on the 11 instance?

In any case, what command was logged just before the ERROR?

#8 Lars Vonk
lars.vonk@gmail.com
In reply to: Adrian Klaver (#7)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

Did you have some other replication running on the 11 instance?

Yes, the 11 instance also had another (11) replica running. (But these logs
are from the 12 instance.)

The new 12 instance also had a replica running.

In any case, what command was logged just before the ERROR?

Nothing is logged.

These are the only log statements just before the error message; one second
later the ERROR is logged:

2020-12-10 13:26:43 UTC::@:[5537]:LOG:  checkpoints are occurring too frequently (20 seconds apart)
2020-12-10 13:26:43 UTC::@:[5537]:HINT:  Consider increasing the configuration parameter "max_wal_size".
2020-12-10 13:26:43 UTC::@:[5537]:LOG:  checkpoint starting: wal
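The log shows WAL-triggered checkpoints every 20 seconds during the copy, and at each checkpoint the server may remove or recycle segments it no longer needs. A deliberately simplified model of PostgreSQL 12's retention rule (an illustration, not the server's actual code): keep from the checkpoint's redo segment minus `wal_keep_segments`, extended back to the oldest `restart_lsn` of any replication slot. Under this model, a consumer streaming without a slot that falls behind that point would get "requested WAL segment ... has already been removed". Whether that is what happened here is not established in this thread; the LSNs below are made up and the function names are hypothetical:

```python
# Deliberately simplified model of which WAL segments a PostgreSQL 12
# checkpoint may remove. Ignores WAL archiving, timelines and other
# details. Segment numbers are absolute: LSN // wal_segment_size.
from typing import Iterable

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default size; an assumption here


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '1F1/1D000000' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def oldest_kept_segment(redo_lsn: str,
                        slot_restart_lsns: Iterable[str],
                        wal_keep_segments: int) -> int:
    """Segments numbered below the returned value may be removed."""
    keep = lsn_to_int(redo_lsn) // WAL_SEGMENT_SIZE
    if wal_keep_segments > 0:
        keep = max(keep - wal_keep_segments, 1)
    for lsn in slot_restart_lsns:  # slots can only hold WAL back further
        keep = min(keep, lsn_to_int(lsn) // WAL_SEGMENT_SIZE)
    return keep


def still_available(needed_lsn: str, kept_from: int) -> bool:
    """Would a consumer needing WAL from needed_lsn still find it?"""
    return lsn_to_int(needed_lsn) // WAL_SEGMENT_SIZE >= kept_from


redo = "1F1/30000000"  # made-up checkpoint redo position
kept = oldest_kept_segment(redo, [], wal_keep_segments=0)
print(still_available("1F1/1D000000", kept))  # False: nothing held it back
kept = oldest_kept_segment(redo, ["1F1/1C000000"], wal_keep_segments=0)
print(still_available("1F1/1D000000", kept))  # True: a slot held it back
```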

Lars

On Mon, Dec 21, 2020 at 11:51 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#8)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/22/20 9:10 AM, Lars Vonk wrote:

Did you have some other replication running on the 11 instance?

Yes, the 11 instance also had another (11) replica running. (But these
logs are from the 12 instance.)

The 11 instance had the data that went missing in the 12 instance, so
what shows up in the logs of the 11 instance during this period that is
relevant?

The new 12 instance also had a replica running.

Was the setup as follows?

1) 11 primary --> 11 standby via what replication, logical or binary?
   | --> 12 new instance via logical

2) 12(new) primary --> 12(?) standby via what replication, logical or binary?

#10 Lars Vonk
lars.vonk@gmail.com
In reply to: Adrian Klaver (#9)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

The full setup is:

Before:
11 primary -> 11 hot standby via binary

During migration:
11 primary -> 11 hot standby via binary
  |-> 12 new instance via logical
        |-> 12 new replica via binary

After migration:
12 primary
  |-> 12 replica via binary

On Tue, Dec 22, 2020 at 7:16 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

#11 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#10)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/23/20 1:40 AM, Lars Vonk wrote:

There are several moving parts here. I have to believe the problem is
related to that; I am just not sure how to figure it out after the fact.
The best I can come up with is to retry the process and monitor it closely
in real or near real time to see if you can catch the issue. Another option
is to reduce the parts count by not running the binary 12 --> 12 replication
at the same time you are doing the 11 --> 12 logical replication.
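Monitoring closely in (near) real time, as suggested above, could include watching the server log for the error seen in this thread. A minimal sketch, assuming log lines arrive as strings (how they are collected, e.g. by tailing the log file, is left open); it extracts the missing segment names and counts repeats, matching the message text exactly as quoted in this thread. The function name is hypothetical:

```python
# Sketch: scan PostgreSQL log lines for "requested WAL segment ... has
# already been removed" errors and count how often each segment is
# reported.
import re
from collections import Counter
from typing import Iterable

WAL_REMOVED = re.compile(
    r"ERROR:\s+requested WAL segment ([0-9A-F]{24}) has already been removed"
)


def missing_segments(lines: Iterable[str]) -> Counter:
    """Count occurrences of each removed-segment error by segment name."""
    hits: Counter = Counter()
    for line in lines:
        match = WAL_REMOVED.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


sample = [
    "2020-12-10 13:26:43 UTC::@:[5537]:LOG:  checkpoint starting: wal",
    "ERROR:  requested WAL segment 00000001000001F10000001D has already been removed",
    "ERROR:  requested WAL segment 00000001000001F10000001D has already been removed",
]
print(missing_segments(sample))
# Counter({'00000001000001F10000001D': 2})
```

Any non-empty result is a signal to stop and investigate before trusting the replica.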

#12 Lars Vonk
lars.vonk@gmail.com
In reply to: Adrian Klaver (#11)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

Well, thanks for taking the time anyway. Indeed, reducing the number of
parts next time is a good idea.

I would still expect, though, that if a logical replica misses a WAL segment
it would stop replicating (and/or report an inconsistent state). I know this
is the case with binary replication (it stops replicating).
As a last question, do you know if this is also the case with logical
replication, or is what happened here an "expected outcome" when a
logical replica misses a WAL segment?

Lars

On Thu, Dec 24, 2020 at 5:52 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

#13 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Lars Vonk (#12)
Re: Missing rows after migrating from postgres 11 to 12 with logical replication

On 12/24/20 12:24 PM, Lars Vonk wrote:

It is still not clear to me which part of the process was complaining about
the WAL. Without knowing that, any answer as to what effect it had would
just be pulled out of thin air.

As to logical replication and WAL, read this thread (I thought I
remembered a previous discussion on this; it took me a bit to pull it up):

/messages/by-id/CAGvVEFvq_VM9LhYPeu+Uw__gEVvrBffGL=FO-88cZEp-35+arA@mail.gmail.com
