Records count mismatch with logical replication

Started by Durgamahesh Manne · about 1 year ago · 14 messages · general

#1 Durgamahesh Manne
maheshpostgres9@gmail.com

Hi Team,

I have publication and subscription servers. Data replication seems to be
running with minimal lag, but the record counts differ by more than 10
thousand records between the source and destination tables.

Could you please help in resolving this issue?

Regards,
Durga Mahesh

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#1)
Re: Records count mismatch with logical replication

On 1/21/25 04:08, Durgamahesh Manne wrote:

Hi  Team,

I have publication and subscription servers .So seems data replication
running with minimal lag but records count mismatch with more than 10
thousand records between source and destination tables

Could you please help in resolving this issue?

Not without a good deal more information:

1) Postgres version on either side of the replication.

2) The replication configuration.

3) Define how lag is being calculated and what 'minimal' is.

4) Define how the record counts are being derived.

5) The network distance between the servers.


--
Adrian Klaver
adrian.klaver@aklaver.com

#3 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Adrian Klaver (#2)
Re: Records count mismatch with logical replication

On Tue, Jan 21, 2025 at 9:24 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

[...]

Hi Adrian Klaver

Thanks very much for your quick response.

This happened during a repack: lag went to more than 350 GB, then gradually
decreased to minimal lag after pg_repack finished.

1) Postgres version on either side of the replication.

Source (publication): 16.4
Destination (subscription): 14.11

2) The replication configuration.
OLAP archiving workload (32 GB RAM, 8 vCPUs)
Source: wal_level = logical
wal_sender_timeout = 30s
max_wal_senders = 40
max_replication_slots = 20
max_logical_replication_workers = 4
wal_buffers = 64MB
commit_delay = 2000
commit_siblings = 12
wal_writer_delay = 300
wal_writer_flush_after = 1MB
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB

Destination: 128 GB RAM, 32 vCPUs
wal_level = logical
wal_receiver_timeout = 30s
max_wal_senders = 40
max_replication_slots = 60
max_logical_replication_workers = 23
wal_buffers = 64MB
commit_delay = default
commit_siblings = default
wal_writer_delay = default
wal_writer_flush_after = default
bgwriter_delay = 20
min_wal_size = 8GB
max_wal_size = 32GB

3) Define how lag is being calculated and what 'minimal' is.

postgres=> select
slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as
behind_size, active from pg_replication_slots;
  slot_name  | bytes_behind | behind_size | active
-------------+--------------+-------------+--------
 cls_eva_msa |     22906216 | 22 MB       | t
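
(As a side note, restart_lsn can overstate the lag of a logical slot, since it trails the oldest WAL the slot might still need; confirmed_flush_lsn tracks what the subscriber has actually confirmed. A variant of the same query, sketched against the same pg_replication_slots view:)

```sql
-- Sketch: lag measured against confirmed_flush_lsn instead of restart_lsn.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS confirmed_behind,
       active
FROM   pg_replication_slots
WHERE  slot_type = 'logical';
```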

4) Define how the record counts are being derived.
Source : archiving=> select count(*) from archiving.events_archive ;
count
---------
1262908
(1 row)

Destination : archiving=> select count(*) from archiving.events_archive ;
count
---------
1252062
(1 row)

5) The network distance between the servers.

Both are under same vpc security groups
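
(To identify which rows account for the difference, not just how many, a keyed anti-join can help. A minimal sketch, assuming a primary key column id and that the publisher's table is reachable from the subscriber as a postgres_fdw foreign table events_archive_remote; both names are hypothetical:)

```sql
-- Sketch: keys present on the publisher but missing on the subscriber.
SELECT p.id
FROM   events_archive_remote p          -- publisher's table via postgres_fdw (hypothetical)
LEFT   JOIN archiving.events_archive s ON s.id = p.id
WHERE  s.id IS NULL                     -- no matching row on the subscriber
ORDER  BY p.id
LIMIT  100;
```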

Regards
Durga Mahesh

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#3)
Re: Records count mismatch with logical replication

On 1/21/25 09:38, Durgamahesh Manne wrote:

Hi Adrian Klaver

Really Thanks for your quick response

This happened during repack lag went to more than 350Gb then gradually
decreased to minimal lag after running pg_repack

I don't use pg_repack so I don't know what effect it would have on the
process.

3) Define how lag is being calculated and what 'minimal' is.

postgres=> select
slot_name,pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn) as bytes_behind,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as
behind_size, active from pg_replication_slots;
  slot_name  | bytes_behind | behind_size | active
-------------+--------------+-------------+--------
 cls_eva_msa |     22906216 | 22 MB       | t

4) Define how the record counts are being derived.
  Source : archiving=> select count(*) from archiving.events_archive ;
  count
---------
 1262908
(1 row)

Destination : archiving=> select count(*) from archiving.events_archive ;
  count
---------
 1252062
(1 row)

22,906,216 bytes/10,846 rows works out to 2112 bytes per row.

Is that a reasonable per row estimate?
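
(The division above can be reproduced directly in psql; a trivial sketch, no table access needed:)

```sql
-- 22,906,216 bytes of slot lag spread over the 10,846 missing rows.
SELECT round(22906216.0 / 10846) AS approx_bytes_per_row;   -- ≈ 2112
```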

5) The network distance between the servers.

 Both are under same vpc security groups


--
Adrian Klaver
adrian.klaver@aklaver.com

#5 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Adrian Klaver (#4)
Re: Records count mismatch with logical replication

On Tue, Jan 21, 2025 at 11:26 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

[...]

Hi Adrian Klaver

22,906,216 bytes/10,846 rows works out to 2112 bytes per row.

Is that a reasonable per row estimate?

Yes, though it sometimes varies.

Regards,
Durga Mahesh

#6 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#5)
Re: Records count mismatch with logical replication

On 1/21/25 10:06 AM, Durgamahesh Manne wrote:

Hi Adrian Klaver

22,906,216 bytes/10,846 rows  works out to 2112 bytes per row.

Is that a reasonable per row estimate?

Yes, though it sometimes varies.

If I am following the lag went from 350GB behind to 22MB.

Is the issue that the lag has stalled at 22MB?


--
Adrian Klaver
adrian.klaver@aklaver.com

#7 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Adrian Klaver (#6)
Re: Records count mismatch with logical replication

On Wed, 22 Jan, 2025, 00:22 Adrian Klaver, <adrian.klaver@aklaver.com>
wrote:

[...]

Hi Adrian Klaver

Is the issue that the lag has stalled at 22MB?

It depends on the load on the source.
The lag either decreases or increases by KBs and MBs (not GBs); it is not
constant while data is being replicated to the target.

But the record counts differ by more than 10 thousand rows.

How can this issue be mitigated in the simplest way?

Regards
Durga Mahesh

#8 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#7)
Re: Records count mismatch with logical replication

On 1/21/25 11:40, Durgamahesh Manne wrote:

On Wed, 22 Jan, 2025, 00:22 Adrian Klaver <adrian.klaver@aklaver.com> wrote:

[...]

Hi Adrian Klaver

Is the issue that the lag has stalled at 22MB?

It depends on the load on the source.
The lag either decreases or increases by KBs and MBs (not GBs); it is not
constant while data is being replicated to the target.

Previously you stated:

"Both are under same vpc security groups"

Does this refer to AWS?

If so per:

https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

"The following diagram shows an example VPC. The VPC has one subnet in
each of the Availability Zones in the Region, EC2 instances in each
subnet, and an internet gateway to allow communication between the
resources in your VPC and the internet."

So where are the two Postgres instances physically located relative to
each other?

But the record counts differ by more than 10 thousand rows.

Have you looked at the I/O statistics between the Postgres instances?

How can this issue be mitigated in the simplest way?

Until it is determined what is causing the lag there is no way to deal
with it.
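
(One way to turn an impression into numbers is the per-walsender lag columns the publisher already tracks; a sketch, run on the publication side:)

```sql
-- Sketch: per-subscription lag as reported by the publisher.
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      replay_lsn)) AS replay_behind
FROM   pg_stat_replication;
```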


--
Adrian Klaver
adrian.klaver@aklaver.com

#9 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Adrian Klaver (#8)
Re: Records count mismatch with logical replication

On Wed, 22 Jan, 2025, 03:11 Adrian Klaver, <adrian.klaver@aklaver.com>
wrote:

[...]

Hi

So where are the two Postgres instances physically located relative to
each other?

Both are in Ireland under the same VPC security groups.

But the record counts differ by more than 10 thousand rows.

Have you looked at the I/O statistics between the Postgres instances?

Seems everything looks good with pg replication slots

Does the pg_logical_slot_get_changes function help to pull pending changes
to the subscription so that it can be synced with the publication server in
real time?

Regards,
Durgamahesh

#10 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#9)
Re: Records count mismatch with logical replication

On 1/22/25 18:53, Durgamahesh Manne wrote:

But the record counts differ by more than 10 thousand rows.

Have you looked at the I/O statistics between the Postgres instances?

Seems everything looks good with pg replication slots

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion, not actual data.

Does the pg_logical_slot_get_changes function help to pull pending changes
to the subscription so that it can be synced with the publication server in
real time?

Are you referring to this?:

https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

Though I am not sure you want to do this as from above:

"When requesting synchronous replication, each commit of a write
transaction will wait until confirmation is received that the commit has
been written to the write-ahead log on disk of both the primary and
standby server. The only possibility that data can be lost is if both
the primary and the standby suffer crashes at the same time. This can
provide a much higher level of durability, though only if the sysadmin
is cautious about the placement and management of the two servers.
Waiting for confirmation increases the user's confidence that the
changes will not be lost in the event of server crashes but it also
necessarily increases the response time for the requesting transaction.
The minimum wait time is the round-trip time between primary and standby."

If you are not referring to above then you will need to explain further.


--
Adrian Klaver
adrian.klaver@aklaver.com

#11 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Adrian Klaver (#10)
Re: Records count mismatch with logical replication

On Thu, Jan 23, 2025 at 10:08 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

[...]

Hi

Source Publication Side:
archiving=> select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name | cls_eva_msa
plugin | pgoutput
slot_type | logical
datoid | 16601
database | archiving
temporary | f
active | t
active_pid | 3237
xmin |
catalog_xmin | 2935229621
restart_lsn | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status | reserved
safe_wal_size |
two_phase | f
conflicting | f
-[ RECORD 2 ]-------+--------------
slot_name | cle_clm_mka
plugin | pgoutput
slot_type | logical
datoid | 16601
database | archiving
temporary | f
active | t
active_pid | 3501
xmin |
catalog_xmin | 2935229621
restart_lsn | 16C8/40CEC600
confirmed_flush_lsn | 16C8/440FFF50
wal_status | reserved
safe_wal_size |
two_phase | f
conflicting | f
archiving=> select * from pg_stat_replication;
client_hostname |
client_port | 52506
backend_start | 2025-01-23 16:58:04.697304+00
backend_xmin |
state | streaming
sent_lsn | 16C7/BDE4BB48
write_lsn | 16C7/BDE4BB48
flush_lsn | 16C7/BDE4BB48
replay_lsn | 16C7/BDE4BB48
write_lag | 00:00:00.002271
flush_lag | 00:00:00.002271
replay_lag | 00:00:00.002271
sync_priority | 0
sync_state | async
reply_time | 2025-01-23 17:34:39.901979+00
-[ RECORD 2 ]----+------------------------------
pid | 3501
usesysid | 14604130
usename | archiving
application_name | cle_clm_mka
client_addr | 10.80.0.168
client_hostname |
client_port | 55412
backend_start | 2025-01-22 09:31:11.83963+00
backend_xmin |
state | streaming
sent_lsn | 16C7/BDE4BB48
write_lsn | 16C7/BDE4BB48
flush_lsn | 16C7/BDE4BB48
replay_lsn | 16C7/BDE4BB48
write_lag | 00:00:00.001642
flush_lag | 00:00:00.023143
replay_lag | 00:00:00.001642
sync_priority | 0
sync_state | async
reply_time | 2025-01-23 17:34:39.903052+00

Subscription Side : archiving=> select * from pg_stat_subscription where
subname = 'cls_eva_msa';
-[ RECORD 1 ]---------+------------------------------
subid | 1936652827
subname | cls_eva_msa
pid | 18746
relid |
received_lsn | 16C7/FB48DFE0
last_msg_send_time | 2025-01-23 17:41:11.924562+00
last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
latest_end_lsn | 16C7/FB48DFE0
latest_end_time | 2025-01-23 17:41:11.924562+00

archiving=> select * from pg_stat_subscription where subname =
'cle_clm_mka';
-[ RECORD 1 ]---------+------------------------------
subid | 1892055116
subname | cle_clm_mka
pid | 507
relid |
received_lsn | 16C7/FB8CDF68
last_msg_send_time | 2025-01-23 17:41:17.375879+00
last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
latest_end_lsn | 16C7/FB8CDF68
latest_end_time | 2025-01-23 17:41:17.375879+00

If you need to see more stats, let me know which ones and I will provide them.
Your response in this regard is valuable.

Are you referring to this?:

https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION

No. I am using logical replication in asynchronous mode.

Except the subscriber is lagging behind the publisher.

'... everything looks good' is an opinion, not actual data.
Correct

Regards
Durga Mahesh

#12 Durgamahesh Manne
maheshpostgres9@gmail.com
In reply to: Durgamahesh Manne (#11)
Re: Records count mismatch with logical replication

On Thu, Jan 23, 2025 at 11:24 PM Durgamahesh Manne <
maheshpostgres9@gmail.com> wrote:

[...]

Hi Adrian Klaver

At the subscription side I am getting this error repeatedly. Is there any
solution that does not require resyncing the full data from scratch?

2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not
find row to be updated in replication target relation's partition
"markets_archive_p20250118"

Regards,
Durga Mahesh

#13 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#11)
Re: Records count mismatch with logical replication

On 1/23/25 09:54, Durgamahesh Manne wrote:

See comments in line below.

Source Publication Side:

archiving=> select * from pg_stat_replication;

There is missing information here.

Am I right in assuming this is for slot cls_eva_msa?

And that it is going to the same client_addr, 10.80.0.168?

client_hostname  |
client_port      | 52506
backend_start    | 2025-01-23 16:58:04.697304+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.002271
flush_lag        | 00:00:00.002271
replay_lag       | 00:00:00.002271
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.901979+00
-[ RECORD 2 ]----+------------------------------
pid              | 3501
usesysid         | 14604130
usename          | archiving
application_name | cle_clm_mka
client_addr      | 10.80.0.168
client_hostname  |
client_port      | 55412
backend_start    | 2025-01-22 09:31:11.83963+00
backend_xmin     |
state            | streaming
sent_lsn         | 16C7/BDE4BB48
write_lsn        | 16C7/BDE4BB48
flush_lsn        | 16C7/BDE4BB48
replay_lsn       | 16C7/BDE4BB48
write_lag        | 00:00:00.001642
flush_lag        | 00:00:00.023143
replay_lag       | 00:00:00.001642
sync_priority    | 0
sync_state       | async
reply_time       | 2025-01-23 17:34:39.903052+00

The lag times are minimal.

Were the queries below done at a later time than those above?

Subscription Side : archiving=> select * from pg_stat_subscription where
subname = 'cls_eva_msa';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1936652827
subname               | cls_eva_msa
pid                   | 18746
relid                 |
received_lsn          | 16C7/FB48DFE0
last_msg_send_time    | 2025-01-23 17:41:11.924562+00
last_msg_receipt_time | 2025-01-23 17:41:11.933344+00
latest_end_lsn        | 16C7/FB48DFE0
latest_end_time       | 2025-01-23 17:41:11.924562+00

archiving=> select * from pg_stat_subscription where subname =
'cle_clm_mka';
-[ RECORD 1 ]---------+------------------------------
subid                 | 1892055116
subname               | cle_clm_mka
pid                   | 507
relid                 |
received_lsn          | 16C7/FB8CDF68
last_msg_send_time    | 2025-01-23 17:41:17.375879+00
last_msg_receipt_time | 2025-01-23 17:41:17.378932+00
latest_end_lsn        | 16C7/FB8CDF68
latest_end_time       | 2025-01-23 17:41:17.375879+00

'... everything looks good' is an opinion not actual data.
Correct

So what does the AWS dashboard show for the I/O between the servers?


--
Adrian Klaver
adrian.klaver@aklaver.com

#14 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Durgamahesh Manne (#12)
Re: Records count mismatch with logical replication

On 1/23/25 10:21, Durgamahesh Manne wrote:

At the subscription side I am getting this error repeatedly. Is there any
solution that does not require resyncing the full data from scratch?

2025-01-23 18:11:46 UTC::@:[507]:DEBUG: logical replication did not
find row to be updated in replication target relation's partition
"markets_archive_p20250118"

That looks like a different issue.

On the subscriber side is someone/something changing the data?

Are there more log reports related to this at a level above DEBUG?

I'm thinking this is related to from here:

https://www.postgresql.org/docs/current/logical-replication-conflicts.html

"When replicating UPDATE or DELETE operations, missing data will not
produce a conflict and such operations will simply be skipped."

I don't have time at the moment to dig any further.
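
(If skipped UPDATEs on missing rows are the cause, one common repair, sketched here with a hypothetical primary key column id, is to copy the affected partition's rows across and insert only the missing ones, rather than resyncing everything. Replication into that partition should be quiesced while this runs, or concurrent changes can race the copy.)

```sql
-- On the publisher: export the affected partition with psql's \copy.
\copy (SELECT * FROM archiving.markets_archive_p20250118) TO 'p20250118.csv' WITH CSV

-- On the subscriber: load into a staging table, then insert only missing rows.
CREATE TEMP TABLE p20250118_src
  (LIKE archiving.markets_archive_p20250118);
\copy p20250118_src FROM 'p20250118.csv' WITH CSV
INSERT INTO archiving.markets_archive_p20250118
SELECT s.*
FROM   p20250118_src s
LEFT   JOIN archiving.markets_archive_p20250118 t
       ON t.id = s.id                      -- "id" is a hypothetical primary key
WHERE  t.id IS NULL;
```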


--
Adrian Klaver
adrian.klaver@aklaver.com