Logical Replication, CPU load and Locking contention

Started by Martín Fernándezabout 5 years ago6 messagesgeneral

fmartin91@gmail.com

about 5 years ago

Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that we run.

We are using Amazon DMS to replicate data from our database into S3 buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase in CPU load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks increase considerably when DMS running.

Based on this information, I’m trying to understand if this is something expected when running logical replication or not. We’ve been running physical replication for several years and we haven’t seen nothing like this. It could be the case that the issue is not related at all with logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Radoslav Nedyalkov

rnedyalkov@gmail.com

about 5 years ago

In reply to: Martín Fernández (#1)

Re: Logical Replication, CPU load and Locking contention

On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmartin91@gmail.com>
wrote:

Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that we
run.

We are using Amazon DMS to replicate data from our database into S3
buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase in CPU
load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks increase
considerably when DMS running.

Based on this information, I’m trying to understand if this is something
expected when running logical replication or not. We’ve been running
physical replication for several years and we haven’t seen nothing like
this. It could be the case that the issue is not related at all with
logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Hi,

I would check in pg_stat_activity what those logical replication slots do.
I guess COPY.
Are you doing one shot copy ? every day ? Then copying all the tables will
lead to load increase.
How many tables at a time DMS copies? It should be configurable.
AccessShareLock is absolutely normal. You have a transaction doing SELECT
(COPY) over a table.

Physical replication is a different thing. It happens at another level.

Regards
Rado

Ron

ronljohnsonjr@gmail.com

about 5 years ago

In reply to: Radoslav Nedyalkov (#2)

Re: Logical Replication, CPU load and Locking contention

On 3/10/21 2:10 AM, Radoslav Nedyalkov wrote:

On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmartin91@gmail.com
<mailto:fmartin91@gmail.com>> wrote:

Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that
we run.

We are using Amazon DMS to replicate data from our database into S3
buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase in
CPU load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks
increase considerably when DMS running.

Based on this information, I’m trying to understand if this is
something expected when running logical replication or not. We’ve been
running physical replication for several years and we haven’t seen
nothing like this. It could be the case that the issue is not related
at all with logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Hi,

I would check in pg_stat_activity what those logical replication slots do.
I guess COPY.

If it's a full-load or full-load-and-cdc, then it's almost certainly a COPY.

Are you doing one shot copy ? every day ? Then copying all the tables will
lead to load increase.
How many tables at a time DMS copies? It should be configurable.

It definitely is, with the MaxFullLoadSubTasks parameter. The default is 8.

AccessShareLock is absolutely normal. You have a transaction doing SELECT
(COPY) over a table.

If DMS with Postgresql as the source is anything like when Oracle is the
source (we're testing Oracle -> RDS Postgresql) then it starts a
SERIALIZABLE transaction.

Physical replication is a different thing. It happens at another level.

Regards
Rado

--
Angular momentum makes the world go 'round.

Martín Fernández

fmartin91@gmail.com

about 5 years ago

In reply to: Ron (#3)

Re: Logical Replication, CPU load and Locking contention

On 10 Mar 2021, at 11:25, Ron <ronljohnsonjr@gmail.com> wrote:

On 3/10/21 2:10 AM, Radoslav Nedyalkov wrote:

On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmartin91@gmail.com <mailto:fmartin91@gmail.com>> wrote:
Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that we run.

We are using Amazon DMS to replicate data from our database into S3 buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase in CPU load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks increase considerably when DMS running.

Based on this information, I’m trying to understand if this is something expected when running logical replication or not. We’ve been running physical replication for several years and we haven’t seen nothing like this. It could be the case that the issue is not related at all with logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Hi,

I would check in pg_stat_activity what those logical replication slots do. I guess COPY.

If it's a full-load or full-load-and-cdc, then it's almost certainly a COPY.

We are doing full-load and full-load-and-cdc. At this point it’s just cdc since the full load was done month ago.

Are you doing one shot copy ? every day ? Then copying all the tables will lead to load increase.
How many tables at a time DMS copies? It should be configurable.

It definitely is, with the MaxFullLoadSubTasks parameter. The default is 8.

We are copying a lot of tables. 100+

AccessShareLock is absolutely normal. You have a transaction doing SELECT (COPY) over a table.

If DMS with Postgresql as the source is anything like when Oracle is the source (we're testing Oracle -> RDS Postgresql) then it starts a SERIALIZABLE transaction.

My guess at this point is that the CPU load increases due to the increase of AccessShareLocks, is that a fair assumption ? Have you seen a similar behavior in Oracle ?

Show quoted text

Physical replication is a different thing. It happens at another level.

Regards
Rado

--
Angular momentum makes the world go 'round.

Ron

ronljohnsonjr@gmail.com

about 5 years ago

In reply to: Martín Fernández (#4)

Re: Logical Replication, CPU load and Locking contention

On 3/10/21 11:56 AM, Martín Fernández wrote:

On 10 Mar 2021, at 11:25, Ron <ronljohnsonjr@gmail.com
<mailto:ronljohnsonjr@gmail.com>> wrote:

On 3/10/21 2:10 AM, Radoslav Nedyalkov wrote:

On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmartin91@gmail.com
<mailto:fmartin91@gmail.com>> wrote:

Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster
that we run.

We are using Amazon DMS to replicate data from our database into S3
buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase
in CPU load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks
increase considerably when DMS running.

Based on this information, I’m trying to understand if this is
something expected when running logical replication or not. We’ve
been running physical replication for several years and we haven’t
seen nothing like this. It could be the case that the issue is not
related at all with logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Hi,

I would check in pg_stat_activity what those logical replication slots
do. I guess COPY.

If it's a full-load or full-load-and-cdc, then it's almost certainly a COPY.

We are doing full-load and full-load-and-cdc. At this point it’s just cdc
since the full load was done month ago.

Then it's "just" logical replication.

Are you doing one shot copy ? every day ? Then copying all the tables
will lead to load increase.
How many tables at a time DMS copies? It should be configurable.

It definitely is, with the MaxFullLoadSubTasks parameter. The default is 8.

We are copying a lot of tables. 100+

If right now you're just doing CDC replication, then you're *synchronizing*
lots of tables. How often are they modified?

AccessShareLock is absolutely normal. You have a transaction doing
SELECT (COPY) over a table.

If DMS with Postgresql as the source is anything like when Oracle is the
source (we're testing Oracle -> RDS Postgresql) then it starts a
SERIALIZABLE transaction.

My guess at this point is that the CPU load increases due to the increase
of AccessShareLocks, is that a fair assumption ? Have you seen a similar
behavior in Oracle ?

I don't have access to the Oracle database, and so far we're just testing on
the low-volume QA system.

Further, I don't know enough about how logical replication works to have an
opinion.

--
Angular momentum makes the world go 'round.

Martín Fernández

fmartin91@gmail.com

about 5 years ago

In reply to: Ron (#5)

Re: Logical Replication, CPU load and Locking contention

Sent from my iPhone

On Mar 10, 2021, at 15:03, Ron <ronljohnsonjr@gmail.com> wrote:
On 3/10/21 11:56 AM, Martín Fernández wrote:

On 10 Mar 2021, at 11:25, Ron <ronljohnsonjr@gmail.com> wrote:

On 3/10/21 2:10 AM, Radoslav Nedyalkov wrote:

On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmartin91@gmail.com> wrote:

Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that we run.

We are using Amazon DMS to replicate data from our database into S3 buckets. DMS replicates data by using logical replication slots.

After introducing DMS in our environment, we have seen an increase in CPU load of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks increase considerably when DMS running.

Based on this information, I’m trying to understand if this is something expected when running logical replication or not. We’ve been running physical replication for several years and we haven’t seen nothing like this. It could be the case that the issue is not related at all with logical replication and is purely a DMS artifact.

Thanks before hand!

Best,
Martín

Hi,

I would check in pg_stat_activity what those logical replication slots do. I guess COPY.

If it's a full-load or full-load-and-cdc, then it's almost certainly a COPY.

We are doing full-load and full-load-and-cdc. At this point it’s just cdc since the full load was done month ago.

Then it's "just" logical replication.

Right!

Are you doing one shot copy ? every day ? Then copying all the tables will lead to load increase.
How many tables at a time DMS copies? It should be configurable.

It definitely is, with the MaxFullLoadSubTasks parameter. The default is 8.

We are copying a lot of tables. 100+

If right now you're just doing CDC replication, then you're synchronizing lots of tables. How often are they modified?

Yes, used the word copying incorrectly. Last time we paused DMS we acumulated 180GB in the replication slot in 1 hour.

Show quoted text

AccessShareLock is absolutely normal. You have a transaction doing SELECT (COPY) over a table.

If DMS with Postgresql as the source is anything like when Oracle is the source (we're testing Oracle -> RDS Postgresql) then it starts a SERIALIZABLE transaction.

My guess at this point is that the CPU load increases due to the increase of AccessShareLocks, is that a fair assumption ? Have you seen a similar behavior in Oracle ?

I don't have access to the Oracle database, and so far we're just testing on the low-volume QA system.

Further, I don't know enough about how logical replication works to have an opinion.

--
Angular momentum makes the world go 'round.