Delay the variable initialization in get_rel_sync_entry

Started by houzj.fnst@fujitsu.comabout 4 years ago8 messages
#1houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
1 attachment(s)

Hi,

When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.

---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---

The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

So, I think it would be better if we do the initialization only when
RelationSyncEntry in invalid.

Attach a small patch which delay the initialization.

Thoughts ?

Best regards,
Hou zj

Attachments:

0001-Delay-the-variable-initialization-in-get_rel_sync_en.patchapplication/octet-stream; name=0001-Delay-the-variable-initialization-in-get_rel_sync_en.patchDownload
From fcb74fd20cd5549f2f69770140f194e0e25c4362 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@fujitsu.com>
Date: Wed, 22 Dec 2021 14:57:32 +0800
Subject: [PATCH] Delay the variable initialization in get_rel_sync_entry

Delay the variable initialization in get_rel_sync_entry which will cause
unnecessary cache access in a hot function.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6f6a203..a08da85 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1119,8 +1119,6 @@ static RelationSyncEntry *
 get_rel_sync_entry(PGOutputData *data, Oid relid)
 {
 	RelationSyncEntry *entry;
-	bool		am_partition = get_rel_relispartition(relid);
-	char		relkind = get_rel_relkind(relid);
 	bool		found;
 	MemoryContext oldctx;
 
@@ -1160,6 +1158,8 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		List	   *schemaPubids = GetSchemaPublications(schemaId);
 		ListCell   *lc;
 		Oid			publish_as_relid = relid;
+		bool		am_partition = get_rel_relispartition(relid);
+		char		relkind = get_rel_relkind(relid);
 
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
-- 
2.7.2.windows.1

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: houzj.fnst@fujitsu.com (#1)
Re: Delay the variable initialization in get_rel_sync_entry

At Wed, 22 Dec 2021 13:11:38 +0000, "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> wrote in

Hi,

When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.

---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---

The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

So, I think it would be better if we do the initialization only when
RelationSyncEntry in invalid.

Attach a small patch which delay the initialization.

Thoughts ?

A simple benchmarking that replicates pgbench workload showed me that
the function doesn't enter the path to use the am_partition and
relkind in almost all (99.999..%) cases and I don't think it is a
special case. Thus I think we can expect that we gain about 10%
without any possibility of loss.

Addition to that, it is simply a good practice to keep variable scopes
narrow.

So +1 for this change.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Euler Taveira
euler@eulerto.com
In reply to: houzj.fnst@fujitsu.com (#1)
Re: Delay the variable initialization in get_rel_sync_entry

On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:

When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.

---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---

The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

Good catch. WFM. Deferring variable initialization close to its first use is
good practice.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#4Michael Paquier
michael@paquier.xyz
In reply to: Euler Taveira (#3)
Re: Delay the variable initialization in get_rel_sync_entry

On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote:

On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:

The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

How does the perf balance change once you apply the patch? Do we have
anything else that stands out? Getting rid of this bottleneck is fine
by itself, but I am wondering if there are more things to worry about
or not.

Good catch. WFM. Deferring variable initialization close to its first use is
good practice.

Yeah, it is usually a good practice to have the declaration within
the code block that uses it rather than piling everything at the
beginning of the function. Being able to see that in profiles is
annoying, and the change is simple, so I'd like to backpatch it.

This is a period of vacations for a lot of people, so I'll wait until
the beginning-ish of January before doing anything.
--
Michael

#5houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#4)
1 attachment(s)
RE: Delay the variable initialization in get_rel_sync_entry

On Friday, December 24, 2021 8:13 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote:

On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:

The extra cost could sometimes be noticeable because get_rel_sync_entry is

a

hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.

Here is the perf result for the case when inserted large amounts of data into

a

un-published table in which case the cost is noticeable.

--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind

How does the perf balance change once you apply the patch? Do we have
anything else that stands out? Getting rid of this bottleneck is fine
by itself, but I am wondering if there are more things to worry about
or not.

Thanks for the response.
Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.

|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_search

Also attach complete profiles.

Good catch. WFM. Deferring variable initialization close to its first use is
good practice.

Yeah, it is usually a good practice to have the declaration within
the code block that uses it rather than piling everything at the
beginning of the function. Being able to see that in profiles is
annoying, and the change is simple, so I'd like to backpatch it.

+1

This is a period of vacations for a lot of people, so I'll wait until
the beginning-ish of January before doing anything.

Thanks, added it to CF.
https://commitfest.postgresql.org/36/3471/

Best regards,
Hou zj

Attachments:

perf.zipapplication/x-zip-compressed; name=perf.zipDownload
#6Michael Paquier
michael@paquier.xyz
In reply to: houzj.fnst@fujitsu.com (#5)
Re: Delay the variable initialization in get_rel_sync_entry

On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:

Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.

|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_search

Also attach complete profiles.

Thanks. I have also done my own set of measurements, and the
difference is noticeable in the profiles I looked at. So, applied
down to 13.
--
Michael

#7houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#6)
RE: Delay the variable initialization in get_rel_sync_entry

On Wednesday, January 5, 2022 9:31 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:

Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.

|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_search

Also attach complete profiles.

Thanks. I have also done my own set of measurements, and the difference is
noticeable in the profiles I looked at. So, applied down to 13.

Thanks for pushing!

Best regards,
Hou zj

#8Amit Langote
amitlangote09@gmail.com
In reply to: Michael Paquier (#6)
Re: Delay the variable initialization in get_rel_sync_entry

On Wed, Jan 5, 2022 at 10:31 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:

Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.

|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_search

Also attach complete profiles.

Thanks. I have also done my own set of measurements, and the
difference is noticeable in the profiles I looked at. So, applied
down to 13.

Thanks. I agree the variables were being defined in the wrong place
before this patch.

--
Amit Langote
EDB: http://www.enterprisedb.com