Delay the variable initialization in get_rel_sync_entry
Hi,
When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.
---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---
The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.
Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.
--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind
So, I think it would be better if we do the initialization only when
RelationSyncEntry in invalid.
Attach a small patch which delay the initialization.
Thoughts ?
Best regards,
Hou zj
Attachments:
0001-Delay-the-variable-initialization-in-get_rel_sync_en.patchapplication/octet-stream; name=0001-Delay-the-variable-initialization-in-get_rel_sync_en.patchDownload
From fcb74fd20cd5549f2f69770140f194e0e25c4362 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@fujitsu.com>
Date: Wed, 22 Dec 2021 14:57:32 +0800
Subject: [PATCH] Delay the variable initialization in get_rel_sync_entry
Delay the variable initialization in get_rel_sync_entry which will cause
unnecessary cache access in a hot function.
---
src/backend/replication/pgoutput/pgoutput.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6f6a203..a08da85 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1119,8 +1119,6 @@ static RelationSyncEntry *
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
- bool am_partition = get_rel_relispartition(relid);
- char relkind = get_rel_relkind(relid);
bool found;
MemoryContext oldctx;
@@ -1160,6 +1158,8 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
List *schemaPubids = GetSchemaPublications(schemaId);
ListCell *lc;
Oid publish_as_relid = relid;
+ bool am_partition = get_rel_relispartition(relid);
+ char relkind = get_rel_relkind(relid);
/* Reload publications if needed before use. */
if (!publications_valid)
--
2.7.2.windows.1
At Wed, 22 Dec 2021 13:11:38 +0000, "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> wrote in
Hi,
When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkindSo, I think it would be better if we do the initialization only when
RelationSyncEntry in invalid.Attach a small patch which delay the initialization.
Thoughts ?
A simple benchmarking that replicates pgbench workload showed me that
the function doesn't enter the path to use the am_partition and
relkind in almost all (99.999..%) cases and I don't think it is a
special case. Thus I think we can expect that we gain about 10%
without any possibility of loss.
Addition to that, it is simply a good practice to keep variable scopes
narrow.
So +1 for this change.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:
When reviewing some logical replication patches. I noticed that in
function get_rel_sync_entry() we always invoke get_rel_relispartition()
and get_rel_relkind() at the beginning which could cause unnecessary
cache access.---
get_rel_sync_entry(PGOutputData *data, Oid relid)
{
RelationSyncEntry *entry;
bool am_partition = get_rel_relispartition(relid);
char relkind = get_rel_relkind(relid);
---The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind
Good catch. WFM. Deferring variable initialization close to its first use is
good practice.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote:
On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:
The extra cost could sometimes be noticeable because get_rel_sync_entry is a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.Here is the perf result for the case when inserted large amounts of data into a
un-published table in which case the cost is noticeable.--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkind
How does the perf balance change once you apply the patch? Do we have
anything else that stands out? Getting rid of this bottleneck is fine
by itself, but I am wondering if there are more things to worry about
or not.
Good catch. WFM. Deferring variable initialization close to its first use is
good practice.
Yeah, it is usually a good practice to have the declaration within
the code block that uses it rather than piling everything at the
beginning of the function. Being able to see that in profiles is
annoying, and the change is simple, so I'd like to backpatch it.
This is a period of vacations for a lot of people, so I'll wait until
the beginning-ish of January before doing anything.
--
Michael
On Friday, December 24, 2021 8:13 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 23, 2021 at 12:54:41PM -0300, Euler Taveira wrote:
On Wed, Dec 22, 2021, at 10:11 AM, houzj.fnst@fujitsu.com wrote:
The extra cost could sometimes be noticeable because get_rel_sync_entry is
a
hot function which is executed for each change. And the 'am_partition' and
'relkind' are necessary only when we need to rebuild the RelationSyncEntry.Here is the perf result for the case when inserted large amounts of data into
a
un-published table in which case the cost is noticeable.
--12.83%--pgoutput_change
|--11.84%--get_rel_sync_entry
|--4.76%--get_rel_relispartition
|--4.70%--get_rel_relkindHow does the perf balance change once you apply the patch? Do we have
anything else that stands out? Getting rid of this bottleneck is fine
by itself, but I am wondering if there are more things to worry about
or not.
Thanks for the response.
Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.
|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_search
Also attach complete profiles.
Good catch. WFM. Deferring variable initialization close to its first use is
good practice.Yeah, it is usually a good practice to have the declaration within
the code block that uses it rather than piling everything at the
beginning of the function. Being able to see that in profiles is
annoying, and the change is simple, so I'd like to backpatch it.
+1
This is a period of vacations for a lot of people, so I'll wait until
the beginning-ish of January before doing anything.
Thanks, added it to CF.
https://commitfest.postgresql.org/36/3471/
Best regards,
Hou zj
Attachments:
On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:
Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_searchAlso attach complete profiles.
Thanks. I have also done my own set of measurements, and the
difference is noticeable in the profiles I looked at. So, applied
down to 13.
--
Michael
On Wednesday, January 5, 2022 9:31 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:
Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_searchAlso attach complete profiles.
Thanks. I have also done my own set of measurements, and the difference is
noticeable in the profiles I looked at. So, applied down to 13.
Thanks for pushing!
Best regards,
Hou zj
On Wed, Jan 5, 2022 at 10:31 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 24, 2021 at 01:27:26PM +0000, houzj.fnst@fujitsu.com wrote:
Here is the perf result of pgoutput_change after applying the patch.
I didn't notice something else that stand out.|--2.99%--pgoutput_change
|--1.80%--get_rel_sync_entry
|--1.56%--hash_searchAlso attach complete profiles.
Thanks. I have also done my own set of measurements, and the
difference is noticeable in the profiles I looked at. So, applied
down to 13.
Thanks. I agree the variables were being defined in the wrong place
before this patch.
--
Amit Langote
EDB: http://www.enterprisedb.com