Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Hi hackers,
I found a problem in logical replication.
It seems to have the same cause as the following problem.
Creating many tables gets logical replication stuck
/messages/by-id/20f3de7675f83176253f607b5e199b228406c21c.camel@cybertec.at
Logical decoding CPU-bound w/ large number of tables
/messages/by-id/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
# problem
* logical replication enabled
* the walsender process has a RelfilenodeMap cache (2000 relations in this case)
* TRUNCATE, DROP, or CREATE many tables in the same transaction
Under these conditions, the walsender process continuously uses 100% of one CPU core.
# environment
PostgreSQL 12.4 (rpm)
CentOS 7.6.1810
# test case
sh logical_replication_truncate.sh
# perf report walsender process
Samples: 25K of event 'cpu-clock:uhH', Event count (approx.): 6315250000
Overhead Command Shared Object Symbol
87.34% postgres postgres [.] hash_seq_search
2.80% postgres postgres [.] hash_search_with_hash_value
1.57% postgres postgres [.] LocalExecuteInvalidationMessage
1.50% postgres postgres [.] hash_search
1.31% postgres postgres [.] RelfilenodeMapInvalidateCallback
0.83% postgres postgres [.] uint32_hash
0.79% postgres pgoutput.so [.] rel_sync_cache_relation_cb
0.63% postgres postgres [.] hash_uint32
0.59% postgres postgres [.] PlanCacheRelCallback
0.48% postgres postgres [.] CatCacheInvalidate
0.43% postgres postgres [.] ReorderBufferCommit
0.36% postgres libc-2.17.so [.] __memcmp_sse4_1
0.34% postgres postgres [.] deregister_seq_scan
0.27% postgres postgres [.] hash_seq_init
0.27% postgres postgres [.] CallSyscacheCallbacks
0.23% postgres postgres [.] SysCacheInvalidate
0.10% postgres postgres [.] memcmp@plt
0.08% postgres postgres [.] RelationCacheInvalidateEntry
0.05% postgres postgres [.] InvalidateCatalogSnapshot
0.03% postgres postgres [.] GetCurrentTransactionNestLevel
0.01% postgres pgoutput.so [.] hash_search@plt
0.00% postgres libpthread-2.17.so [.] __pread_nocancel
# backtrace walsender process
#0 0x0000000000889cb4 in hash_seq_search
(status=status@entry=0x7ffd5ae38310) at dynahash.c:1442
#1 0x0000000000877df8 in RelfilenodeMapInvalidateCallback
(arg=<optimized out>, relid=17284) at relfilenodemap.c:64
#2 0x000000000086999a in LocalExecuteInvalidationMessage
(msg=0x2f2ef18) at inval.c:595
#3 0x000000000070b81e in ReorderBufferExecuteInvalidations
(rb=<optimized out>, txn=<optimized out>, txn=<optimized out>) at
reorderbuffer.c:2149
#4 ReorderBufferCommit (rb=0x2cbda90, xid=xid@entry=490,
commit_lsn=94490944, end_lsn=<optimized out>,
commit_time=commit_time@entry=653705313747147,
origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
reorderbuffer.c:1770
#5 0x0000000000701248 in DecodeCommit (xid=490,
parsed=0x7ffd5ae38600, buf=0x7ffd5ae387c0, ctx=0x2c2f1e0) at
decode.c:640
#6 DecodeXactOp (ctx=0x2c2f1e0, buf=buf@entry=0x7ffd5ae387c0) at decode.c:248
#7 0x00000000007015fa in LogicalDecodingProcessRecord (ctx=0x2c2f1e0,
record=0x2c2f458) at decode.c:117
#8 0x0000000000712cd1 in XLogSendLogical () at walsender.c:2848
#9 0x0000000000714e92 in WalSndLoop
(send_data=send_data@entry=0x712c70 <XLogSendLogical>) at
walsender.c:2199
#10 0x0000000000715b51 in StartLogicalReplication (cmd=0x2c8c5c0) at
walsender.c:1129
#11 exec_replication_command (
cmd_string=cmd_string@entry=0x2c0af60 "START_REPLICATION SLOT
\"subdb_test\" LOGICAL 0/0 (proto_version '1', publication_names
'\"pubdb_test\"')") at walsender.c:1545
#12 0x0000000000760ff1 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x2c353d0, dbname=0x2c35288 "postgres",
username=<optimized out>)
at postgres.c:4243
#13 0x0000000000484172 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4448
#14 BackendStartup (port=0x2c2d200) at postmaster.c:4139
#15 ServerLoop () at postmaster.c:1704
#16 0x00000000006f10b3 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x2c05b20) at postmaster.c:1377
#17 0x0000000000485073 in main (argc=3, argv=0x2c05b20) at main.c:228
# probable cause
RelfilenodeMapInvalidateCallback is called many times.
Every time the CommandId is incremented, all accumulated invalidations
are executed again: ReorderBufferExecuteInvalidations replays the
invalidations of every TRUNCATEd object in the transaction once per
object, so the work grows quadratically with the number of objects.
The hash_seq_search over the whole cache in
RelfilenodeMapInvalidateCallback is heavy, but I have no idea how to
solve this problem...
./src/backend/replication/logical/reorderbuffer.c
1746 case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
1747 Assert(change->data.command_id != InvalidCommandId);
1748
1749 if (command_id < change->data.command_id)
1750 {
1751 command_id = change->data.command_id;
1752
1753 if (!snapshot_now->copied)
1754 {
1755 /* we don't use the global one anymore */
1756 snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
1757 txn, command_id);
1758 }
1759
1760 snapshot_now->curcid = command_id;
1761
1762 TeardownHistoricSnapshot(false);
1763 SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
1764
1765 /*
1766 * Every time the CommandId is incremented, we could
1767 * see new catalog contents, so execute all
1768 * invalidations.
1769 */
1770 ReorderBufferExecuteInvalidations(rb, txn);
1771 }
1772
1773 break;
Do you have any solutions?
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
Attachments:
logical_replication_truncate.sh (text/x-sh)
On Wed, Sep 23, 2020 at 1:09 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Do you have any solutions?
Yeah, I have an idea on how to solve this problem. This problem
primarily occurs because we used to receive invalidations only
at commit time and then had to execute all of them after each command id
change. However, after commit c55040ccd0 (When wal_level=logical,
write invalidations at command end into WAL so that decoding can use
this information.) we actually know exactly when we need to execute
each invalidation. The idea is that instead of collecting
invalidations only in ReorderBufferTxn, we need to collect them in the
form of a ReorderBufferChange as well, similar to what we do for other
changes (for example, REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID). In this
case, we additionally need to collect them in ReorderBufferTxn because if
the transaction is aborted, or some exception occurred while executing
the changes, we need to perform all the invalidations.
--
With Regards,
Amit Kapila.
Hi Amit,
Thank you for the reply!
However, after commit c55040ccd0 (When wal_level=logical,
write invalidations at command end into WAL so that decoding can use
this information.) we actually know exactly when we need to execute
each invalidation.
I see, thank you for your explanation.
I'll try to think about the solution by using XLOG_INVALIDATIONS
and referring to the thread
"PATCH: logical_work_mem and logical streaming of large in-progress
transactions".
Best Regards,
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Mon, Sep 28, 2020 at 7:50 AM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
However, after commit c55040ccd0 (When wal_level=logical,
write invalidations at command end into WAL so that decoding can use
this information.) we actually know exactly when we need to execute
each invalidation.
I see, thank you for your explanation.
I'll try to think about the solution by using XLOG_INVALIDATIONS
You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.
and referring to the thread
"PATCH: logical_work_mem and logical streaming of large in-progress
transactions".
Okay. Feel free to clarify your questions if you have any? Are you
interested in writing a patch for the same?
--
With Regards,
Amit Kapila.
Hi Amit,
You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.
Thank you, that's right. XLOG_INVALIDATIONS is added at COMMIT,
so I need to refer to XLOG_XACT_INVALIDATIONS.
Okay. Feel free to clarify your questions if you have any? Are you
interested in writing a patch for the same?
Thank you! Yes, I would like to write a patch.
If you already have a discussed thread or patch for this idea,
please let me know.
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Mon, Sep 28, 2020 at 10:01 AM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Thank you! Yes, I would like to write a patch.
If you already have a discussed thread or patch for this idea,
please let me know.
I don't have a patch for this idea but you can refer [1]
(v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
what I was trying to say about registering the invalidations in the
form of ReorderBufferChange. We have something along the lines of what
I described above in that patch but we have removed it because we need
all the invalidations to be accumulated to handle aborts and we don't
want two different mechanisms to store invalidations.
[1]: /messages/by-id/CAFiTN-t+nKqv+oTE37NWYaZHiJGzogoWcsW1_qMcXeV5u3ErmA@mail.gmail.com
--
With Regards,
Amit Kapila.
Hi Amit,
I don't have a patch for this idea but you can refer [1]
(v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
what I was trying to say about registering the invalidations in the
form of ReorderBufferChange. We have something along the lines of what
I described above in that patch but we have removed it because we need
all the invalidations to be accumulated to handle aborts and we don't
want two different mechanisms to store invalidations.
Thanks, I read the patch
(v25-0002-Issue-individual-invalidations-with-wal_level-lo)
and the review.
I understand the following.
* In v25-0002, there were two different mechanisms,
XLOG_XACT_INVALIDATIONS (ReorderBufferAddInvalidation) and
XLOG_INVALIDATIONS (ReorderBufferAddInvalidations).
* After a review, XLOG_XACT_INVALIDATIONS was implemented to
generate all invalidation messages.
I'm going to write a patch that looks like the following.
* Add the process of collecting invalidation to
XLOG_XACT_INVALIDATIONS in the form of ReorderBufferChange.
* Invalidation is executed in case
REORDER_BUFFER_CHANGE_INVALIDATION.
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Mon, Sep 28, 2020 at 2:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't have a patch for this idea but you can refer [1]
(v25-0002-Issue-individual-invalidations-with-wal_level-lo) to see
what I was trying to say about registering the invalidations in the
form of ReorderBufferChange. We have something along the lines of what
I described above in that patch but we have removed it because we need
all the invalidations to be accumulated to handle aborts and we don't
want two different mechanisms to store invalidations.
In some of the intermediate versions of the logical-decoding patch
set, we had collected the invalidations in the form of changes, and
later we changed it so that we can execute all the invalidations on
abort. I just browsed the old patch and fetched the changes to
collect the invalidations in the form of changes. I have rebased on
the current head so that we collect both in the form of changes as
well as the accumulated collection of invalidations. So if we get an
individual invalidation we execute it, and on abort we execute all
the invalidations together.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Collect-command-invalidation-in-form-of-changes.patch (+94 -18)
On Wed, Sep 30, 2020 at 12:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I just browsed the old patch and fetched the changes to collect the
invalidations in the form of changes. I have rebased on the current
head so that we collect both in the form of changes as well as the
accumulated collection of invalidations. So if we get an individual
invalidation we execute it, and on abort we execute all the
invalidations together.
Yeah, this is in-line with what I had in mind but please update the
comments in ReorderBufferAddInvalidations why we need to collect this
in both the forms. The comments are there but expand them a bit more.
And you might need to update the below comment as well:
typedef struct ReorderBufferTXN
{
..
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
*/
dlist_head changes;
I have tried to think of a way to capture these changes only in one
form to serve our purpose but couldn't come up with any good ideas but
if you can think of some idea to achieve that I am all ears.
Keisuke-San, can you please test Dilip's patch to see if this has
fixed your problem and whether you see any other problem with this?
--
With Regards,
Amit Kapila.
Hi Dilip, Amit,
Thank you for the patch!
I tested the patch on master HEAD (9796f455) and it works fine.
* make installcheck-world: passed
* time during which the walsender process uses 100% of one CPU core
for a TRUNCATE of 1000 partitions: 1s or less
** before the patch: 230s
There is "ReorderBufferAddInvalidation" in reorderbuffer.h, but I
don't think it's needed.
src/include/replication/reorderbuffer.h
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId,
XLogRecPtr lsn,
+ int nmsgs, SharedInvalidationMessage *msgs);
If possible, I'd like to improve it even before PG13,
but I recognize that it's difficult because it uses a PG14 or later
mechanism...
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Thu, Oct 1, 2020 at 8:22 AM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Hi Dilip, Amit,
Thank you for the patch!
I test the patch on the master HEAD(9796f455) and it works fine.
* make installcheck-world: passed
* walsender process continues to use 100% of the CPU 1core when
TRUNCATE 1000 partition: 1s or less
** before patch : 230s
Does this result indicate that it is still CPU bound but it does the
actual decoding and completes in 1s instead of spending 230s mainly to
execute unnecessary invalidations?
There is "ReorderBufferAddInvalidation" in reorderbuffer.h, but I
don't think it's needed.
src/include/replication/reorderbuffer.h
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+				  int nmsgs, SharedInvalidationMessage *msgs);
From the patch perspective, I think it is better if we can add one
test case as well where we process some invalidations and then the
rollback happens and we need to process all the invalidations
together. Basically, this is to cover the new code, if such a test
already exists then it is fine.
If possible, I'd like to improve it even before PG13,
but I recognize that it's difficult because it uses a PG14 or later
mechanism...
Yeah, it won't be possible before PG14 as it depends on the new
feature added in PG14.
--
With Regards,
Amit Kapila.
On Thu, Oct 1, 2020 at 9:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
From the patch perspective, I think it is better if we can add one
test case as well where we process some invalidations and then the
rollback happens and we need to process all the invalidations
together. Basically, this is to cover the new code, if such a test
already exists then it is fine.
I think we already have such a test case.
019_stream_subxact_ddl_abort.pl is covering this scenario.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 30, 2020 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, this is in-line with what I had in mind but please update the
comments in ReorderBufferAddInvalidations why we need to collect this
in both the forms. The comments are there but expand them a bit more.
Done
And you might need to update the below comment as well:
typedef struct ReorderBufferTXN
{
..
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
*/
dlist_head changes;
I have tried to think of a way to capture these changes only in one
form to serve our purpose but couldn't come up with any good ideas but
if you can think of some idea to achieve that I am all ears.
Even I am not sure of any better way to do this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Collect-command-invalidation-in-form-of-changes.patch (+103 -23)
I test the patch on the master HEAD(9796f455) and it works fine.
* make installcheck-world: passed
* walsender process continues to use 100% of the CPU 1core when
TRUNCATE 1000 partition: 1s or less
** before patch : 230s
Does this result indicate that it is still CPU bound but it does the
actual decoding and completes in 1s instead of spending 230s mainly to
execute unnecessary invalidations?
Okay, to check the decoding and invalidation times,
I measured the time until the results were returned to pg_recvlogical,
and took a perf profile.
Before the patch, most of the processing was done
by hash_seq_search in RelfilenodeMapInvalidateCallback.
After the patch, this problem has been resolved.
# test
(1) start pg_recvlogical
(2) INSERT to 2000 partition for creating RelfilenodemapHash
(3) TRUNCATE 1000 partition
(1)
pg_recvlogical --create-slot --start -f - --if-not-exists \
--plugin=test_decoding --slot=test1 --dbname=postgres --username=postgres \
--option=include-timestamp | gawk '{ print strftime("%Y-%m-%d
%H:%M:%S"), $0; fflush(); }'
(2)
DO $$
DECLARE
i INT;
j INT;
schema TEXT;
tablename TEXT;
BEGIN
FOR i IN 1 .. 2 LOOP
schema := 'nsp_' || to_char(i, 'FM000');
tablename := 'tbl_' || to_char(i, 'FM000');
EXECUTE 'INSERT INTO ' || schema || '.' || tablename || '
SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea FROM
generate_series(0,9999,1) AS i';
END LOOP;
END;
$$ LANGUAGE plpgsql;
(3)
TRUNCATE TABLE nsp_001.tbl_001;
# before
(3) TRUNCATE
decode + invalidation = 222s
2020-10-01 16:58:29 BEGIN 529
2020-10-01 17:02:11 COMMIT 529 (at 2020-10-01 16:58:29.281747+09)
(Before the patch was applied, I was concerned that
the "table nsp_001.tbl_001 ... TRUNCATE" line was not being output.)
Samples: 228K of event 'cpu-clock:uhH', Event count (approx.): 57080750000
Overhead Command Shared Object Symbol
- 96.00% postgres postgres [.] hash_seq_search
hash_seq_search
- RelfilenodeMapInvalidateCallback
- LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
+ 0.77% postgres postgres [.] LocalExecuteInvalidationMessage
0.47% postgres postgres [.] hash_bytes_uint32
0.42% postgres postgres [.] hash_search_with_hash_value
0.37% postgres postgres [.] RelfilenodeMapInvalidateCallback
0.32% postgres postgres [.] CatCacheInvalidate
0.23% postgres postgres [.] PlanCacheRelCallback
0.23% postgres postgres [.] ReorderBufferExecuteInvalidations
# after
(3) TRUNCATE
decode + invalidation = 1s or less
2020-10-01 17:09:43 BEGIN 537
2020-10-01 17:09:43 table nsp_001.tbl_001, nsp_001.part_0001,
nsp_001.part_0002 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE:
(no-flags)
2020-10-01 17:09:43 COMMIT 537 (at 2020-10-01 17:09:43.192989+09)
Samples: 337 of event 'cpu-clock:uhH', Event count (approx.): 84250000
Overhead Command Shared Object Symbol
- 47.48% postgres postgres [.] hash_seq_search
hash_seq_search
RelfilenodeMapInvalidateCallback
LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
+ 3.26% postgres postgres [.] heap_hot_search_buffer
+ 2.67% postgres postgres [.] pg_comp_crc32c_sse42
+ 2.08% postgres libc-2.17.so [.] __memcpy_ssse3_back
+ 2.08% postgres postgres [.] DecodeXLogRecord
+ 2.08% postgres postgres [.] hash_search_with_hash_value
+ 1.48% postgres libpthread-2.17.so [.] __libc_recv
+ 1.19% postgres libc-2.17.so [.] __memset_sse2
+ 1.19% postgres libpthread-2.17.so [.] __errno_location
+ 1.19% postgres postgres [.] LocalExecuteInvalidationMessage
+ 1.19% postgres postgres [.] ReadPageInternal
+ 1.19% postgres postgres [.] XLogReadRecord
+ 1.19% postgres postgres [.] socket_is_send_pending
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Thu, Oct 1, 2020 at 2:43 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
# before
(3) TRUNCATE
decode + invalidation = 222s
2020-10-01 16:58:29 BEGIN 529
2020-10-01 17:02:11 COMMIT 529 (at 2020-10-01 16:58:29.281747+09)
(Before the patch was applied, I was concerned that
the "table nsp_001.tbl_001 ... TRUNCATE" was not outputting.)
Why is it that before the patch these were not displayed and after the
patch, we started to see this?
--
With Regards,
Amit Kapila.
On Thu, Oct 1, 2020 at 10:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 12:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
And you might need to update the below comment as well:
typedef struct ReorderBufferTXN
{
..
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
*/
dlist_head changes;
You forgot to address the above comment.
Few other comments:
==================
1.
void
ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
@@ -2813,12 +2830,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
*rb, TransactionId xid,
SharedInvalidationMessage *msgs)
{
ReorderBufferTXN *txn;
+ MemoryContext oldcontext;
+ ReorderBufferChange *change;
txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+ oldcontext = MemoryContextSwitchTo(rb->context);
+
+ change = ReorderBufferGetChange(rb);
+ change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
+ change->data.inval.ninvalidations = nmsgs;
+ change->data.inval.invalidations = (SharedInvalidationMessage *)
+ MemoryContextAlloc(rb->context,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+ memcpy(change->data.inval.invalidations, msgs,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+
+ ReorderBufferQueueChange(rb, xid, lsn, change, false);
+
/*
- * We collect all the invalidations under the top transaction so that we
- * can execute them all together.
+ * Additionally, collect all the invalidations under the top transaction so
+ * that we can execute them all together. See comment atop this function
*/
if (txn->toptxn)
txn = txn->toptxn;
@@ -2830,8 +2862,7 @@ ReorderBufferAddInvalidations(ReorderBuffer *rb,
TransactionId xid,
{
txn->ninvalidations = nmsgs;
txn->invalidations = (SharedInvalidationMessage *)
Here, what is the need for using MemoryContextAlloc; can't we directly
use palloc? Also, isn't it better to store the invalidation in txn
before queuing it as a change, because doing so can lead to the
processing of this and other changes accumulated till that point
before recording the same in txn? As such, there is no problem with
it, but still, I think if any error happens while processing those
changes we would be able to clear the cache w.r.t. this particular
invalidation.
2.
XXX Do we need to care about relcacheInitFileInval and
+ * the other fields added to ReorderBufferChange, or just
+ * about the message itself?
I don't think we need this comment in the patch.
3.
- * This needs to be called for each XLOG_XACT_INVALIDATIONS message and
- * accumulates all the invalidation messages in the toplevel transaction.
- * This is required because in some cases where we skip processing the
- * transaction (see ReorderBufferForget), we need to execute all the
- * invalidations together.
+ * This needs to be called for each XLOG_XACT_INVALIDATIONS message. The
+ * invalidation messages will be added in the reorder buffer as a change as
+ * well as all the invalidations will be accumulated under the toplevel
+ * transaction. We collect them as a change so that during decoding, we can
+ * execute only those invalidations which are specific to the command instead
+ * of executing all the invalidations, OTOH after decoding is complete or on
+ * transaction abort (see ReorderBufferForget) we need to execute all the
+ * invalidations to avoid any cache pollution so it is better to keep them
+ * together
Can we rewrite the comment as below?
This needs to be called for each XLOG_XACT_INVALIDATIONS message and
accumulates all the invalidation messages in the toplevel transaction
as well as in the form of change in reorder buffer. We require to
record it in form of change so that we can execute only required
invalidations instead of executing all the invalidations on each
CommandId increment. We also need to accumulate these in top-level txn
because in some cases where we skip processing the transaction (see
ReorderBufferForget), we need to execute all the invalidations
together.
4.
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId,
XLogRecPtr lsn,
+ int nmsgs, SharedInvalidationMessage *msgs);
As pointed by Keisuke-San as well, this is not required.
5. Can you please once repeat the performance test done by Keisuke-San
to see if you have similar observations? Additionally, see if you are
also seeing the inconsistency related to the Truncate message reported
by him and if so why?
--
With Regards,
Amit Kapila.
On Thu, Oct 1, 2020 at 4:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 1, 2020 at 10:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Sep 30, 2020 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 12:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
And you might need to update the below comment as well:
typedef struct ReorderBufferTXN
{
..
/*
* List of ReorderBufferChange structs, including new Snapshots and new
* CommandIds
*/
dlist_head changes;

You forgot to address the above comment.
Fixed
Few other comments:
==================
1.
void
ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
@@ -2813,12 +2830,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
*rb, TransactionId xid,
SharedInvalidationMessage *msgs)
{
ReorderBufferTXN *txn;
+ MemoryContext oldcontext;
+ ReorderBufferChange *change;

 txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);

+ oldcontext = MemoryContextSwitchTo(rb->context);
+
+ change = ReorderBufferGetChange(rb);
+ change->action = REORDER_BUFFER_CHANGE_INVALIDATION;
+ change->data.inval.ninvalidations = nmsgs;
+ change->data.inval.invalidations = (SharedInvalidationMessage *)
+ MemoryContextAlloc(rb->context,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+ memcpy(change->data.inval.invalidations, msgs,
+ sizeof(SharedInvalidationMessage) * nmsgs);
+
+ ReorderBufferQueueChange(rb, xid, lsn, change, false);
+
 /*
- * We collect all the invalidations under the top transaction so that we
- * can execute them all together.
+ * Additionally, collect all the invalidations under the top transaction so
+ * that we can execute them all together. See comment atop this function
 */
 if (txn->toptxn)
 txn = txn->toptxn;
@@ -2830,8 +2862,7 @@ ReorderBufferAddInvalidations(ReorderBuffer *rb, TransactionId xid,
 {
 txn->ninvalidations = nmsgs;
 txn->invalidations = (SharedInvalidationMessage *)

Here what is the need for using MemoryContextAlloc, can't we directly
use palloc? Also, isn't it better to store the invalidation in txn
before queuing it for change because doing so can lead to the
processing of this and other changes accumulated till that point
before recording the same in txn. As such, there is no problem with it
but still, I think if any error happens while processing those changes
we would be able to clear the cache w.r.t this particular
invalidation.
Fixed
2.
+ * XXX Do we need to care about relcacheInitFileInval and
+ * the other fields added to ReorderBufferChange, or just
+ * about the message itself?

I don't think we need this comment in the patch.
3.
- * This needs to be called for each XLOG_XACT_INVALIDATIONS message and
- * accumulates all the invalidation messages in the toplevel transaction.
- * This is required because in some cases where we skip processing the
- * transaction (see ReorderBufferForget), we need to execute all the
- * invalidations together.
+ * This needs to be called for each XLOG_XACT_INVALIDATIONS message. The
+ * invalidation messages will be added in the reorder buffer as a change as
+ * well as all the invalidations will be accumulated under the toplevel
+ * transaction. We collect them as a change so that during decoding, we can
+ * execute only those invalidations which are specific to the command instead
+ * of executing all the invalidations, OTOH after decoding is complete or on
+ * transaction abort (see ReorderBufferForget) we need to execute all the
+ * invalidations to avoid any cache pollution so it is better to keep them
+ * together

Can we rewrite the comment as below?
This needs to be called for each XLOG_XACT_INVALIDATIONS message and
accumulates all the invalidation messages in the toplevel transaction
as well as in the form of change in reorder buffer. We require to
record it in form of change so that we can execute only required
invalidations instead of executing all the invalidations on each
CommandId increment. We also need to accumulate these in top-level txn
because in some cases where we skip processing the transaction (see
ReorderBufferForget), we need to execute all the invalidations
together.
Done
4.
+void ReorderBufferAddInvalidation(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
+ int nmsgs, SharedInvalidationMessage *msgs);

As pointed by Keisuke-San as well, this is not required.
Fixed
5. Can you please once repeat the performance test done by Keisuke-San
to see if you have similar observations? Additionally, see if you are
also seeing the inconsistency related to the Truncate message reported
by him and if so why?
Okay, I will set up and do this test early next week. Keisuke-San,
can you send me your complete test script?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Collect-command-invalidation-in-form-of-changes.patch
Hi Dilip, Amit,
5. Can you please once repeat the performance test done by Keisuke-San
to see if you have similar observations? Additionally, see if you are
also seeing the inconsistency related to the Truncate message reported
by him and if so why?

Okay, I will set up and do this test early next week. Keisuke-San,
can you send me your complete test script?
Yes, I've attached a test script (test_pg_recvlogical.sh).
Sorry, the issue with TRUNCATE not being output was due to a build mistake
on my part.
Even before the patch, TRUNCATE is decoded and output correctly.
So, please check the performance only.
I have tested it again and will share the results with you.
Also, a palloc call still had MemoryContextAlloc's arguments,
which kept the patch from compiling, so I've only fixed that part.
# test script
Please set PGHOME and CLUSTER_PUB before run.
sh test_pg_recvlogical.sh
# perf command
perf record --call-graph dwarf -p [walsender pid]
perf report -i perf.data --no-children
# before patch
decode + invalidation = 222s
2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001, nsp_001.part_0001 ...
nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)
Samples: 229K of event 'cpu-clock:uhH', Event count (approx.): 57347250000
Overhead Command Shared Object Symbol
- 96.07% postgres postgres [.] hash_seq_search
hash_seq_search
- RelfilenodeMapInvalidateCallback
- LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
0.74% postgres postgres [.] LocalExecuteInvalidationMessage
0.47% postgres postgres [.] hash_bytes_uint32
0.46% postgres postgres [.] hash_search_with_hash_value
0.37% postgres postgres [.] RelfilenodeMapInvalidateCallback
0.31% postgres postgres [.] CatCacheInvalidate
0.22% postgres postgres [.] uint32_hash
0.21% postgres postgres [.] PlanCacheRelCallback
0.20% postgres postgres [.] hash_seq_init
0.17% postgres postgres [.] ReorderBufferExecuteInvalidations
# after patch
decode + invalidation = 1s or less
2020-10-02 15:20:08 BEGIN 509
2020-10-02 15:20:08 table nsp_001.tbl_001, nsp_001.part_0001 ...
nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 15:20:08 COMMIT 509 (at 2020-10-02 15:20:08.704503+09)
Samples: 310 of event 'cpu-clock:uhH', Event count (approx.): 77500000
Overhead Command Shared Object Symbol
- 49.35% postgres postgres [.] hash_seq_search
hash_seq_search
RelfilenodeMapInvalidateCallback
LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
2.58% postgres libc-2.17.so [.] __memset_sse2
2.58% postgres libpthread-2.17.so [.] __libc_recv
1.61% postgres libc-2.17.so [.] __memcpy_ssse3_back
1.61% postgres libpthread-2.17.so [.] __errno_location
1.61% postgres postgres [.] AllocSetAlloc
1.61% postgres postgres [.] DecodeXLogRecord
1.29% postgres postgres [.] AllocSetFree
1.29% postgres postgres [.] hash_bytes_uint32
0.97% postgres [vdso] [.] __vdso_gettimeofday
0.97% postgres postgres [.] LocalExecuteInvalidationMessage
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Fri, Oct 2, 2020 at 12:26 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Hi Dilip, Amit,
5. Can you please once repeat the performance test done by Keisuke-San
to see if you have similar observations? Additionally, see if you are
also seeing the inconsistency related to the Truncate message reported
by him and if so why?

Okay, I will set up and do this test early next week. Keisuke-San,
can you send me your complete test script?

Yes, I've attached a test script (test_pg_recvlogical.sh)
Sorry, the issue with TRUNCATE not being output was due to a build mistake
on my part.
Even before the patch, TRUNCATE is decoded and output correctly.
So, please check the performance only.

I have tested it again and will share the results with you.
Okay, thanks for confirming I will test.
Also, the argument of palloc was still MemoryContextAlloc,
which prevented me from applying the patch, so I've only fixed that part.
Oh my bad, I think I forgot to amend that part in the patch after
compiling. Thanks for fixing this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 2, 2020 at 12:26 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Hi Dilip, Amit,
5. Can you please once repeat the performance test done by Keisuke-San
to see if you have similar observations? Additionally, see if you are
also seeing the inconsistency related to the Truncate message reported
by him and if so why?

Okay, I will set up and do this test early next week. Keisuke-San,
can you send me your complete test script?

Yes, I've attached a test script (test_pg_recvlogical.sh)
Sorry, the issue with TRUNCATE not being output was due to a build mistake
on my part.
Even before the patch, TRUNCATE is decoded and output correctly.
So, please check the performance only.

I have tested it again and will share the results with you.
Also, a palloc call still had MemoryContextAlloc's arguments,
which kept the patch from compiling, so I've only fixed that part.

# test script
Please set PGHOME and CLUSTER_PUB before run.
sh test_pg_recvlogical.sh
# perf command
perf record --call-graph dwarf -p [walsender pid]
perf report -i perf.data --no-children

# before patch
decode + invalidation = 222s
2020-10-02 14:55:50 BEGIN 509
2020-10-02 14:59:42 table nsp_001.tbl_001, nsp_001.part_0001 ...
nsp_001.part_0999, nsp_001.part_1000: TRUNCATE: (no-flags)
2020-10-02 14:59:42 COMMIT 509 (at 2020-10-02 14:55:50.349219+09)
I could not see this issue: decoding takes less than 1s even without
the patch. See the results below.
2020-10-08 13:00:49 BEGIN 509
2020-10-08 13:00:49 table nsp_001.part_0001: INSERT:...
2020-10-08 13:00:49 COMMIT 509 (at 2020-10-08 13:00:48.741986+05:30)
Am I missing something?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com