BUG #17716: walsender process hang while decoding 'DROP PUBLICATION' XLOG
The following bug has been logged on the website:
Bug reference: 17716
Logged by: bowenshi
Email address: zxwsbg12138@gmail.com
PostgreSQL version: 13.9
Operating system: CentOS
Description:
Dears, I have found a problem while using logical decoding. I have two
publications, and the walsender process hangs while decoding the DROP
PUBLICATION XLOG of the other publication. The pstack looks like the following:
#0 0x00000000008b9c24 in hash_seq_search ()
#1 0x00007fd14340590c in rel_sync_cache_publication_cb () from
pgoutput.so
#2 0x0000000000897a1a in CallSyscacheCallbacks ()
#3 0x000000000073682e in ReorderBufferCommit ()
#4 0x000000000072b988 in DecodeXactOp ()
#5 0x000000000072bd52 in LogicalDecodingProcessRecord ()
#6 0x000000000074d14f in XLogSendLogical ()
#7 0x000000000074f562 in WalSndLoop ()
#8 0x0000000000750322 in exec_replication_command ()
#9 0x0000000000792301 in PostgresMain ()
#10 0x000000000048854f in ServerLoop ()
#11 0x000000000071b579 in PostmasterMain ()
#12 0x000000000048948d in main ()
When I use perf to analyze it, it spends almost 99% of the time in the
`hash_seq_search` function.
This problem can be reproduced, and I will send the shell scripts later.
The following steps can help reproduce the problem:
1. Run "initinstance.sh" to get two instances, one publisher and one
subscriber; this script creates a subscription as well;
2. Run "createschema.sh" to create 1000 schemas, each schema with 100 tables;
3. Run "createpublication.sh" to create a huge publication, which
publishes 10000 tables;
4. Run "insert.sh" to insert exactly one tuple into each table
created in Step 2;
5. Run "insert into t values(1);" on the publisher; we can then find
the row on the subscriber;
6. Run "drop publication pub_t_large;" on the publisher;
7. Run "insert into t values(1);" on the publisher; this time we cannot
find the row on the subscriber, because the walsender has hung while
decoding Step 6's XLOG.
Hello,
The problem does not reproduce on HEAD, but it does reproduce on 13.9.
*Problem Description:*
On dropping a large publication with 10,000 tables added to it, the walsender
hangs in
ReorderBufferCommit
-->ReorderBufferExecuteInvalidations-->...-->hash_seq_search()
*Root Cause:*
Dropping the publication deletes 10,000 tuples (table entries) from the
catalogs, so it creates invalidation messages to invalidate various cache
entries. Each table-entry deletion results in 4 invalidation messages, so the
complete DROP PUBLICATION operation adds up to approximately 40,000
invalidation messages (plus a few more for the publication drop itself).
These invalidation messages are added to the top-level transaction.
Later, when the walsender reads the XLOG of the DROP PUBLICATION operation,
it replays these invalidation messages on each command-id increment. The
command-id is incremented from 1 to 10,000, so for each command-id it replays
all 40,000 invalidation messages. There is no way to identify that only 4
invalidation messages belong to a particular command-id, so all accumulated
messages are replayed on each command-id increment. This results in a
hang-like situation with the above stack, and thus an insert (belonging to a
different publication) executed immediately after the DROP PUBLICATION is not
received on the subscriber.
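To make the cost concrete, here is a minimal Python model of the replay behavior described above (the function name and numbers mirror the description; this is an illustration, not PostgreSQL code):

```python
# Model of the pre-v14 replay: all invalidation messages of the top-level
# transaction are executed again on every command-id increment.
def replay_cost_pre_v14(num_commands, msgs_per_command):
    total_msgs = num_commands * msgs_per_command  # ~40,000 msgs in the top txn
    executed = 0
    for _ in range(num_commands):  # command-id goes from 1 to 10,000
        executed += total_msgs     # replay *all* accumulated messages
    return executed

# 10,000 command-ids x 40,000 messages = 400,000,000 message executions,
# each of which ends up in callbacks like rel_sync_cache_publication_cb
# that walk a hash table (hence the time spent in hash_seq_search).
print(replay_cost_pre_v14(10_000, 4))  # → 400000000
```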
*What fixes it on HEAD:*
On later versions, a new mechanism was introduced wherein, after each
command-id increment, we log the invalidations to be done for that
command-id: a record of type XLOG_XACT_INVALIDATIONS is inserted into the
XLOG. For the case at hand, each such invalidation record carries the 4
invalidation messages for its command-id. So all 40,000 invalidation messages
are still inserted into the XLOG, but in pieces: 10,000 invalidation records,
each representing the invalidations for one particular command-id and each
carrying 4 invalidation messages.
Later, when the walsender replays, it replays only that particular
command-id's xlog record after each command-id increment (in the order the
records were inserted). Thus we perform invalidation 10,000 times, but with 4
messages at a time, instead of 40,000 messages every time as in the previous
version.
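Continuing the same illustrative model, the per-command-id records change the replay loop so that each message is executed only once:

```python
# Model of the HEAD replay: each XLOG_XACT_INVALIDATIONS record carries only
# the messages belonging to one command-id, and only that record is replayed
# after the corresponding command-id increment.
def replay_cost_head(num_commands, msgs_per_command):
    executed = 0
    for _ in range(num_commands):     # one invalidation record per command-id
        executed += msgs_per_command  # replay just that record's 4 messages
    return executed

print(replay_cost_head(10_000, 4))  # → 40000, vs. 400,000,000 before
```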
These changes were committed through the commit IDs below:
*Full change:*
commit c55040ccd017962b7b8d3fbcdc184aa90c722a21
Author: Dilip Kumar, Tomas Vondra, Amit Kapila
Date: Thu Jul 23 08:19:07 2020 +0530
*Trivial Fix:*
commit 36e545cd051ffe58064c1f38faeb64f4feef4aec
Author: Amit Kapila
Date: Thu Nov 10 16:56:49 2022 +0530
I have not tried it yet on version 14; hopefully we will see it working
there. I will try it and update soon.
thanks
Shveta
Hello,
Thanks for your advice. I ran some tests, and this problem can't be
reproduced on PG 14+ versions. I think adding the new XLOG type helps
resolve this problem. But I think the following patch may be helpful
on the PG 13 version.
The invalidation consists of two parts: pgoutput and RelfilenodeMap. We
have no way to optimize the RelfilenodeMap part, since it has been
discussed in previous mails:
/messages/by-id/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK+NqhAWJyw8psQ@mail.gmail.com.
However, I'd like to contribute a patch to fix the pgoutput part. We can skip
invalidating the caches after the first time with a lazy tag, and this works.
It almost doubles walsender performance while decoding this XLOG.
I used the test from the last email with the number of relations in the
publication reduced to 1000; the test results are as follows:
Before optimization: 76 min
After optimization: 35 min
Though the result is not good enough, I think this patch is still worthwhile.
Attachment: 0001-Add-cache-invalid-flag-in-pgoutput.patch
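The patch body is in the attachment; as a rough sketch of the lazy-tag idea it describes (with hypothetical names; the real code lives in pgoutput's relation sync cache callback), an invalidation callback only sets a flag, and the expensive full-cache walk is deferred and performed at most once, on the next actual cache access:

```python
# Illustrative sketch of the lazy invalidation tag: repeated invalidation
# callbacks become O(1) flag sets, and the full walk over the cache (the
# hash_seq_search-style scan) happens at most once, at the next lookup.
class RelSyncCache:
    def __init__(self):
        self.entries = {}                # relid -> cached schema state
        self.needs_invalidation = False  # the "lazy tag"
        self.full_walks = 0              # counts expensive full-cache scans

    def publication_cb(self):
        """Invalidation callback: just mark the cache dirty."""
        self.needs_invalidation = True

    def lookup(self, relid):
        if self.needs_invalidation:      # deferred work, done once
            for entry in self.entries.values():
                entry["valid"] = False
            self.full_walks += 1
            self.needs_invalidation = False
        return self.entries.get(relid)

cache = RelSyncCache()
for _ in range(40_000):     # 40,000 invalidation messages during replay
    cache.publication_cb()  # each one is now a cheap flag set
cache.lookup(1)
print(cache.full_walks)     # → 1 (the walk ran only once)
```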
Hello,
The idea looks good to me. For the relation schema cache (the pgoutput one),
on receiving an invalidation message for one hash value, we invalidate the
complete cache, as there is no way to find the entry corresponding to that
hash value; thus your proposed fix will make a good difference. But I feel
it makes sense on HEAD as well.
This complete cache invalidation happens multiple times even on HEAD (10k
times for the given case). The cache is mostly empty in the given test case,
but consider the case where we have a huge number of publications and
subscriptions (so that this cache has a huge number of entries) and then we
try to drop one large publication with, say, 40k-50k tables; in that case
we might see slowness while traversing and invalidating the concerned cache
on HEAD as well. The test case with increased magnitude can be tried on
HEAD once to see whether we need the fix there or not.
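As a back-of-the-envelope model of that concern (illustrative numbers only, not measurements): if every replayed invalidation message still walks the whole cache, the work scales with the number of cache entries even under the per-command-id scheme:

```python
# Rough cost model: each replayed invalidation message triggers one full walk
# over the relation sync cache, visiting every entry.
def full_walk_cost(num_records, msgs_per_record, cache_entries):
    return num_records * msgs_per_record * cache_entries

# Mostly empty cache (the reported test case): cheap even on HEAD.
print(full_walk_cost(10_000, 4, 1))       # → 40000 entry visits
# Huge cache (many publications/subscriptions): the walks could add up.
print(full_walk_cost(10_000, 4, 50_000))  # → 2000000000 entry visits
```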
thanks
Shveta
Hello,
I tried to reproduce the lag with a bigger-magnitude test case, i.e. I added
more tables to pub_t_large to increase the command-ids and added a huge
number of tables to the working publication pub_t to increase the number of
entries in the rel-cache, but no luck.
No noticeable lag was observed on HEAD with the new invalidation mechanism.
thanks
Shveta
Hello,
Thanks for your tests.
> No noticeable lag was observed on HEAD with the new invalidation mechanism.
Does the new mechanism of invalidation mean using
'XLOG_XACT_INVALIDATIONS' to invalidate the cache? And did you use perf or
pstack to watch the walsender process?
thanks
BowenShi
Yes, using XLOG_XACT_INVALIDATIONS. It reduced the number of invalidations
executed per commandId.
I used pstack, but it was pretty quick, so there was not much chance to see
the stacks. No lag was seen on the next insert on the subscriber after the
drop publication, even with the heavier test case.
thanks
Shveta
I don't see any obvious problem with your suggested fix, but it doesn't
appear to be a very good solution, as it will still mark all entries
for invalidation after any valid DML operation. Also, there doesn't
appear to be much benefit in versions >= v14 (as per the tests done till
now). So, instead of partially solving your use case on the old versions,
isn't it better if you can upgrade to a later version (>= 14)?
--
With Regards,
Amit Kapila.