BUG #16327: Too many files in pg_replslot folder
The following bug has been logged on the website:
Bug reference: 16327
Logged by: Dmitry Romanenko
Email address: romdn@inbox.ru
PostgreSQL version: 12.2
Operating system: Windows
Description:
Hello!
4.5 million rows are inserted from a temporary table. For each of the
three subscriptions, about 712,000 xid-*-lsn-*-*.spill files, roughly 3 GB
in total, are generated. The following lines appear in the server log:
ERROR: replication slot "..." is active for PID ...
After that, replication stops working.
What can be done?
PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit
postgresql.auto.conf
max_connections = '30'
shared_buffers = '512MB'
effective_cache_size = '6GB'
maintenance_work_mem = '512MB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
work_mem = '21845kB'
min_wal_size = '2GB'
max_wal_size = '8GB'
max_worker_processes = '4'
max_parallel_workers_per_gather = '2'
max_parallel_workers = '4'
max_parallel_maintenance_workers = '2'
random_page_cost = '2'
P.S. Replication on PostgreSQL 11.1 (with default settings) worked fine when
inserting 4.5 million rows. The xid-*-lsn-*-*.snap files were fewer but
larger in size. However, when there were several 4.5-million-row inserts in
a row, an error appeared: ERROR: out of memory.
Best regards, Dmitry
On Sun, 29 Mar 2020 at 11:49, PG Bug reporting form <noreply@postgresql.org>
wrote:
> P.S. Replication on PostgreSQL 11.1 (with default settings) worked fine
> with inserting 4.5 million rows. The xid-*-lsn-*-*.snap files were fewer
> but larger in size. But if there were several 4.5-million-row inserts in
> a row, an error appeared: ERROR: out of memory.

Could you provide a test case? Are you using subtransactions?
logical_decoding_work_mem [1] was added in v13 to ameliorate high memory
usage while decoding large transactions. It is odd that it does not run
'out of memory' in v11 too (v11 introduced the Generation memory allocator
to reduce logical decoding memory usage); I would expect high memory usage
in v10 in a big-transaction scenario.
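Once on a version with that parameter, raising it lets each walsender buffer more of a large transaction in memory before spilling files into pg_replslot. A minimal sketch (the 256MB value is only an illustrative starting point, not a recommendation):

```sql
-- Assumes PostgreSQL 13+, where logical_decoding_work_mem exists.
-- Default is 64MB; a larger value reduces how much of a big
-- transaction gets written out as .spill files under pg_replslot.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();  -- sighup parameter, no restart needed
```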
If you are looking for a workaround, you should try to disable all
subscriptions and then enable only one at a time.
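With the subscription names from the report (sub_2, sub_3, sub_4; adjust to your setup), that workaround looks like:

```sql
-- Run on the subscriber: disable all subscriptions first...
ALTER SUBSCRIPTION sub_2 DISABLE;
ALTER SUBSCRIPTION sub_3 DISABLE;
ALTER SUBSCRIPTION sub_4 DISABLE;

-- ...then enable one at a time, waiting for it to catch up
-- before enabling the next.
ALTER SUBSCRIPTION sub_2 ENABLE;
```

This way only one walsender decodes the large transaction at a time, instead of all three spilling it to disk concurrently.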
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cec2edfa7859279f36d2374770ca920c59c73dd8
Regards,
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29.03.2020 22:52, Euler Taveira wrote:
Thanks for the answer!
> Could you provide a test case?
I will try to do a test in the coming days.
> Are you using subtransactions?
No. Functions are called sequentially by the Windows Task Scheduler. A large
file is loaded into a temporary table. A cursor processes the entries line
by line and places them in another temporary table, from which they are
then transferred to the main table.
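The described job roughly follows this shape (a minimal sketch; every table, column, and delimiter here is hypothetical, invented for illustration):

```sql
-- Hypothetical sketch: staging temp table -> row-by-row cursor
-- processing -> second temp table -> main table.
CREATE TEMP TABLE raw_lines (line text);
-- COPY raw_lines FROM ...;  -- bulk load of the large file

CREATE TEMP TABLE staged (id int, payload text);

DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT line FROM raw_lines LOOP
        -- per-row processing happens here
        INSERT INTO staged (id, payload)
        VALUES (split_part(r.line, ';', 1)::int,
                split_part(r.line, ';', 2));
    END LOOP;
END $$;

-- Final transfer: one large transaction. Temp tables are not
-- replicated, so this INSERT is what logical decoding on the
-- publisher has to buffer (and spill) for each subscription.
INSERT INTO main_table (id, payload)
SELECT id, payload FROM staged;
```

Note that only the final INSERT into the replicated table is decoded; committing it in smaller batches instead of one 4.5-million-row transaction would also shrink the spill volume.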
> If you are looking for a workaround, you should try to disable all
> subscriptions and then enable only one at a time.
That works, but it would be inconvenient for the administrator. The alternative is to go back to v11.1.
Today I noticed the following while the problem was reproducing.
A select from pg_stat_replication after a while gives:

 pid  | application_name | client_port |    backend_start    |   state   |  sent_lsn  | write_lsn  |   write_lag   |  replay_lag   |     reply_time
------+------------------+-------------+---------------------+-----------+------------+------------+---------------+---------------+---------------------
 4916 | sub_2            |       55590 | 2020-03-30 20:38:20 | streaming | 5/3344C598 | 5/3344A7D0 | 00:00:00.0058 | 00:00:00.0058 | 2020-03-30 21:42:50
 4724 | sub_3            |       55828 | 2020-03-30 20:38:34 | streaming | 5/3344C598 | 5/3344A7D0 | 00:00:00.0015 | 00:00:00.0015 | 2020-03-30 21:42:50
  928 | sub_4            |       59003 | 2020-03-30 20:38:56 | streaming | 5/3344C598 | 5/334442F0 |               |               | 2020-03-30 21:42:49
 3328 | sub_2            |       61992 | 2020-03-30 21:47:44 | startup   |            |            |               |               |
And on server 2:
ERROR: could not start WAL streaming: ERROR: replication slot "sub_2" is
active for PID 4916
And even later:

 pid  | application_name | client_port |    backend_start    |  state  | sent_lsn | write_lsn | write_lag | replay_lag | reply_time
------+------------------+-------------+---------------------+---------+----------+-----------+-----------+------------+------------
 4536 | sub_2            |       50299 | 2020-03-30 22:33:33 | startup |          |           |           |            |
  528 | sub_3            |       53671 | 2020-03-30 22:31:28 | startup |          |           |           |            |
  364 | sub_4            |       51630 | 2020-03-30 22:28:29 | startup |          |           |           |            |
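Output like the above can be gathered on the publisher with queries along these lines (a sketch; the column list is trimmed to what the report shows):

```sql
-- One row per connected walsender: state, positions, lag.
SELECT s.pid, s.application_name, s.client_port, s.backend_start,
       s.state, s.sent_lsn, s.write_lsn, s.write_lag, s.replay_lag,
       s.reply_time
FROM pg_stat_replication s;

-- A logical slot stuck "active" under a looping or dead walsender
-- (the "is active for PID" error) shows up via active_pid here:
SELECT slot_name, active, active_pid, restart_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
```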
Best regards, Dmitry