BUG #15313: Waiting `startup` process on a streaming replica
The following bug has been logged on the website:
Bug reference: 15313
Logged by: Victor Yegorov
Email address: vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system: Ubuntu 14.04.5 LTS (trusty)
Description:
We have one customer, that get one of their HotStandby replicas to be stuck
via `startup` process deadlock against session.
Case can be reproduced, we've installed dbg packages and collected
backtraces (I will attach them in a followup e-mail).
This happens during schema changes on the master. Processes is being far
from optimal, as they re-create all triggers,
views and materialized views in the scope. We've recommended this customer
to avoid such heavy schema changes.
Still, as we have evidence at hands, here comes the report.
вт, 7 авг. 2018 г. в 16:36, PG Bug reporting form <noreply@postgresql.org>:
Bug reference: 15313
Logged by: Victor Yegorov
Email address: vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system: Ubuntu 14.04.5 LTS (trusty)
Description:
In the attached files, 22607 is pid of a waiting session (we have had
around 100 sessions in such state)
and 22591 is a pid of a startup process.
--
Victor Yegorov
Attachments:
20180807-15313-pg_stat_activity.txttext/plain; charset=US-ASCII; name=20180807-15313-pg_stat_activity.txtDownload
20180807-15313-pg_locks.txttext/plain; charset=US-ASCII; name=20180807-15313-pg_locks.txtDownload
20180807-15313-gdb.txttext/plain; charset=US-ASCII; name=20180807-15313-gdb.txtDownload
On Tue, Aug 7, 2018 at 4:40 PM, Victor Yegorov <vyegorov@gmail.com> wrote:
вт, 7 авг. 2018 г. в 16:36, PG Bug reporting form <noreply@postgresql.org
:
Bug reference: 15313
Logged by: Victor Yegorov
Email address: vyegorov@gmail.com
PostgreSQL version: 10.4
Operating system: Ubuntu 14.04.5 LTS (trusty)
Description:In the attached files, 22607 is pid of a waiting session (we have had
around 100 sessions in such state)
and 22591 is a pid of a startup process.--
Victor Yegorov
Some additional notes about this problem:
1)deadlock_timeout cannot not resolve this problem or simple cannot detect
it properly.
2)max_standby_streaming_delay doesn't work as well (with sufficiently high
rate of new problem queries from application).
Only one way to fix it on the loaded replica is preventing all new incoming
connections via pg_hba.conf, killing all locked queries and verify that the
offending part of wal had been replayed. No built-in mechanisms designed to
deal with such issues work in that case. Only manual intervention.
I seen such issues like 10 times over last year on different projects. It
isn't once per lifetime issue unfortunately.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/ <http://www.postgresql-consulting.com/>
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
когда я так делаю ещё раз?"