BUG #14973: hung queries
The following bug has been logged on the website:
Bug reference: 14973
Logged by: Dmitry Shalashov
Email address: skaurus@gmail.com
PostgreSQL version: 10.1
Operating system: Debian 9
Description:
We stumbled upon queries running for a day or more. They are simple queries, so
that should not be happening. And most of the time it doesn't - only a very
small share of these queries ends up like this.
Moreover, these queries couldn't be stopped.
pg_stat_activity says that they all have wait_event_type = IPC, wait_event =
BtreePage, state = active
strace shows that they are all inside an epoll_wait syscall.
grep over ps output shows that they are all "postgres: bgworker: parallel
worker for PID ..."
Looks like some bug in parallel seq scan maybe?
We are going to disable parallel seq scan and restart our server in like 4
hours from now. I can get more debug if asked before that.
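For reference, sessions stuck like this can be located through pg_stat_activity. A minimal sketch, using only the standard PostgreSQL 10 columns mentioned above:

```sql
-- List all sessions currently waiting on IPC, together with the
-- event they are waiting for and the query they belong to:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'IPC';
```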
On Fri, Dec 15, 2017 at 1:31 AM, <skaurus@gmail.com> wrote:
Hello Dmitry,
Thank you for the report. It sounds like a known bug in 10.0 and 10.1
that was recently fixed:
/messages/by-id/E1ePESn-0005PV-S9@gemulon.postgresql.org
The problem is in Parallel Index Scan for btree. The fix will be in
10.2. One workaround in the meantime would be to disable parallelism
for that query (SET max_parallel_workers_per_gather = 0).
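Since the workaround only needs to cover the affected query, it can be applied per session and undone afterwards. A sketch using standard SET/RESET:

```sql
-- Disable parallel query for this session only:
SET max_parallel_workers_per_gather = 0;

-- ... run the query that was hanging ...

-- Restore the server's configured default afterwards:
RESET max_parallel_workers_per_gather;
```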
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Dec 19, 2017 at 6:38 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On second thoughts, a more targeted workaround to avoid just these
buggy parallel index scans without disabling parallelism in general
might be:
SET min_parallel_index_scan_size = '5TB';
(Assuming you don't have any indexes that large.)
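Whether that assumption holds can be checked from the catalogs. A sketch using pg_class and the standard size functions:

```sql
-- Show the ten largest indexes; none of them should come anywhere
-- near the 5TB threshold for this workaround to be safe:
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'i'
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;
```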
--
Thomas Munro
http://www.enterprisedb.com
Hi Thomas,
I'm glad to help. Thanks for the advice!
By the way, there was a mistake in my bug report - wait_event actually
was BgWorkerShutdown.
Dmitry Shalashov, relap.io & surfingbird.ru
On Tue, Dec 19, 2017 at 2:48 AM, Dmitry Shalashov <skaurus@gmail.com> wrote:
Hi Thomas,
I'm glad to help. Thanks for the advice!
By the way, there was a mistake in my bug report - wait_event actually was
BgWorkerShutdown.
I think the BgWorkerShutdown type of wait event can occur only for the
master backend, not for the workers. Are there any other wait events?
Can we get a stack trace of one or more workers?
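For what it's worth, PostgreSQL 10's pg_stat_activity exposes a backend_type column, so the workers' own wait events can be read directly. A sketch:

```sql
-- Show what each parallel worker is currently waiting on:
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';
```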
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 19, 2017 at 4:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, that's what happens when calling
WaitForBackgroundWorkerShutdown(), as the primary backend waits for all
the workers to stop. You can see this wait event in the logical
replication launcher as well, by the way.
--
Michael