BUG #14973: hung queries
The following bug has been logged on the website:
Bug reference: 14973
Logged by: Dmitry Shalashov
Email address: skaurus@gmail.com
PostgreSQL version: 10.1
Operating system: Debian 9
Description:
We stumbled upon queries running for a day or more. They are simple queries, so
that should not be happening. And most of the time it doesn't - only a very
small share of these queries ends up like this.
Moreover, these queries couldn't be stopped.
pg_stat_activity says that they all have wait_event_type = IPC, wait_event =
BtreePage, state = active
strace shows that they are all inside an epoll_wait syscall.
grep over ps output shows that they are all "postgres: bgworker: parallel
worker for PID ..."
Looks like some bug in parallel seq scan maybe?
We are going to disable parallel seq scan and restart our server in like 4
hours from now. I can get more debug if asked before that.
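For reference, sessions stuck like this can be located through pg_stat_activity. A minimal sketch, using only the standard PostgreSQL 10 columns mentioned above:

```sql
-- List all sessions currently waiting on IPC, together with the
-- event they are waiting for and the query they belong to:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'IPC';
```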
On Fri, Dec 15, 2017 at 1:31 AM, <skaurus@gmail.com> wrote:
Hello Dmitry,
Thank you for the report. It sounds like a known bug in 10.0 and 10.1
that was recently fixed:
/messages/by-id/E1ePESn-0005PV-S9@gemulon.postgresql.org
The problem is in Parallel Index Scan for btree. The fix will be in
10.2. One workaround in the meantime would be to disable parallelism
for that query (SET max_parallel_workers_per_gather = 0).
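Since the workaround only needs to cover the affected query, it can be applied per session and undone afterwards. A sketch using standard SET/RESET:

```sql
-- Disable parallel query for this session only:
SET max_parallel_workers_per_gather = 0;

-- ... run the query that was hanging ...

-- Restore the server's configured default afterwards:
RESET max_parallel_workers_per_gather;
```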
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Dec 19, 2017 at 6:38 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On second thoughts, a more targeted workaround to avoid just these
buggy parallel index scans without disabling parallelism in general
might be:
SET min_parallel_index_scan_size = '5TB';
(Assuming you don't have any indexes that large.)
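Whether that assumption holds can be checked from the catalogs. A sketch using pg_class and the standard size functions:

```sql
-- Show the ten largest indexes; none of them should come anywhere
-- near the 5TB threshold for this workaround to be safe:
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'i'
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;
```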
--
Thomas Munro
http://www.enterprisedb.com
Hi Thomas,
I'm glad to help. Thanks for the advice!
By the way, there was a mistake in my bug report - wait_event actually
was BgWorkerShutdown.
Dmitry Shalashov, relap.io & surfingbird.ru
On Tue, Dec 19, 2017 at 2:48 AM, Dmitry Shalashov <skaurus@gmail.com> wrote:
Hi Thomas,
I'm glad to help. Thanks for the advice!
By the way, there was a mistake in my bug report - wait_event actually was
BgWorkerShutdown.
I think the BgWorkerShutdown type of wait event can occur only for the
master backend, not for the workers. Are there any other wait events?
Can we get a stack trace of one or more workers?
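For what it's worth, PostgreSQL 10's pg_stat_activity exposes a backend_type column, so the workers' own wait events can be read directly. A sketch:

```sql
-- Show what each parallel worker is currently waiting on:
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';
```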
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 19, 2017 at 4:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, that's what happens when calling
WaitForBackgroundWorkerShutdown(), as the primary backend waits for all
the workers to stop. You can see this wait event in the logical
replication launcher as well, by the way.
--
Michael