BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Started by PG Bug reporting form, almost 7 years ago, 9 messages, bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15815
Logged by: Steve I
Email address: postgres-ca@byerquest.com
PostgreSQL version: 9.6.12
Operating system: Amazon Aurora
Description:

AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:

LOG: server process (PID 31294) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running:
simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000)) = '{string}'

LOG: terminating any other active server processes
FATAL: Can't handle storage runtime process crash

This specific SQL causes a segfault on our dataset 100% of the time. If I
change any part of it, it doesn't, e.g. remove lower, or substring, or change > to <,
or change any part of the string. We have a few other variations, but this
example is the most often reported and reproducible.

Guidance on whether this is a known issue, how to provide additional
information to further trace it in an AWS environment, or how to bypass it,
is most appreciated.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

PG Bug reporting form <noreply@postgresql.org> writes:

> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:
>
> LOG: server process (PID 31294) was terminated by signal 11: Segmentation fault
> DETAIL: Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000)) = '{string}'
> LOG: terminating any other active server processes

Huh. Can you get a stack trace from that?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

Also, could we see the definition of the table (psql \d would be
helpful)?

regards, tom lane

#3 Euler Taveira de Oliveira
euler@timbira.com.br
In reply to: PG Bug reporting form (#1)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

On Tue, May 21, 2019 at 11:27, PG Bug reporting form
<noreply@postgresql.org> wrote:

> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:

Aurora is a Postgres fork so you should report it to Amazon. However, ...

> LOG: server process (PID 31294) was terminated by signal 11: Segmentation fault
> DETAIL: Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000)) = '{string}'
> LOG: terminating any other active server processes
> FATAL: Can't handle storage runtime process crash

Could you reproduce it with stock Postgres? Could you provide a test case?

--
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training

#4 Steve
postgres-ca@byrequest.com
In reply to: Euler Taveira de Oliveira (#3)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

I probably need AWS hands to get a trace from behind the curtain. Started a
thread there:
https://forums.aws.amazon.com/thread.jspa?threadID=303488&tstart=0

 Column |  Type   |                     Modifiers
--------+---------+----------------------------------------------------
 a      | integer | not null default nextval('{table3}_seq'::regclass)
 b      | integer |
 c      | integer |
 d      | text    |
Indexes:
    … PRIMARY KEY, btree (a)
    … UNIQUE CONSTRAINT, btree (b, c)
    … btree (b)
    … btree (c)
    … btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
    … btree (lower("substring"(d, 1, 1000)), b)
Foreign-key constraints:
    … FOREIGN KEY (b) REFERENCES {table2}(b)
    … FOREIGN KEY (c) REFERENCES {table1}(c)

The dataset is 200 GB, so it will take time to run tests on stock Postgres.


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve (#4)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Steve <postgres-ca@byrequest.com> writes:

>  Column |  Type   |                     Modifiers
> --------+---------+----------------------------------------------------
>  a      | integer | not null default nextval('{table3}_seq'::regclass)
>  b      | integer |
>  c      | integer |
>  d      | text    |
> Indexes:
>     … PRIMARY KEY, btree (a)
>     … UNIQUE CONSTRAINT, btree (b, c)
>     … btree (b)
>     … btree (c)
>     … btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
>     … btree (lower("substring"(d, 1, 1000)), b)
> Foreign-key constraints:
>     … FOREIGN KEY (b) REFERENCES {table2}(b)
>     … FOREIGN KEY (c) REFERENCES {table1}(c)

Hm, so this query is probably using the last of those indexes ---
could we see EXPLAIN output to confirm that?

If so, a plausible explanation is that a portion of that index is corrupt,
although it's certainly not very nice that you're getting a crash rather
than an error report.

If you're in a hurry to restore functionality, dropping and recreating
that index would likely make the problem go away ... but it would also
destroy the evidence we'd need to find the cause of the crash. So if
you can hold off till we see the stack trace, that'd be nice.

regards, tom lane
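
A minimal sketch of the checks discussed in the message above. All names are hypothetical, since the thread elides the real ones: `t` stands in for {table}, `d` is the text column from the \d output, 'some string' stands in for {string}, and `t_lower_d_b_idx` is a made-up index name. Plain EXPLAIN only plans the query and does not execute it. The planner settings are standard PostgreSQL session parameters, but whether they sidestep the crash on Aurora is an assumption, not something reported in this thread.

```sql
-- Hypothetical names throughout: t, d, 'some string', t_lower_d_b_idx.

-- 1. Ask the planner which index it would use for the crashing predicate.
--    EXPLAIN without ANALYZE plans the statement but does not run it.
EXPLAIN
SELECT d
FROM t
WHERE lower(substring(d, 1, 1000)) = 'some string';

-- 2. Discourage index-based plans for this session only, so the same
--    query can be retried without touching the suspect index.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;

SELECT d
FROM t
WHERE lower(substring(d, 1, 1000)) = 'some string';

-- Restore the default planner behaviour for the session.
RESET enable_indexscan;
RESET enable_indexonlyscan;
RESET enable_bitmapscan;

-- 3. Only after the evidence is no longer needed: rebuild the index.
--    REINDEX blocks writes to the table while it runs.
-- REINDEX INDEX t_lower_d_b_idx;
```

With index plans discouraged, the planner should fall back to a sequential scan. On a 200 GB dataset that will be slow, but it leaves the suspect index untouched for later inspection.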

#6 Steve
postgres-ca@byrequest.com
In reply to: Tom Lane (#5)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Yeah I agree, and it is, ... our first step was to regenerate all the
indexes, but the segfault persists.

We're reproducing the case in a db restored from a snapshot. Perhaps I'll
reindex again there. Since we have an AWS snapshot, we can jump back pretty
fast to retest.

BINGO, an AWS Development Manager just stepped in… They've identified the
problem and are deploying a patch release:
https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve (#6)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Steve <postgres-ca@byrequest.com> writes:

> BINGO, an AWS Development Manager just stepped in… They've identified the
> problem and are deploying a patch release:
> https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775

Oh, so it was their bug not ours? Sure wish there was more detail there.

regards, tom lane

#8 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#7)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

On Tue, May 21, 2019 at 03:25:11PM -0400, Tom Lane wrote:

> Steve <postgres-ca@byrequest.com> writes:
>> BINGO, an AWS Development Manager just stepped in… They've identified the
>> problem and are deploying a patch release:
>> https://forums.aws.amazon.com/thread.jspa?messageID=901775&#901775
>
> Oh, so it was their bug not ours? Sure wish there was more detail there.

Aurora uses a different engine than Postgres, as far as I understand,
so we are likely not impacted by that. Let's see if we get any
feedback.
--
Michael

#9 Steve
postgres-ca@byrequest.com
In reply to: Michael Paquier (#8)
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault

Patched Aurora from 1.5.0 to 1.5.1 (which fixes an issue with index
prefetch). The issue appears to be fully resolved.

Thanks for the help leading into this.
