BUG #16094: Database entering recovery mode

Started by PG Bug reporting form over 6 years ago, 5 messages, list: bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16094
Logged by: Mircea Pirv
Email address: mircea@reva.tech
PostgreSQL version: 12.0
Operating system: OSX
Description:

Hello,

We've updated our database from 10.6 to 12.0 recently and we keep
encountering an error which says that the database is entering recovery
mode.
Our setup has replication enabled, with a replication slot and a
replication subscription. On our development machines, replication is set up
to use the same machine.
The problem is that, with replication enabled, changing a column value with
an UPDATE statement causes the database to enter recovery mode.
This happens only when replication is enabled.
We could not find any pattern in which columns trigger it. It happens with
all kinds of columns, regardless of whether they have constraints, triggers,
or defaults. With some columns it happens randomly; with others it happens
every time.

In the logs the only error we see, is a segmentation fault, when replication
tries to run.
We tried both a local server and a Dockerized version (12.0).

The problem reproduces only in 12.0; we could not reproduce it on 10.6 or
11.5.

Any ideas on what may have changed in 12.0 to cause this behavior?

Thanks,
Mircea

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16094: Database entering recovery mode

PG Bug reporting form <noreply@postgresql.org> writes:

We've updated our database from 10.6 to 12.0 recently and we keep
encountering an error which says that the database is entering recovery
mode.
...
In the logs the only error we see, is a segmentation fault, when replication
tries to run.

It should be possible to collect a stack trace for the segfault,
which would greatly assist debugging this. Please see

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane
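
For readers following along, one way to collect such a trace is to attach gdb to the backend before running the failing UPDATE and let it wait for the crash. The following is a minimal sketch, not part of the original thread. It assumes gdb is installed, debug symbols are available, and the backend PID is already known (for example from `SELECT pg_backend_pid()`); the helper names `build_gdb_command` and `collect_backtrace` are hypothetical.

```python
import subprocess


def build_gdb_command(pid: int) -> list[str]:
    """Build a gdb invocation that attaches to `pid`, resumes it, and prints
    a full backtrace once the process stops (for example on SIGSEGV)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "-p", str(pid),   # attach to the running backend
        "-batch",         # exit once the commands below have run
        # PostgreSQL backends use SIGUSR1 internally; by default gdb would
        # stop on it, so pass it through silently instead.
        "-ex", "handle SIGUSR1 nostop noprint pass",
        "-ex", "cont",    # resume; gdb regains control when the process stops
        "-ex", "bt full", # backtrace including local variables
    ]


def collect_backtrace(pid: int) -> str:
    """Run gdb and return everything it printed (stdout followed by stderr)."""
    result = subprocess.run(
        build_gdb_command(pid), capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Calling `collect_backtrace(pid)` blocks until the attached process stops, so the failing statement has to be issued from the session that owns that backend while gdb is waiting. If the crash happens in a different process (for instance a replication worker), that process is the one to attach to.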

#3 Mircea Pirv
mircea@reva.tech
In reply to: Tom Lane (#2)
Re: BUG #16094: Database entering recovery mode

This is all that I managed to retrieve using gdb:

#0 0x00007f53d89eb7b7 in epoll_wait (epfd=3, events=0x55bc137a59c0,
maxevents=1, maxevents@entry=<error reading variable: Cannot access memory
at address 0x7ffee2d44afc>, timeout=-1, timeout@entry=<error reading
variable: Cannot access memory at address 0x7ffee2d44ae8>) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
resultvar = 18446744073709551612
sc_ret = <optimized out>
#1 0x000055bc11790379 in WaitEventSetWaitBlock (nevents=<error reading
variable: Cannot access memory at address 0x7ffee2d44afc>,
occurred_events=<error reading variable: Cannot access memory at address
0x7ffee2d44ad8>, cur_timeout=<error reading variable: Cannot access memory
at address 0x7ffee2d44ae8>, set=0x55bc137a5948) at
./build/../src/backend/storage/ipc/latch.c:1080
returned_events = 0
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
returned_events = <optimized out>
rc = <optimized out>
cur_event = <optimized out>
cur_epoll_event = <optimized out>
__func__ = <error reading variable __func__ (Cannot access memory
at address 0x55bc11a5a510)>
__errno_location = <optimized out>
#2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable:
Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error
reading variable: Cannot access memory at address 0x7ffee2d44ad8>,
nevents=<error reading variable: Cannot access memory at address
0x7ffee2d44afc>, wait_event_info=<optimized out>) at
./build/../src/backend/storage/ipc/latch.c:1032
rc = <optimized out>
returned_events = 0
start_time = <error reading variable start_time (Cannot access
memory at address 0x7ffee2d44b00)>
cur_time = <error reading variable cur_time (Cannot access memory
at address 0x7ffee2d44b10)>
cur_timeout = <error reading variable cur_timeout (Cannot access
memory at address 0x7ffee2d44ae8)>
Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78
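
Because the output above is line-wrapped, it can be hard to see at a glance which frames are present. A small script can reduce it to one line per frame. This is an illustrative sketch only, not something used in the thread; the function name `summarize_frames` and the regular expressions are assumptions based on the layout of this particular trace.

```python
import re

# A frame starts at the beginning of a line with "#<number> ".
FRAME_START = re.compile(r"(?m)^#(\d+)\s+")
# Optional "0x... in " prefix, then the function name, then its argument list.
FUNCTION = re.compile(r"^(?:0x[0-9a-fA-F]+ in )?(\w+) \(")
# The source location follows the closing parenthesis of the argument list.
LOCATION = re.compile(r"\) at (\S+):(\d+)")


def summarize_frames(trace: str) -> list[tuple[int, str, str]]:
    """Return (frame number, function, file:line) for each frame in `trace`."""
    parts = FRAME_START.split(trace)
    # parts is [text before first frame, "0", body0, "1", body1, ...]
    frames = []
    for number, body in zip(parts[1::2], parts[2::2]):
        flat = " ".join(body.split())  # undo the line wrapping
        func = FUNCTION.match(flat)
        loc = LOCATION.search(flat)
        frames.append((
            int(number),
            func.group(1) if func else "?",
            f"{loc.group(1)}:{loc.group(2)}" if loc else "?",
        ))
    return frames
```

Fed the trace text above, this should list only frames #0 through #2 (epoll_wait, WaitEventSetWaitBlock, WaitEventSetWait), which makes it plain that the caller of WaitEventSetWait is missing from the output.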

On Mon, Nov 4, 2019 at 5:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

We've updated our database from 10.6 to 12.0 recently and we keep
encountering an error which says that the database is entering recovery
mode.
...
In the logs the only error we see, is a segmentation fault, when replication
tries to run.

It should be possible to collect a stack trace for the segfault,
which would greatly assist debugging this. Please see

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane

#4 Michael Paquier
michael@paquier.xyz
In reply to: Mircea Pirv (#3)
Re: BUG #16094: Database entering recovery mode

On Wed, Nov 06, 2019 at 03:37:31PM +0200, Mircea Pirv wrote:

#2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable:
Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error
reading variable: Cannot access memory at address 0x7ffee2d44ad8>,
nevents=<error reading variable: Cannot access memory at address
0x7ffee2d44afc>, wait_event_info=<optimized out>) at
./build/../src/backend/storage/ipc/latch.c:1032
rc = <optimized out>
returned_events = 0
start_time = <error reading variable start_time (Cannot access
memory at address 0x7ffee2d44b00)>
cur_time = <error reading variable cur_time (Cannot access memory
at address 0x7ffee2d44b10)>
cur_timeout = <error reading variable cur_timeout (Cannot access
memory at address 0x7ffee2d44ae8)>
Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78

Is this the only part of the stack you can get? WaitEventSetWait()
gets called in three places, which are be-secure.c, syslogger.c and
condition_variable.c. Could it be possible to see more of the actual
callers here?
--
Michael

#5 Mircea Pirv
mircea@reva.tech
In reply to: Michael Paquier (#4)
Re: BUG #16094: Database entering recovery mode

Hi, thanks for the reply. That's all that the bt full command prints when
the segmentation fault occurs.

Thanks.
Mircea

On Thu, Nov 7, 2019 at 4:35 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Nov 06, 2019 at 03:37:31PM +0200, Mircea Pirv wrote:

#2 WaitEventSetWait (set=0x55bc137a5948, timeout=<error reading variable:
Cannot access memory at address 0x7ffee2d44ae0>, occurred_events=<error
reading variable: Cannot access memory at address 0x7ffee2d44ad8>,
nevents=<error reading variable: Cannot access memory at address
0x7ffee2d44afc>, wait_event_info=<optimized out>) at
./build/../src/backend/storage/ipc/latch.c:1032
rc = <optimized out>
returned_events = 0
start_time = <error reading variable start_time (Cannot access
memory at address 0x7ffee2d44b00)>
cur_time = <error reading variable cur_time (Cannot access memory
at address 0x7ffee2d44b10)>
cur_timeout = <error reading variable cur_timeout (Cannot access
memory at address 0x7ffee2d44ae8)>
Backtrace stopped: Cannot access memory at address 0x7ffee2d44b78

Is this the only part of the stack you can get? WaitEventSetWait()
gets called in three places, which are be-secure.c, syslogger.c and
condition_variable.c. Could it be possible to see more of the actual
callers here?
--
Michael