Segmentation fault occurs when the standby becomes primary, in SR

Started by Fujii Masao almost 16 years ago, 4 messages
#1 Fujii Masao
masao.fujii@gmail.com
1 attachment(s)

Hi,

When I created the trigger file to activate the standby server,
I got the segmentation fault:

sby [11342]: LOG: trigger file found: ../trigger
sby [11343]: FATAL: terminating walreceiver process due to administrator command
sby [11342]: LOG: redo done at 0/10000E0
sby [11342]: LOG: last completed transaction was at log time 2000-01-01 09:21:04.685861+09
sby [11341]: LOG: startup process (PID 11342) was terminated by signal 11: Segmentation fault
sby [11341]: LOG: terminating any other active server processes

This happens in the following scenario:

0. The trigger file is found.
1. The variable StandbyMode is reset to FALSE before the last applied
   record is re-fetched.
2. The startup process attempts to read that record from the archive.
3. Because StandbyMode is now off, RestoreArchivedFile() gets past
   the following check:

       if (StandbyMode && recoveryRestoreCommand == NULL)
           goto not_available;

4. RestoreArchivedFile() then wrongly constructs the command to execute
   from recoveryRestoreCommand, which is NULL because restore_command
   has not been supplied (this is allowed in standby mode).
   ---> Segmentation fault!

The attached patch would fix the bug.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

fix_segv_in_sr.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 2759,2766 **** RestoreArchivedFile(char *path, const char *xlogfname,
  	uint32		restartLog;
  	uint32		restartSeg;
  
! 	/* In standby mode, restore_command might not be supplied */
! 	if (StandbyMode && recoveryRestoreCommand == NULL)
  		goto not_available;
  
  	/*
--- 2759,2769 ----
  	uint32		restartLog;
  	uint32		restartSeg;
  
! 	/*
! 	 * Returns FALSE if restore_command has not been supplied. This is
! 	 * possible in standby mode.
! 	 */
! 	if (recoveryRestoreCommand == NULL)
  		goto not_available;
  
  	/*
#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#1)
Re: Segmentation fault occurs when the standby becomes primary, in SR

Fujii Masao wrote:

> When I created the trigger file to activate the standby server,
> I got the segmentation fault:
>
> ...
> The attached patch would fix the bug.

Thanks, committed. (I kept the old comment, though; I liked it better.)

Now, whether we should even allow setting up a standby without
restore_command is another question. It's *possible*, but you need to
enable archiving in the master anyway to take an on-line backup, and you
need the archive to catch up if the standby ever falls behind too much.

Then again, if the database is small, maybe you don't mind taking a new
base backup if the standby falls behind. And you *can* take a base
backup with a dummy archive_command (i.e. archive_command='/bin/true'),
if you trust that the WAL files stay in pg_xlog long enough for the
standby to stream them from there.

Perhaps we should require a restore_command. If you know what you're
doing, you can always use '/bin/false' as restore_command to hack around it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Re: Segmentation fault occurs when the standby becomes primary, in SR

On Thu, Jan 28, 2010 at 2:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> Perhaps we should require a restore_command. If you know what you're
> doing, you can always use '/bin/false' as restore_command to hack around it.

That seems kind of needlessly hacky (and it won't work on Windows).
Seems like it doesn't cost anything to let it be omitted altogether.

...Robert

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Segmentation fault occurs when the standby becomes primary, in SR

On Fri, Jan 29, 2010 at 4:23 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> Thanks, committed. (I kept the old comment, though; I liked it better.)

Thanks!

> Then again, if the database is small, maybe you don't mind taking a new
> base backup if the standby falls behind. And you *can* take a base
> backup with a dummy archive_command (i.e. archive_command='/bin/true'),
> if you trust that the WAL files stay in pg_xlog long enough for the
> standby to stream them from there.

Yeah, this is one of the cases where restore_command is not required
for SR.

> Perhaps we should require a restore_command. If you know what you're
> doing, you can always use '/bin/false' as restore_command to hack around it.

One of the main aims of SR is ease of setup, so I don't want to
impose such a hacky restore_command setting on users.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center