pg_stat_archiver issue with aborted archiver

Started by Julien Rouhaud over 10 years ago, 5 messages
#1 Julien Rouhaud
julien.rouhaud@dalibo.com
1 attachment(s)

Hello,

I just noticed that if the archiver aborts (for instance if the
archive_command exited with a return code > 127), pg_stat_archiver won't
report those failed attempts. This happens with both 9.4 and 9.5 branches.

Please find attached a patch that fixes this issue, based on current head.

Regards.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments:

fix_archiver.patch (text/x-patch)
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 578,585 **** pgarch_archiveXlog(char *xlog)
  		 *
  		 * Per the Single Unix Spec, shells report exit status > 128 when a
  		 * called command died on a signal.
  		 */
! 		int			lev = (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128) ? FATAL : LOG;
  
  		if (WIFEXITED(rc))
  		{
--- 578,595 ----
  		 *
  		 * Per the Single Unix Spec, shells report exit status > 128 when a
  		 * called command died on a signal.
+ 		 *
+ 		 * If the archiver aborts, we still need to tell the collector about
+ 		 * the WAL file that we failed to archive.
  		 */
! 		int		lev;
! 		if (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128)
! 		{
! 			lev = FATAL;
! 			pgstat_send_archiver(xlog, true);
! 		}
! 		else
! 			lev = LOG;
  
  		if (WIFEXITED(rc))
  		{

#2 Michael Paquier
michael.paquier@gmail.com
In reply to: Julien Rouhaud (#1)
Re: pg_stat_archiver issue with aborted archiver

On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

> I just noticed that if the archiver aborts (for instance if the
> archive_command exited with a return code > 127), pg_stat_archiver won't
> report those failed attempts. This happens with both 9.4 and 9.5 branches.
>
> Please find attached a patch that fixes this issue, based on current head.

The current code seems right to me. When the archive command dies
because of a signal (exit code > 128), the server should fail
immediately with FATAL and should not do any extra processing. It will
also try to archive the same segment file again after restart. If that
retry fails for a reason other than a signal, the failure will be
reported to pg_stat_archiver.
--
Michael
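
For readers following along, here is a minimal sketch of the status test under discussion, assuming a POSIX system. It reuses the expression from the pgarch.c hunk in the patch; the enum and the two function names are hypothetical stand-ins invented for this illustration, not PostgreSQL identifiers. Note that the code tests for a status strictly greater than 128, so an exit code of exactly 128 is still treated as an ordinary failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical stand-ins for the FATAL and LOG error levels. */
enum sketch_level
{
	SKETCH_LOG,
	SKETCH_FATAL
};

/*
 * Same test as the pgarch.c line shown in the patch: a command killed
 * by a signal, or a shell reporting exit status > 128, is fatal.
 * rc is a wait status as returned by system().
 */
static enum sketch_level
sketch_classify_status(int rc)
{
	return (WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128)
		? SKETCH_FATAL
		: SKETCH_LOG;
}

/* Run a shell command through system() and classify its wait status. */
static enum sketch_level
sketch_run_and_classify(const char *command)
{
	int			rc = system(command);

	return sketch_classify_status(rc);
}
```

Calling sketch_run_and_classify("exit 1") takes the LOG branch, while sketch_run_and_classify("exit 130") takes the FATAL branch, which is the case where the archiver process exits.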

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Julien Rouhaud
julien.rouhaud@dalibo.com
In reply to: Michael Paquier (#2)
Re: pg_stat_archiver issue with aborted archiver

On 08/06/2015 05:56, Michael Paquier wrote:

> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>
>> I just noticed that if the archiver aborts (for instance if the
>> archive_command exited with a return code > 127),
>> pg_stat_archiver won't report those failed attempts. This happens
>> with both 9.4 and 9.5 branches.
>>
>> Please find attached a patch that fixes this issue, based on
>> current head.
>
> The current code seems right to me. When the archive command dies
> because of a signal (exit code > 128), the server should fail
> immediately with FATAL and should not do any extra processing.

OK. It may be worth documenting, though.

> It will also try to archive again the same segment file after
> restart. When trying again, if this time the failure is not caused
> by a signal but still fails it will be reported to
> pg_stat_archiver.

Yes, my comment was only about the failure not being reported in some
special cases.

Thanks for your response.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Julien Rouhaud (#3)
Re: pg_stat_archiver issue with aborted archiver

On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:

> On 08/06/2015 05:56, Michael Paquier wrote:
>
>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>
>>> I just noticed that if the archiver aborts (for instance if the
>>> archive_command exited with a return code > 127),
>>> pg_stat_archiver won't report those failed attempts. This happens
>>> with both 9.4 and 9.5 branches.
>>>
>>> Please find attached a patch that fixes this issue, based on
>>> current head.
>>
>> The current code seems right to me. When the archive command dies
>> because of a signal (exit code > 128), the server should fail
>> immediately with FATAL and should not do any extra processing.

In that case, ISTM that the archiver process dies with FATAL but
the server does not, no? The archiver is then restarted by the
postmaster. If my understanding is right, it seems worth applying
something like Julien's patch.

Regards,

--
Fujii Masao

#5 Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#4)
Re: pg_stat_archiver issue with aborted archiver

On Tue, Jun 9, 2015 at 4:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

> On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>
>> On 08/06/2015 05:56, Michael Paquier wrote:
>>
>>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com> wrote:
>>>
>>>> I just noticed that if the archiver aborts (for instance if the
>>>> archive_command exited with a return code > 127),
>>>> pg_stat_archiver won't report those failed attempts. This happens
>>>> with both 9.4 and 9.5 branches.
>>>>
>>>> Please find attached a patch that fixes this issue, based on
>>>> current head.
>>>
>>> The current code seems right to me. When the archive command dies
>>> because of a signal (exit code > 128), the server should fail
>>> immediately with FATAL and should not do any extra processing.
>
> In that case, ISTM that the archiver process dies with FATAL but
> the server not. No? Then the archiver is restarted by postmaster.
> If my understanding is right, it seems worth applying something like
> Julien's patch.

Er, sure. Please read "the server" as "the archiver process"... My
point is that 3ad0728 introduced the behavior that we have now in
pgarch.c, and that we should bail out of the archiver process
immediately without interacting with pgstat. The archiver comes back
to archiving this file at restart, and pgstat_send_archiver should only
be used when there is a status from pgarch_archiveXlog().
--
Michael
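
To make the two positions in this thread concrete, here is a toy model of the control flow, assuming a POSIX system. It is only a sketch, not PostgreSQL code: the struct, the function names, and the report_before_fatal flag are all hypothetical. With the flag off, the model follows the current behavior described above, where a fatal status ends the process before anything is reported. With the flag on, it follows the behavior proposed in the patch, where the failure is counted before the process exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical stand-in for the failure counter behind pg_stat_archiver. */
struct sketch_archiver_stats
{
	long		failed_count;
};

/* Toy replacement for pgstat_send_archiver(xlog, true). */
static void
sketch_report_failure(struct sketch_archiver_stats *stats)
{
	stats->failed_count++;
}

/*
 * Run a command expected to fail and account for the failure.
 * Returns true if the simulated archiver process would exit with FATAL.
 * report_before_fatal selects the behavior proposed in the patch.
 */
static bool
sketch_try_archive(struct sketch_archiver_stats *stats,
				   const char *command, bool report_before_fatal)
{
	int			rc = system(command);
	bool		fatal = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 128;

	if (fatal)
	{
		if (report_before_fatal)
			sketch_report_failure(stats);
		return true;			/* process exits; no status reaches the caller */
	}

	/* Ordinary failure: the caller gets a status and reports it. */
	sketch_report_failure(stats);
	return false;
}
```

In this model, a command such as "exit 130" leaves failed_count unchanged when report_before_fatal is false and increments it when the flag is true, while "exit 1" is counted in both cases.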
