[bug fix] Stats collector is not restarted on the standby

Started by Tsunakawa, Takayukiabout 9 years ago10 messages
#1Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
1 attachment(s)

Hello,

If the stats collector is forcibly terminated on the standby in streaming replication configuration, it won't be restarted until the standby is promoted to the primary. The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get stale data with the following message.

LOG: using stale statistics instead of current ones because stats collector is not responding
STATEMENT: select * from pg_stat_user_tables

Regards
Takayuki Tsunakawa

Attachments:

stats_collector_not_restarted.patchapplication/octet-stream; name=stats_collector_not_restarted.patchDownload
diff -Nacr a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
*** a/src/backend/postmaster/postmaster.c	2016-10-25 13:17:23.000000000 +0900
--- b/src/backend/postmaster/postmaster.c	2016-10-26 13:44:32.000000000 +0900
***************
*** 1753,1759 ****
  		}
  
  		/* If we have lost the stats collector, try to start a new one */
! 		if (PgStatPID == 0 && pmState == PM_RUN)
  			PgStatPID = pgstat_start();
  
  		/* If we have lost the archiver, try to start a new one. */
--- 1753,1760 ----
  		}
  
  		/* If we have lost the stats collector, try to start a new one */
! 		if (PgStatPID == 0 &&
! 			(pmState == PM_RUN || pmState == PM_HOT_STANDBY))
  			PgStatPID = pgstat_start();
  
  		/* If we have lost the archiver, try to start a new one. */
***************
*** 2963,2969 ****
  			if (!EXIT_STATUS_0(exitstatus))
  				LogChildExit(LOG, _("statistics collector process"),
  							 pid, exitstatus);
! 			if (pmState == PM_RUN)
  				PgStatPID = pgstat_start();
  			continue;
  		}
--- 2964,2970 ----
  			if (!EXIT_STATUS_0(exitstatus))
  				LogChildExit(LOG, _("statistics collector process"),
  							 pid, exitstatus);
! 			if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
  				PgStatPID = pgstat_start();
  			continue;
  		}
#2Michael Paquier
michael.paquier@gmail.com
In reply to: Tsunakawa, Takayuki (#1)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 2:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

If the stats collector is forcibly terminated on the standby in streaming replication configuration, it won't be restarted until the standby is promoted to the primary. The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get stale data with the following message.

LOG: using stale statistics instead of current ones because stats collector is not responding
STATEMENT: select * from pg_stat_user_tables

Oops. This could be a problem for some applications... As far as I can
see and after playing with it, your patch looks correct.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Michael Paquier (#2)
Re: [bug fix] Stats collector is not restarted on the standby

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
Oops. This could be a problem for some applications... As far as I can see
and after playing with it, your patch looks correct.

Thank you for checking the patch. I'm relieved.

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael.paquier@gmail.com
In reply to: Tsunakawa, Takayuki (#3)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 4:10 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
Oops. This could be a problem for some applications... As far as I can see
and after playing with it, your patch looks correct.

Thank you for checking the patch. I'm relieved.

It would be a good idea to add that to next CF if nobody pops into the
thread so as we don't forget about it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Michael Paquier (#4)
Re: [bug fix] Stats collector is not restarted on the standby

From: Michael Paquier [mailto:michael.paquier@gmail.com]

It would be a good idea to add that to next CF if nobody pops into the thread
so as we don't forget about it.

Thanks for the notice, done.

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Michael Paquier (#2)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 12:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Oct 26, 2016 at 2:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

If the stats collector is forcibly terminated on the standby in streaming replication configuration, it won't be restarted until the standby is promoted to the primary. The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get stale data with the following message.

LOG: using stale statistics instead of current ones because stats collector is not responding
STATEMENT: select * from pg_stat_user_tables

Oops. This could be a problem for some applications... As far as I can
see and after playing with it, your patch looks correct.
--

I've tested with the patch. The patch doesn't solve the problem
completely. In standby, after forcible termination, statistics
collector process is taking some time to get restarted. In between, if
somebody SELECTs against the statistics views, he will still get stale
data with the above LOG message.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Michael Paquier
michael.paquier@gmail.com
In reply to: Kuntal Ghosh (#6)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 7:12 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

On Wed, Oct 26, 2016 at 12:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Oct 26, 2016 at 2:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

If the stats collector is forcibly terminated on the standby in streaming replication configuration, it won't be restarted until the standby is promoted to the primary. The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get stale data with the following message.

LOG: using stale statistics instead of current ones because stats collector is not responding
STATEMENT: select * from pg_stat_user_tables

Oops. This could be a problem for some applications... As far as I can
see and after playing with it, your patch looks correct.
--

I've tested with the patch. The patch doesn't solve the problem
completely. In standby, after forcible termination, statistics
collector process is taking some time to get restarted. In between, if
somebody SELECTs against the statistics views, he will still get stale
data with the above LOG message.

If you test on a master node that would be the same: there is a delay
until the stats process restart. I have not looked at the code closely
enough in this area (reaper()?) to determine if there are ways to
improve the responsiveness of this process restart that is a
non-auxiliary proces btw, still improving this behavior is something I
feel would be invasive, and something that would be dedicated to HEAD.
The patch proposed here by Tsunakawa-san makes at least sure that a
node in PM_HOT_STANDBY state restarts it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kuntal Ghosh
kuntalghosh.2007@gmail.com
In reply to: Michael Paquier (#7)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 4:47 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

If you test on a master node that would be the same: there is a delay
until the stats process restart. I have not looked at the code closely
enough in this area (reaper()?) to determine if there are ways to
improve the responsiveness of this process restart that is a
non-auxiliary proces btw, still improving this behavior is something I
feel would be invasive, and something that would be dedicated to HEAD.
The patch proposed here by Tsunakawa-san makes at least sure that a
node in PM_HOT_STANDBY state restarts it.

Yes, you are right. Master also has some delay for restarting the
process. Otherwise, the patch solves the problem.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Kuntal Ghosh (#8)
Re: [bug fix] Stats collector is not restarted on the standby

On Wed, Oct 26, 2016 at 8:41 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:

On Wed, Oct 26, 2016 at 4:47 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

If you test on a master node that would be the same: there is a delay
until the stats process restart. I have not looked at the code closely
enough in this area (reaper()?) to determine if there are ways to
improve the responsiveness of this process restart that is a
non-auxiliary proces btw, still improving this behavior is something I
feel would be invasive, and something that would be dedicated to HEAD.
The patch proposed here by Tsunakawa-san makes at least sure that a
node in PM_HOT_STANDBY state restarts it.

Yes, you are right. Master also has some delay for restarting the
process. Otherwise, the patch solves the problem.

The delay is intentional. Per pgstat_start():

/*
* Do nothing if too soon since last collector start. This is a safety
* valve to protect against continuous respawn attempts if the collector
* is dying immediately at launch. Note that since we will be re-called
* from the postmaster main loop, we will get another chance later.
*/
curtime = time(NULL);
if ((unsigned int) (curtime - last_pgstat_start_time) <
(unsigned int) PGSTAT_RESTART_INTERVAL)
return 0;
last_pgstat_start_time = curtime;

Committed and back-patched all the way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#9)
Re: [bug fix] Stats collector is not restarted on the standby

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
The delay is intentional. Per pgstat_start():

It's kind of you to tell the reason.

Committed and back-patched all the way.

Thanks again!

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers