Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

Started by Michael Paquierabout 13 years ago25 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

There is a strange bug with the latest master head (commit 7fcbf6a).
When the WAL stream with a master is cut on a slave, slave returns a FATAL
(well normal...), but then enters in recovery process and automatically
promotes.
Here are more details about the logs on slave (I simply killed the master
manually):
FATAL: could not receive data from WAL stream:
cp: cannot stat
‘/home/michael/bin/pgsql/archive/master/000000010000000000000004’: No such
file or directory
LOG: record with zero length at 0/401E1B8
LOG: redo done at 0/401E178
LOG: last completed transaction was at log time 2013-01-17
20:27:53.180971+09
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000002.history’:
No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000001.history’:
No such file or directory
LOG: archive recovery complete
DEBUG: resetting unlogged relations: cleanup 0 init 1
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
DEBUG: archived transaction log file "000000010000000000000004"
DEBUG: archived transaction log file "00000002.history"
LOG: statement: create table bn (a int);
DEBUG: autovacuum: processing database "postgres"

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.
For example run a simple CREATE TABLE and the new table is not

[some bisecting later...]

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Does someone else noticed that?
--
Michael Paquier
http://michael.otacoo.com

#2Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

Hi all,

There is a strange bug with the latest master head (commit 7fcbf6a).
When the WAL stream with a master is cut on a slave, slave returns a FATAL
(well normal...), but then enters in recovery process and automatically
promotes.
Here are more details about the logs on slave (I simply killed the master
manually):
FATAL: could not receive data from WAL stream:
cp: cannot stat
‘/home/michael/bin/pgsql/archive/master/000000010000000000000004’: No such
file or directory
LOG: record with zero length at 0/401E1B8
LOG: redo done at 0/401E178
LOG: last completed transaction was at log time 2013-01-17
20:27:53.180971+09
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000002.history’:
No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000001.history’:
No such file or directory
LOG: archive recovery complete
DEBUG: resetting unlogged relations: cleanup 0 init 1
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
DEBUG: archived transaction log file "000000010000000000000004"
DEBUG: archived transaction log file "00000002.history"
LOG: statement: create table bn (a int);
DEBUG: autovacuum: processing database "postgres"

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.
For example run a simple CREATE TABLE and the new table is not

[some bisecting later...]

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Does someone else noticed that?

No, at least I haven't, but it got committed only a rather short time
ago ;)
Looking, I have an inkling where the rpoblem could be.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

Hi all,

There is a strange bug with the latest master head (commit 7fcbf6a).
When the WAL stream with a master is cut on a slave, slave returns a FATAL
(well normal...), but then enters in recovery process and automatically
promotes.
Here are more details about the logs on slave (I simply killed the master
manually):
FATAL: could not receive data from WAL stream:
cp: cannot stat
‘/home/michael/bin/pgsql/archive/master/000000010000000000000004’: No such
file or directory
LOG: record with zero length at 0/401E1B8
LOG: redo done at 0/401E178
LOG: last completed transaction was at log time 2013-01-17
20:27:53.180971+09
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000002.history’:
No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/home/michael/bin/pgsql/archive/master/00000001.history’:
No such file or directory
LOG: archive recovery complete
DEBUG: resetting unlogged relations: cleanup 0 init 1
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
DEBUG: archived transaction log file "000000010000000000000004"
DEBUG: archived transaction log file "00000002.history"
LOG: statement: create table bn (a int);
DEBUG: autovacuum: processing database "postgres"

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.
For example run a simple CREATE TABLE and the new table is not

[some bisecting later...]

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Ok, my inkling proved to be correct, its two related issues:

a ) The error handling in XLogReadRecord is inconsistent, it doesn't
always reset the necessary things.

b) ReadRecord: We cannot not break out of the retry loop in readRecord
just so, just removing the break seems correct.

c) ReadRecord: (minor): We should log an error even if errormsg is not
set, otherwise we wont jump out far enough.

I think at least a) and b) is the result of merges between development
of different people, sorry for that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#3)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 15:05, Andres Freund wrote:

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Ok, my inkling proved to be correct, its two related issues:

a ) The error handling in XLogReadRecord is inconsistent, it doesn't
always reset the necessary things.

b) ReadRecord: We cannot not break out of the retry loop in readRecord
just so, just removing the break seems correct.

c) ReadRecord: (minor): We should log an error even if errormsg is not
set, otherwise we wont jump out far enough.

I think at least a) and b) is the result of merges between development
of different people, sorry for that.

Got a patch?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#4)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 17:18:14 +0200, Heikki Linnakangas wrote:

On 17.01.2013 15:05, Andres Freund wrote:

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Ok, my inkling proved to be correct, its two related issues:

a ) The error handling in XLogReadRecord is inconsistent, it doesn't
always reset the necessary things.

b) ReadRecord: We cannot not break out of the retry loop in readRecord
just so, just removing the break seems correct.

c) ReadRecord: (minor): We should log an error even if errormsg is not
set, otherwise we wont jump out far enough.

I think at least a) and b) is the result of merges between development
of different people, sorry for that.

Got a patch?

Yes, I am just testing some scenarios with it, will send it after that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 16:23:44 +0100, Andres Freund wrote:

On 2013-01-17 17:18:14 +0200, Heikki Linnakangas wrote:

On 17.01.2013 15:05, Andres Freund wrote:

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Ok, my inkling proved to be correct, its two related issues:

a ) The error handling in XLogReadRecord is inconsistent, it doesn't
always reset the necessary things.

b) ReadRecord: We cannot not break out of the retry loop in readRecord
just so, just removing the break seems correct.

c) ReadRecord: (minor): We should log an error even if errormsg is not
set, otherwise we wont jump out far enough.

I think at least a) and b) is the result of merges between development
of different people, sorry for that.

Got a patch?

Yes, I am just testing some scenarios with it, will send it after that.

Ok, the attached patch seems to fix a) and b). c) above is bogus, as
explained in a comment in the patch. I also noticed that the TLI check
didn't mark the last source as failed.

Not a real issue and its independent from this patch but I found that
when promoting from streaming rep the code first fails over to archive
recovery and only then to recovering from pg_xlog. Is that intended?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

fix-error-handling-xlogreader.patchtext/x-patch; charset=us-asciiDownload+27-31
#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#6)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 17:42, Andres Freund wrote:

Ok, the attached patch seems to fix a) and b). c) above is bogus, as
explained in a comment in the patch. I also noticed that the TLI check
didn't mark the last source as failed.

This looks fragile:

/*
* We only end up here without a message when XLogPageRead() failed
* - in that case we already logged something.
* In StandbyMode that only happens if we have been triggered, so
* we shouldn't loop anymore in that case.
*/
if (errormsg == NULL)
break;

I don't like relying on the presence of an error message to control
logic like that. Should we throw in an explicit CheckForStandbyTrigger()
check in the condition of that loop?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#7)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 18:33:42 +0200, Heikki Linnakangas wrote:

On 17.01.2013 17:42, Andres Freund wrote:

Ok, the attached patch seems to fix a) and b). c) above is bogus, as
explained in a comment in the patch. I also noticed that the TLI check
didn't mark the last source as failed.

This looks fragile:

/*
* We only end up here without a message when XLogPageRead() failed
* - in that case we already logged something.
* In StandbyMode that only happens if we have been triggered, so
* we shouldn't loop anymore in that case.
*/
if (errormsg == NULL)
break;

I don't like relying on the presence of an error message to control logic
like that. Should we throw in an explicit CheckForStandbyTrigger() check in
the condition of that loop?

I agree, I wasn't too happy about that either. But in some sense its
only propagating state from XLogReadPage which already has dealt with
the error and decided its ok.
Its the solution closest to what happened in the old implementation,
thats why I thought it would be halfway to acceptable.

Adding the CheckForStandbyTrigger() in the condition would mean
promotion would happen before all the available records are processed
and it would increase the amount of stat()s tremendously.
So I don't really like that either.

Don't really have a good idea :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#8)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 18:42, Andres Freund wrote:

On 2013-01-17 18:33:42 +0200, Heikki Linnakangas wrote:

On 17.01.2013 17:42, Andres Freund wrote:

Ok, the attached patch seems to fix a) and b). c) above is bogus, as
explained in a comment in the patch. I also noticed that the TLI check
didn't mark the last source as failed.

This looks fragile:

/*
* We only end up here without a message when XLogPageRead() failed
* - in that case we already logged something.
* In StandbyMode that only happens if we have been triggered, so
* we shouldn't loop anymore in that case.
*/
if (errormsg == NULL)
break;

I don't like relying on the presence of an error message to control logic
like that. Should we throw in an explicit CheckForStandbyTrigger() check in
the condition of that loop?

I agree, I wasn't too happy about that either. But in some sense its
only propagating state from XLogReadPage which already has dealt with
the error and decided its ok.
Its the solution closest to what happened in the old implementation,
thats why I thought it would be halfway to acceptable.

Adding the CheckForStandbyTrigger() in the condition would mean
promotion would happen before all the available records are processed
and it would increase the amount of stat()s tremendously.
So I don't really like that either.

I was thinking of the attached. As long as we check for
CheckForStandbyTrigger() after the "record == NULL" check, we won't
perform extra stat() calls on successful reads, only when we're polling
after reaching the end of valid WAL. That seems acceptable. If we want
to avoid even that, we could move the static 'triggered' variable from
CheckForStandbyTrigger() out of that function, and check that in the loop.

- Heikki

Attachments:

fix-error-handling-xlogreader-2.patchtext/x-diff; name=fix-error-handling-xlogreader-2.patchDownload+28-33
#10Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#9)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 18:50:35 +0200, Heikki Linnakangas wrote:

On 17.01.2013 18:42, Andres Freund wrote:

On 2013-01-17 18:33:42 +0200, Heikki Linnakangas wrote:

On 17.01.2013 17:42, Andres Freund wrote:

Ok, the attached patch seems to fix a) and b). c) above is bogus, as
explained in a comment in the patch. I also noticed that the TLI check
didn't mark the last source as failed.

This looks fragile:

/*
* We only end up here without a message when XLogPageRead() failed
* - in that case we already logged something.
* In StandbyMode that only happens if we have been triggered, so
* we shouldn't loop anymore in that case.
*/
if (errormsg == NULL)
break;

I don't like relying on the presence of an error message to control logic
like that. Should we throw in an explicit CheckForStandbyTrigger() check in
the condition of that loop?

I agree, I wasn't too happy about that either. But in some sense its
only propagating state from XLogReadPage which already has dealt with
the error and decided its ok.
Its the solution closest to what happened in the old implementation,
thats why I thought it would be halfway to acceptable.

Adding the CheckForStandbyTrigger() in the condition would mean
promotion would happen before all the available records are processed
and it would increase the amount of stat()s tremendously.
So I don't really like that either.

I was thinking of the attached. As long as we check for
CheckForStandbyTrigger() after the "record == NULL" check, we won't perform
extra stat() calls on successful reads, only when we're polling after
reaching the end of valid WAL. That seems acceptable.

Looks good to me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#10)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 18:55, Andres Freund wrote:

On 2013-01-17 18:50:35 +0200, Heikki Linnakangas wrote:

I was thinking of the attached. As long as we check for
CheckForStandbyTrigger() after the "record == NULL" check, we won't perform
extra stat() calls on successful reads, only when we're polling after
reaching the end of valid WAL. That seems acceptable.

Looks good to me.

Ok, committed.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Heikki committed a fix for at least the promotion issue, I didn't notice
any problem with an increased delay, could you check again if you still
see it?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Fujii Masao
masao.fujii@gmail.com
In reply to: Andres Freund (#12)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On Fri, Jan 18, 2013 at 2:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Heikki committed a fix for at least the promotion issue, I didn't notice
any problem with an increased delay, could you check again if you still
see it?

I encountered the problem that the timeline switch is not performed expectedly.
I set up one master, one standby and one cascade standby. All the servers
share the archive directory. restore_command is specified in the recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Fujii Masao (#13)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:

On Fri, Jan 18, 2013 at 2:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-01-17 13:47:41 +0900, Michael Paquier wrote:

Slave does not try anymore to reconnect to master with messages of the type:
FATAL: could not connect to the primary server

I also noticed that there is some delay until modifications on master are
visible on slave.

I think that bug has been introduced by commit 7fcbf6a.
Before splitting xlog reading as a separate facility things worked
correctly.
There are also no delay problems before this commit.

Heikki committed a fix for at least the promotion issue, I didn't notice
any problem with an increased delay, could you check again if you still
see it?

I encountered the problem that the timeline switch is not performed expectedly.
I set up one master, one standby and one cascade standby. All the servers
share the archive directory. restore_command is specified in the recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

That's after the commit or before? Because in passing I think I
noticed/fixed a bug that could cause exactly that problem...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#14)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 20:08, Andres Freund wrote:

On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:

I encountered the problem that the timeline switch is not performed expectedly.
I set up one master, one standby and one cascade standby. All the servers
share the archive directory. restore_command is specified in the recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

That's after the commit or before? Because in passing I think I
noticed/fixed a bug that could cause exactly that problem...

I think I broke that with the "teach pg_receivexlog to cross timelines"
patch. Will take a look...

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#15)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 17.01.2013 21:57, Heikki Linnakangas wrote:

On 17.01.2013 20:08, Andres Freund wrote:

On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:

I encountered the problem that the timeline switch is not performed
expectedly.
I set up one master, one standby and one cascade standby. All the
servers
share the archive directory. restore_command is specified in the
recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the
following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

That's after the commit or before? Because in passing I think I
noticed/fixed a bug that could cause exactly that problem...

I think I broke that with the "teach pg_receivexlog to cross timelines"
patch. Will take a look...

Ok, there was a couple of issues. First, as I suspected above, I added a
new result set after copy has ended in START_STREAMING command, but
forgot to teach walreceiver about it. Fixed that.

Secondly, there's an interesting interaction between the new xlogreader
code and streaming replication and timeline switches:

The first call to the page_read callback in xlogreader is always made
with size SizeOfXLogRecord, counting from the beginning of the page.
That's bogus; there can be no WAL record in the very beginning of page,
because of the page header, so I think that was supposed to be
SizeXLogShortPHD. But that won't fix the issue.

The problem is that XLogPageRead callback uses the page address and
requested length to decide what timeline to stream from. For example,
imagine that there's a timeline switch at offset 1000 in a page, and we
have already read all the WAL up to that point. When xlogreader first
asks to fetch the first 32 bytes from the page (the bogus
SizeOfXLogRecord), XLogPageRead deduces that that byte range is still on
the old timeline, and starts streaming from the old timeline. Next,
xlogreader needs the rest of the page, up to 1000 + SizeOfPageHeader, to
read the first record it's actually interested in, XLogPageRead will
return an error because that range is not on the timeline that's
currently streamed. And we loop back to retry, and run into the same
problem again.

This interaction is a bit too subtle for my taste, but the
straightforward fix is to just modify xlogreader so that the first
read_page call requests all the bytes up to record we're actually
interested in. That seems like a smart thing to do anyway.

I committed a patch for that second issue too, but please take a look in
case you come up with a better idea.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#16)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-17 23:49:22 +0200, Heikki Linnakangas wrote:

On 17.01.2013 21:57, Heikki Linnakangas wrote:

On 17.01.2013 20:08, Andres Freund wrote:

On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:

I encountered the problem that the timeline switch is not performed
expectedly.
I set up one master, one standby and one cascade standby. All the
servers
share the archive directory. restore_command is specified in the
recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the
following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
....

That's after the commit or before? Because in passing I think I
noticed/fixed a bug that could cause exactly that problem...

I think I broke that with the "teach pg_receivexlog to cross timelines"
patch. Will take a look...

Ok, there was a couple of issues. First, as I suspected above, I added a new
result set after copy has ended in START_STREAMING command, but forgot to
teach walreceiver about it. Fixed that.

Secondly, there's an interesting interaction between the new xlogreader code
and streaming replication and timeline switches:

The first call to the page_read callback in xlogreader is always made with
size SizeOfXLogRecord, counting from the beginning of the page. That's
bogus; there can be no WAL record in the very beginning of page, because of
the page header, so I think that was supposed to be SizeXLogShortPHD. But
that won't fix the issue.

Yea, thats clearly a typo.

The problem is that XLogPageRead callback uses the page address and
requested length to decide what timeline to stream from. For example,
imagine that there's a timeline switch at offset 1000 in a page, and we have
already read all the WAL up to that point. When xlogreader first asks to
fetch the first 32 bytes from the page (the bogus SizeOfXLogRecord),
XLogPageRead deduces that that byte range is still on the old timeline, and
starts streaming from the old timeline. Next, xlogreader needs the rest of
the page, up to 1000 + SizeOfPageHeader, to read the first record it's
actually interested in, XLogPageRead will return an error because that range
is not on the timeline that's currently streamed. And we loop back to retry,
and run into the same problem again.

This interaction is a bit too subtle for my taste, but the straightforward
fix is to just modify xlogreader so that the first read_page call requests
all the bytes up to record we're actually interested in. That seems like a
smart thing to do anyway.

Yuck. Reading from targetRecOff onwards seems like a good idea, yes. But
ISTM that the code isn't really safe now: Assume targetRecOff is 0
(which is valid) then we read up to SizeOfXLogRecord but assume that we
can read both the xlog record as well as the page header.
Then you also need to factor in that the page header can be differently
big.

And yes, its too subtle :(

Do you want to fix that or shall I?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#13)
Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

I encountered the problem that the timeline switch is not performed
expectedly.
I set up one master, one standby and one cascade standby. All the servers
share the archive directory. restore_command is specified in the
recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the
following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1

I am seeing similar issues with master at 88228e6.
This is easily reproducible by setting up 2 slaves under a master, then
kill the master. Promote slave 1 and reconnect slave 2 to slave 1, then
you will notice that the timeline jump is not done.

I don't know if Masao tried to put in sync the slave that reconnects to the
promoted slave, but in this case slave2 stucks in "potential" state". That
is due to timeline that has not changed on slave2 but better to let you
know...

The replication delays are still here.
--
Michael Paquier
http://michael.otacoo.com

#19Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#18)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On 2013-01-18 08:24:31 +0900, Michael Paquier wrote:

On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

I encountered the problem that the timeline switch is not performed
expectedly.
I set up one master, one standby and one cascade standby. All the servers
share the archive directory. restore_command is specified in the
recovery.conf
in those two standbys.

I shut down the master, and then promoted the standby. In this case, the
cascade standby should switch to new timeline and replication should be
successfully restarted. But the timeline was never changed, and the
following
log messages were kept outputting.

sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1
sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
sby2 LOG: replication terminated by primary server
sby2 DETAIL: End of WAL reached on timeline 1

I am seeing similar issues with master at 88228e6.
This is easily reproducible by setting up 2 slaves under a master, then
kill the master. Promote slave 1 and reconnect slave 2 to slave 1, then
you will notice that the timeline jump is not done.

Can you reproduce that one with 7fcbf6a^ (i.e before xlogreader got
split off?).

The replication delays are still here.

That one is caused by this nice bug, courtesy of yours truly:
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 90ba32e..1174493 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8874,7 +8874,7 @@ retry:
    /* See if we need to retrieve more data */
    if (readFile < 0 ||
        (readSource == XLOG_FROM_STREAM &&
-        receivedUpto <= targetPagePtr + reqLen))
+        receivedUpto < targetPagePtr + reqLen))
    {
        if (StandbyMode)
        {

I didn't notice because I had a testscript inserting stuff continuously
and it cause at most lagging by one record...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#19)
Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

On Fri, Jan 18, 2013 at 8:48 AM, Andres Freund <andres@2ndquadrant.com>wrote:

Can you reproduce that one with 7fcbf6a^ (i.e before xlogreader got
split off?).

Yes, it is reproducible before the xlog reader split.
Just an additional report, the master jumps correctly to the new timeline.

The replication delays are still here.

That one is caused by this nice bug, courtesy of yours truly:
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 90ba32e..1174493 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8874,7 +8874,7 @@ retry:
/* See if we need to retrieve more data */
if (readFile < 0 ||
(readSource == XLOG_FROM_STREAM &&
-        receivedUpto <= targetPagePtr + reqLen))
+        receivedUpto < targetPagePtr + reqLen))
{
if (StandbyMode)
{

I didn't notice because I had a testscript inserting stuff continuously
and it cause at most lagging by one record...

This fix is indeed working.
--
Michael Paquier
http://michael.otacoo.com

#21Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#18)
#22Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#19)
#24Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#22)
#25Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#24)