New and interesting replication issues with 9.2.8 sync rep
Just got a report of a replication issue with 9.2.8 from a community member:
Here's the sequence:
1) A --> B (sync rep)
2) Shut down B
3) Shut down A
4) Start up B as a master
5) Start up A as sync replica of B
6) A successfully joins B as a sync replica, even though its transaction
log is 1016 bytes *ahead* of B.
7) Transactions written to B all hang
8) Xlog on A is now corrupt, although the database itself is OK
Now, the above sequence happened because of the user misunderstanding
what sync rep really means. However, A should not have been able to
connect with B in replication mode, especially in sync rep mode; that
should have failed. Any thoughts on why it didn't?
I'm trying to produce a test case ...
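
For anyone following along, the "1016 bytes ahead" in step 6 is the difference between two WAL positions. Below is a minimal sketch of that arithmetic, assuming the positions are in the textual form Postgres prints (two hex numbers separated by a slash, high 32 bits then low 32 bits). The helper names and the sample positions are made up for illustration and are not taken from the reported cluster. The sketch treats a position as a plain 64-bit value, which is how 9.3 and later define it. On 9.2 that should only be relied on when both positions share the same high part; otherwise the server-side pg_xlog_location_diff() is the safer choice.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/30003F8' into an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a, b):
    """Return how many bytes position a is ahead of position b (negative if behind)."""
    return parse_lsn(a) - parse_lsn(b)


# Hypothetical positions, chosen only so that the gap matches the report.
replica_pos = "0/30003F8"
master_pos = "0/3000000"

gap = lsn_diff(replica_pos, master_pos)
print("replica is %d bytes ahead of master" % gap)
```

A positive result means the replica holds WAL that the master never wrote, which is the situation described in step 6.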
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
> Just got a report of a replication issue with 9.2.8 from a community member:
> Here's the sequence:
> 1) A --> B (sync rep)
> 2) Shut down B
> 3) Shut down A
> 4) Start up B as a master
> 5) Start up A as sync replica of B
> 6) A successfully joins B as a sync replica, even though its transaction
> log is 1016 bytes *ahead* of B.
> 7) Transactions written to B all hang
> 8) Xlog on A is now corrupt, although the database itself is OK
This is fundamentally borked practice.
> Now, the above sequence happened because of the user misunderstanding
> what sync rep really means. However, A should not have been able to
> connect with B in replication mode, especially in sync rep mode; that
> should have failed. Any thoughts on why it didn't?
I'd guess that B, while starting up, has written further WAL records
bringing it further ahead of A.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/03/2014 01:07 AM, Andres Freund wrote:
> I'd guess that B, while starting up, has written further WAL records
> bringing it further ahead of A.
Apparently not; from what I've seen pg_stat_replication even *shows*
that the replica is ahead of the master. Further, Postgres should have
recognized that there was a timeline branch point before A's last
record, no?
I'm working on getting permission to access the DB files.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> Apparently not; from what I've seen pg_stat_replication even *shows*
> that the replica is ahead of the master. Further, Postgres should have
> recognized that there was a timeline branch point before A's last
> record, no?
There wasn't any timeline increase because - as far as I understand the
above - there wasn't any promotion. The cluster was shut down and
recovery.conf was created/removed respectively.
To me this is an operator error. We could try to defend against it more
vigorously, but that's hard to do without breaking actual use cases.
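
As a rough illustration of what a timeline increase looks like on disk: a WAL segment file name is 24 hex digits, and the first 8 of them are the timeline ID. The sketch below only reads that prefix. The function name and both file names are hypothetical examples, not files from the cluster discussed here.

```python
def timeline_of(segment_name):
    """Return the timeline ID encoded in the first 8 hex digits of a WAL segment name."""
    if len(segment_name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    return int(segment_name[:8], 16)


# Hypothetical segment names: the same segment number on two timelines.
before_promotion = "000000010000000000000003"
after_promotion = "000000020000000000000003"

if timeline_of(after_promotion) > timeline_of(before_promotion):
    print("timeline increased, so a promotion happened somewhere")
else:
    print("same timeline, no promotion is visible from the file names")
```

If a standby is simply restarted without recovery.conf rather than promoted, it keeps writing segments on the old timeline, so a check like this has nothing to notice.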
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 05/05/2014 10:25 AM, Andres Freund wrote:
> There wasn't any timeline increase because - as far as I understand the
> above - there wasn't any promotion. The cluster was shut down and
> recovery.conf was created/removed respectively.
Ah, oops, left out a step. B was promoted.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2014-05-05 10:30:17 -0700, Josh Berkus wrote:
> On 05/05/2014 10:25 AM, Andres Freund wrote:
>> On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
>>> Apparently not; from what I've seen pg_stat_replication even *shows*
>>> that the replica is ahead of the master.
That's the shutdown record from A that I've talked about.
>>> Further, Postgres should have
>>> recognized that there was a timeline branch point before A's last
>>> record, no?
>> There wasn't any timeline increase because - as far as I understand the
>> above - there wasn't any promotion. The cluster was shut down and
>> recovery.conf was created/removed respectively.
> Ah, oops, left out a step. B was promoted.
Still a user error. You need to reclone.
Depending on how archiving and the target timeline were configured, the
timeline increase won't be treated as an error...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 05/05/2014 10:53 AM, Andres Freund wrote:
> Still a user error. You need to reclone.
> Depending on how archiving and the target timeline were configured, the
> timeline increase won't be treated as an error...
Andres and I hashed this out on IRC. The basic problem was that I was
relying on pg_stat_replication to point out when a successful
replication connection was established. However, he pointed out cases
where pg_stat_replication will report sync or streaming even though
replication has failed due to differences in WAL position. That appears
to be what happened here.
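
One possible cross-check, sketched under assumptions: instead of trusting state and sync_state alone, compare the replica's reported replay_location with the master's current WAL location (for example from pg_current_xlog_location() on 9.2) and treat a replica that appears to be ahead as suspect. The dict below stands in for one pg_stat_replication row. The function name looks_healthy, the sample values, and the rule itself are illustrative assumptions, not something Postgres applies on its own.

```python
def parse_lsn(text):
    """Convert a WAL location such as '0/30003F8' into an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def looks_healthy(row, master_location):
    """Return True only if the row says 'streaming' and the replica is not ahead."""
    if row.get("state") != "streaming":
        return False
    replay = row.get("replay_location")
    if replay is None:
        return False
    # A replica that has replayed past the master's current position has WAL
    # the master never produced, so the connection should not be trusted.
    return parse_lsn(replay) <= parse_lsn(master_location)


# Hypothetical row: reported as a streaming sync standby, yet ahead of the master.
row = {"state": "streaming", "sync_state": "sync", "replay_location": "0/30003F8"}
print(looks_healthy(row, "0/3000000"))
```

A row can pass the state test and still fail the position test, which is the case this thread ran into.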
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com