Error promoting slave on cascading replication using replication slots

Started by Álvaro Nunes Lemos Melo, over 10 years ago, 2 messages, general
#1Álvaro Nunes Lemos Melo
al_nunes@atua.com.br

Hi,

I'm configuring a cascading replication environment with replication
slots, but I have a problem when the master goes down and I promote
a slave. All servers start from a cluster created from scratch, with
default config options. The process I'm using to set up the
cascading replication is:

1 - On master:
wal_level = hot_standby
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on

2 - On slave1:
Stop Server
Apply the same configuration from above
Erase the old cluster
Run pg_basebackup -v -P -R -X stream -c fast -h IP -U postgres -D PGDATA

3 - On master:
Run SELECT pg_create_physical_replication_slot('NAME');

4 - On slave1:
Add the primary_slot_name to recovery.conf
Start cluster
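
Step 4 can be sketched as a small shell helper. This is only an illustration: the function name add_primary_slot_name and its two arguments are hypothetical, and it assumes recovery.conf was already written by pg_basebackup -R in the standby's data directory.

```shell
#!/bin/sh
# Hypothetical helper: append primary_slot_name to an existing recovery.conf
# (as written by pg_basebackup -R) unless the setting is already present.
add_primary_slot_name() {
    conf="$1/recovery.conf"   # $1: data directory of the standby
    slot="$2"                 # $2: slot name created on the upstream server
    if [ ! -f "$conf" ]; then
        echo "no recovery.conf in $1" >&2
        return 1
    fi
    if grep -q '^[[:space:]]*primary_slot_name' "$conf"; then
        return 0              # already configured; leave the file untouched
    fi
    printf "primary_slot_name = '%s'\n" "$slot" >> "$conf"
}
```

It would be called as add_primary_slot_name PGDATA NAME before starting the cluster. Calling it twice leaves a single primary_slot_name line.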

Everything runs smoothly, according to "SELECT * FROM
pg_stat_replication" and "SELECT * FROM pg_replication_slots". Steps
2, 3 and 4 are repeated on slave2, which points to slave1. The problem
happens when I stop the master and run

pg_ctl -D /var/lib/postgresql/9.4/main promote

on slave1. At this point, slave2 logs the following messages and stops
receiving WAL through the replication slot:

2015-12-17 11:23:06 BRST [944-2] LOG: replication terminated by primary server
2015-12-17 11:23:06 BRST [944-3] DETAIL: End of WAL reached on timeline 1 at 0/30001A0.
2015-12-17 11:23:06 BRST [944-4] LOG: fetching timeline history file for timeline 2 from primary server
2015-12-17 11:23:06 BRST [937-7] LOG: record with zero length at 0/30001A0
2015-12-17 11:23:06 BRST [944-5] LOG: restarted WAL streaming at 0/3000000 on timeline 1
2015-12-17 11:23:06 BRST [944-6] LOG: replication terminated by primary server
2015-12-17 11:23:06 BRST [944-7] DETAIL: End of WAL reached on timeline 1 at 0/30001A0.
2015-12-17 11:23:11 BRST [944-8] LOG: restarted WAL streaming at 0/3000000 on timeline 1
2015-12-17 11:23:11 BRST [944-9] LOG: replication terminated by primary server
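
In the log above, slave2 fetches the history file for timeline 2 but keeps restarting streaming on timeline 1. To spot this pattern quickly, one can pull the timeline out of the most recent "restarted WAL streaming" message. The sketch below is an illustration only: the function name last_streaming_timeline is hypothetical, and it assumes the message wording shown in the log above.

```shell
#!/bin/sh
# Print the timeline named by the most recent "restarted WAL streaming"
# message read from stdin. Newlines are flattened first so that messages
# wrapped across two lines (as in a mail client) are still matched.
last_streaming_timeline() {
    tr '\n' ' ' |
        grep -oE 'restarted WAL streaming at [0-9A-F]+/[0-9A-F]+ on timeline [0-9]+' |
        tail -n 1 |
        awk '{ print $NF }'
}
```

Fed the standby's server log on stdin, it prints the timeline number of the last restart, or nothing if no such message is found. If the number stays at the old timeline after the upstream was promoted, the standby is not following the new timeline.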

I found an instruction to add the following line to recovery.conf:
recovery_target_timeline = 'latest'

When this line is added, slave2 keeps its replication with slave1:
2015-12-17 13:37:54 BRST [868-2] LOG: replication terminated by primary server
2015-12-17 13:37:54 BRST [868-3] DETAIL: End of WAL reached on timeline 1 at 0/3001358.
2015-12-17 13:37:54 BRST [868-4] LOG: fetching timeline history file for timeline 2 from primary server
2015-12-17 13:37:54 BRST [863-7] LOG: new target timeline is 2
2015-12-17 13:37:54 BRST [863-8] LOG: record with zero length at 0/3001358
2015-12-17 13:37:54 BRST [868-5] LOG: restarted WAL streaming at 0/3000000 on timeline 2
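
For reference, the resulting recovery.conf on slave2 might look roughly like what the sketch below writes. The helper name write_recovery_conf and its arguments are hypothetical, and the primary_conninfo contents are an assumption based on the -h and -U options passed to pg_basebackup above. The real file written by pg_basebackup -R may carry more connection options.

```shell
#!/bin/sh
# Hypothetical helper: write a 9.4-style recovery.conf for a cascaded standby.
write_recovery_conf() {
    datadir="$1"   # data directory of the standby (e.g. slave2)
    upstream="$2"  # host of the server it streams from (e.g. slave1)
    slot="$3"      # physical replication slot created on that upstream
    cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$upstream user=postgres'
primary_slot_name = '$slot'
# Follow the upstream onto a new timeline after it is promoted.
recovery_target_timeline = 'latest'
EOF
}
```

In 9.4, recovery_target_timeline defaults to the timeline that was current when the base backup was taken, which is why the standby stays on timeline 1 unless 'latest' is set explicitly.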

My question is: is this the right procedure, or am I missing something?

Best regards,

--
Álvaro Nunes Melo Atua Sistemas de Informação
alvaro@atua.com.br http://www.atua.com.br
(54) 9976-0106 (54) 3045-8100

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Álvaro Nunes Lemos Melo (#1)
Re: Error promoting slave on cascading replication using replication slots

Alvaro Melo <al_nunes@atua.com.br> wrote:

I found an instruction to add the following line to recovery.conf:
recovery_target_timeline = 'latest'

When this line is added, slave2 keeps its replication with slave1:
2015-12-17 13:37:54 BRST [868-2] LOG: replication terminated by primary server
2015-12-17 13:37:54 BRST [868-3] DETAIL: End of WAL reached on timeline 1 at 0/3001358.
2015-12-17 13:37:54 BRST [868-4] LOG: fetching timeline history file for timeline 2 from primary server
2015-12-17 13:37:54 BRST [863-7] LOG: new target timeline is 2
2015-12-17 13:37:54 BRST [863-8] LOG: record with zero length at 0/3001358
2015-12-17 13:37:54 BRST [868-5] LOG: restarted WAL streaming at 0/3000000 on timeline 2

My question is: is this the right procedure, or am I missing something?

Yeah, this is the right procedure, afaik.

Slave2 is now in sync with the new timeline, everything is okay.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
