Standby promotion does not work
All,
So I've finally been able to do some testing, and I'll report that
currently there is no way I've found to get existing standbys to
subscribe to a new master.
No matter what I do in recovery.conf, it results in errors and failure
to replicate.
Test setup:
hosts: master1, master2, replica1
replica1 and master2 are subscribed to master1
First, master1 is shut down.
Second, master2 is promoted via "pg_ctl promote".
So, original recovery.conf on replica1:
#autogenerated recovery.conf file. do not edit
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
restore_command = 'scp master1:/usr/local/pgsql/wal_share/%f %p'
recovery_target_timeline = 'latest'
This is changed to:
#autogenerated recovery.conf file. do not edit
standby_mode = 'on'
primary_conninfo = 'host=master2 port=5432 user=replication'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
restore_command = 'scp master1:/usr/local/pgsql/wal_share/%f %p'
recovery_target_timeline = 'latest'
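
For anyone reproducing this, the repointing step can be scripted. Below is a minimal Python sketch, not what I actually ran: the function name repoint_recovery_conf is made up for illustration, and it assumes that both primary_conninfo and restore_command should reference the new master, which may not be what you want for the archive location.

```python
import re

# The original recovery.conf on replica1, as shown above.
ORIGINAL = """#autogenerated recovery.conf file. do not edit
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
restore_command = 'scp master1:/usr/local/pgsql/wal_share/%f %p'
recovery_target_timeline = 'latest'
"""


def repoint_recovery_conf(text, old_host, new_host,
                          keys=("primary_conninfo", "restore_command")):
    """Return recovery.conf text with old_host swapped for new_host.

    Only the settings named in `keys` are rewritten (an assumption of this
    sketch); comments and all other settings pass through unchanged.
    """
    pattern = re.compile(r"\b" + re.escape(old_host) + r"\b")
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in keys:
            line = pattern.sub(new_host, line)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(repoint_recovery_conf(ORIGINAL, "master1", "master2"), end="")
```

Passing keys=("primary_conninfo",) leaves restore_command pointing at the old archive host.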
On restart of replica1, I get the following error:
2011-04-10 13:27:24.766 PDT,,,2867,,4da212ac.b33,1,,2011-04-10 13:27:24 PDT,,0,FATAL,XX000,"timeline 2 of the primary does not match recovery target timeline 1",,,,,,,,,""
2011-04-10 13:27:29.875 PDT,,,2878,,4da212b1.b3e,1,,2011-04-10 13:27:29 PDT,,0,FATAL,XX000,"timeline 2 of the primary does not match recovery target timeline 1",,,,,,,,,""
If I try to manually change the timeline in recovery.conf to '2', I get:
2011-04-10 13:23:05.115 PDT,,,2834,,4da211a9.b12,2,,2011-04-10 13:23:05 PDT,,0,FATAL,XX000,"recovery target timeline 2 does not exist",,,,,,,,,""
2011-04-10 13:23:05.116 PDT,,,2832,,4da211a8.b10,1,,2011-04-10 13:23:04 PDT,,0,LOG,00000,"startup process (PID 2834) exited with exit code 1",,,,,,,,,""
2011-04-10 13:23:05.116 PDT,,,2832,,4da211a8.b10,2,,2011-04-10 13:23:04 PDT,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
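
In case it helps anyone sift through similar output: the log entries above are in csvlog format, so the severity, SQLSTATE and message can be pulled out with a few lines of Python. This is only a sketch. It assumes each entry sits on a single line (any mail-client wrapping undone) and that the columns follow the 23-field csvlog layout these entries appear to use; the helper name summarize is hypothetical.

```python
import csv

# Zero-based column positions in the csvlog layout assumed here.
SEVERITY, SQLSTATE, MESSAGE = 11, 12, 13


def summarize(csv_line):
    """Return (severity, sqlstate, message) for one csvlog entry."""
    row = next(csv.reader([csv_line]))
    return row[SEVERITY], row[SQLSTATE], row[MESSAGE]


# One of the entries quoted above, joined onto a single line.
line = (
    '2011-04-10 13:23:05.115 PDT,,,2834,,4da211a9.b12,2,,'
    '2011-04-10 13:23:05 PDT,,0,FATAL,XX000,'
    '"recovery target timeline 2 does not exist",,,,,,,,,""'
)

if __name__ == "__main__":
    print(summarize(line))
```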
Receive location on master2:
0/93000078
Receive location on replica1:
0/93000000
... and in any case, this is a test system with no activity. So there's
no way replica1 can be ahead.
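
To double-check that, the two receive locations can be compared numerically. A location of the form X/Y is two hexadecimal numbers, the high and low 32 bits of a 64-bit WAL position. The sketch below (the helper name lsn_to_int is just for illustration) converts both values and takes the difference.

```python
def lsn_to_int(lsn):
    """Convert a WAL location such as '0/93000078' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


master2 = lsn_to_int("0/93000078")
replica1 = lsn_to_int("0/93000000")

if __name__ == "__main__":
    # A positive result means master2 is ahead of replica1 by that many bytes.
    print(master2 - replica1)  # 120
```

So replica1 is 0x78 = 120 bytes behind master2, not ahead of it.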
So it seems like we still don't have any way to promote a standby to a
new master and have the other existing standbys follow it. Is this
fixable?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com