streaming replication not working

Started by John DeSoiover 12 years ago10 messagesgeneral

desoi@pgedit.com

over 12 years ago

I have a 9.2.4 (CentOS 6.5) standby that has been working with no problems using log shipping. I wanted to add streaming replication which I thought would be as simple as adding primary_conninfo to recovery.conf and restarting the standby. But on restart there is no message or error about connecting to the primary for replication. pg_stat_replication is empty on the primary and I don't see any errors on the primary either.

Here is what I have on the standby:

postgresql.conf

hot_standby = on
max_wal_senders = 2
wal_level = hot_standby

recovery.conf

standby_mode = 'on'
trigger_file = '/pgsql/9.2/data/failover.trigger'
primary_conninfo = 'host=localhost port=21333 user=postgres'
restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'

I have a ssh tunnel setup on localhost and have verified the replication user can connect to the primary.

Am I missing something obvious? Do I have to back up the primary again to make this change?

Thanks,

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Ray Stell

stellr@vt.edu

over 12 years ago

In reply to: John DeSoi (#1)

Re: streaming replication not working

On Sep 23, 2013, at 11:20 AM, John DeSoi <desoi@pgedit.com> wrote:

Am I missing something obvious? Do I have to back up the primary again to make this change?

you didn't mention a pg_hba.conf rule. did you add one for the replication user?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John DeSoi

jdesoi@gmail.com

over 12 years ago

In reply to: Ray Stell (#2)

Re: streaming replication not working

On Sep 23, 2013, at 1:00 PM, Ray Stell <stellr@vt.edu> wrote:

Am I missing something obvious? Do I have to back up the primary again to make this change?

you didn't mention a pg_hba.conf rule. did you add one for the replication user?

You mean on the primary, right? Yes, I have one there. But even if I did not, I would expect to see a connection error in the log on the standby. No error or any indication the streaming replication process is running on the standby.

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John DeSoi

desoi@pgedit.com

over 12 years ago

In reply to: Ray Stell (#2)

Re: streaming replication not working

On Sep 23, 2013, at 1:00 PM, Ray Stell <stellr@vt.edu> wrote:

Am I missing something obvious? Do I have to back up the primary again to make this change?

you didn't mention a pg_hba.conf rule. did you add one for the replication user?

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Ray Stell

stellr@vt.edu

over 12 years ago

In reply to: John DeSoi (#4)

Re: streaming replication not working

On Sep 23, 2013, at 4:47 PM, John DeSoi wrote:

You mean on the primary, right?

right

Yes, I have one there. But even if I did not, I would expect to see a connection error in the log on the standby. No error or any indication the streaming replication process is running on the standby.

you're right, I was firing from the hip. sure enough, it does toss an error:
2013-09-23 18:00:01 EDT,0,authentication FATAL: 28000: no pg_hba.conf entry for replication connection from host "xxx", user "repuser", SSL off

I'd guess a firewall issue? What happens with "telnet primary_host port" maybe use tcpdump to see what's happening with the traffic?

Karl Denninger

karl@denninger.net

over 12 years ago

In reply to: Ray Stell (#5)

Re: streaming replication not working

On 9/23/2013 9:30 PM, Ray Stell wrote:

On Sep 23, 2013, at 4:47 PM, John DeSoi wrote:

You mean on the primary, right?

right

Yes, I have one there. But even if I did not, I would expect to see a
connection error in the log on the standby. No error or any
indication the streaming replication process is running on the standby.

you're right, I was firing from the hip. sure enough, it does toss an
error:
2013-09-23 18:00:01 EDT,0,authentication FATAL: 28000: no pg_hba.conf
entry for replication connection from host "xxx", user "repuser", SSL off

I'd guess a firewall issue? What happens with "telnet primary_host
port" maybe use tcpdump to see what's happening with the traffic?

No, there is a missing line in pg_hba.conf that should look something
like this:

host replication repuser xxx trust

(where "xxx" is the hostname)

See the pg_hba.conf file for more examples. Note that "replication" is
a special database tag and a replicating connection must have one of
these defined as "all" does not match it.

You can use "host", "hostssl" or "hostnossl"; "trust" means that no
password is demanded and for obvious reasons should NOT be used for
other than a local connection that can be trusted implicitly. I prefer
not to use that method for other than local socket connections and then
only on a machine where nobody signs in that is untrusted (e.g. only
admins are permitted general access.) If you are connecting over an
insecure channel or untrusted users are on the machine then consider SSL
to encrypt the traffic and either use md5 for the password or use a
certificate.

You can reload the file without restarting postgres with "pg_ctl -D
data-directory reload"

(where "data-directory" is wherever the data directory that has the
pg_hba.conf file -- and the rest of the base of the data store -- is)

--
Karl Denninger
karl@denninger.net
/Cuda Systems LLC/

Laurenz Albe

laurenz.albe@cybertec.at

over 12 years ago

In reply to: John DeSoi (#1)

Re: streaming replication not working

John DeSoi wrote:

I have a 9.2.4 (CentOS 6.5) standby that has been working with no problems using log shipping. I
wanted to add streaming replication which I thought would be as simple as adding primary_conninfo to
recovery.conf and restarting the standby. But on restart there is no message or error about connecting
to the primary for replication. pg_stat_replication is empty on the primary and I don't see any errors
on the primary either.

Here is what I have on the standby:

postgresql.conf

hot_standby = on
max_wal_senders = 2
wal_level = hot_standby

You should set the same parameters on the primary, else it won't work.

recovery.conf

standby_mode = 'on'
trigger_file = '/pgsql/9.2/data/failover.trigger'
primary_conninfo = 'host=localhost port=21333 user=postgres'
restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'

I have a ssh tunnel setup on localhost and have verified the replication user can connect to the
primary.

Am I missing something obvious? Do I have to back up the primary again to make this change?

No, that shouldn't be necessary.

What are the messages in the secondary's log
after you start it?

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John DeSoi

desoi@pgedit.com

over 12 years ago

In reply to: Laurenz Albe (#7)

Re: streaming replication not working

On Sep 24, 2013, at 5:48 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Here is what I have on the standby:

postgresql.conf

hot_standby = on
max_wal_senders = 2
wal_level = hot_standby

You should set the same parameters on the primary, else it won't work.

On the primary I have

wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after --timeout=30 -a %p backup:/path/backup/%f'
archive_timeout = 300

I don't have "hot_standby = on". I thought that only applied to the standby to allow queries in recovery mode?

Am I missing something obvious? Do I have to back up the primary again to make this change?

No, that shouldn't be necessary.

What are the messages in the secondary's log
after you start it?

Below is what I have for restart with log level at debug2. In looking at the source, I should have an error or a log message "streaming replication successfully connected to primary". I never get either one.

LOG: restored log file "000000010000010F00000001" from archive
LOG: restored log file "000000010000010F00000002" from archive
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: shutting down
LOG: database system is shut down
LOG: database system was shut down in recovery at 2013-09-24 08:23:53 CDT
DEBUG: standby_mode = 'on'
DEBUG: trigger_file = '/pgsql/9.2/data/failover.trigger'
DEBUG: primary_conninfo = 'host=localhost port=21333 user=postgres'
DEBUG: restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
DEBUG: archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'
LOG: entering standby mode
LOG: restored log file "000000010000010F00000002" from archive
DEBUG: got WAL segment from archive
DEBUG: checkpoint record is at 10F/204D9B8
DEBUG: redo record is at 10F/2035608; shutdown FALSE
DEBUG: next transaction ID: 0/3837659; next OID: 13124290
DEBUG: next MultiXactId: 3; next MultiXactOffset: 5
DEBUG: oldest unfrozen transaction ID: 1798, in database 1
DEBUG: transaction ID wrap limit is 2147485445, limited by database with OID 1
DEBUG: resetting unlogged relations: cleanup 1 init 0
DEBUG: initializing for hot standby
LOG: redo starts at 10F/2035608
DEBUG: recovery snapshots are now enabled
CONTEXT: xlog redo running xacts: nextXid 3837668 latestCompletedXid 3837667 oldestRunningXid 3837668
DEBUG: checkpointer updated shared memory configuration values
LOG: consistent recovery state reached at 10F/3000000
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000010F00000003" from archive
DEBUG: got WAL segment from archive

Thanks,

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Guillaume Lelarge

guillaume@lelarge.info

over 12 years ago

In reply to: John DeSoi (#8)

Re: streaming replication not working

On Tue, 2013-09-24 at 08:59 -0500, John DeSoi wrote:

On Sep 24, 2013, at 5:48 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Here is what I have on the standby:

postgresql.conf

hot_standby = on
max_wal_senders = 2
wal_level = hot_standby

You should set the same parameters on the primary, else it won't work.

On the primary I have

wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after --timeout=30 -a %p backup:/path/backup/%f'
archive_timeout = 300

I don't have "hot_standby = on". I thought that only applied to the standby to allow queries in recovery mode?

Am I missing something obvious? Do I have to back up the primary again to make this change?

No, that shouldn't be necessary.

What are the messages in the secondary's log
after you start it?

Below is what I have for restart with log level at debug2. In looking at the source, I should have an error or a log message "streaming replication successfully connected to primary". I never get either one.

LOG: restored log file "000000010000010F00000001" from archive
LOG: restored log file "000000010000010F00000002" from archive
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: shutting down
LOG: database system is shut down
LOG: database system was shut down in recovery at 2013-09-24 08:23:53 CDT
DEBUG: standby_mode = 'on'
DEBUG: trigger_file = '/pgsql/9.2/data/failover.trigger'
DEBUG: primary_conninfo = 'host=localhost port=21333 user=postgres'
DEBUG: restore_command = 'pg_standby -t /pgsql/9.2/data/failover.trigger /shared/pgbackup %f %p %r'
DEBUG: archive_cleanup_command = 'pg_archivecleanup /shared/pgbackup %r'
LOG: entering standby mode
LOG: restored log file "000000010000010F00000002" from archive
DEBUG: got WAL segment from archive
DEBUG: checkpoint record is at 10F/204D9B8
DEBUG: redo record is at 10F/2035608; shutdown FALSE
DEBUG: next transaction ID: 0/3837659; next OID: 13124290
DEBUG: next MultiXactId: 3; next MultiXactOffset: 5
DEBUG: oldest unfrozen transaction ID: 1798, in database 1
DEBUG: transaction ID wrap limit is 2147485445, limited by database with OID 1
DEBUG: resetting unlogged relations: cleanup 1 init 0
DEBUG: initializing for hot standby
LOG: redo starts at 10F/2035608
DEBUG: recovery snapshots are now enabled
CONTEXT: xlog redo running xacts: nextXid 3837668 latestCompletedXid 3837667 oldestRunningXid 3837668
DEBUG: checkpointer updated shared memory configuration values
LOG: consistent recovery state reached at 10F/3000000
LOG: database system is ready to accept read only connections
LOG: restored log file "000000010000010F00000003" from archive
DEBUG: got WAL segment from archive

Your config file and your debug logs don't match. Your config file says
that the restore command is rsync, but your logs say its pg_standby.

Check if you have a pg_standby process on the slave. That would explain
why the slave never tries to establish a replication connection to the
master.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

John DeSoi

desoi@pgedit.com

over 12 years ago

In reply to: Guillaume Lelarge (#9)

Re: streaming replication not working

On Sep 25, 2013, at 8:36 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:

Your config file and your debug logs don't match. Your config file says
that the restore command is rsync, but your logs say its pg_standby.

Check if you have a pg_standby process on the slave. That would explain
why the slave never tries to establish a replication connection to the
master.

rsync is only used in the primary configuration to push the WAL files to the standby. But pg_standby is indeed the problem. I thought pg_standby was a more feature rich option than using cp for the restore command. I see now the documentation says it supports creation of a "warm standby". It did not occur to me this meant the standby could not connect to the primary for streaming replication. Even when using pg_standby, the server was really a "hot standby" because I was able to connect to it and make read-only queries. I think it would be helpful for pg_standby to emit a warning if primary_conninfo is set it the recovery.conf.

I changed the restore command to use cp and now everything appears to be working as expected.

Thanks very much for your help and to everyone who offered suggestions.

John DeSoi, Ph.D.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

streaming replication not working

Attachments: