Replication: slave server has 3x size of production server?
Hi!
I have a database cluster created at 9.6.10 on a Linux x64 RHEL server. I made progressive upgrades, first upgrading the slave and then upgrading the master.
Currently both are running 9.6.17.
The current production server is 196 GB in size.
Nevertheless, the replicated (slave) server is 598 GB in size.
The replication server is 3x the size of the production server; is that normal?
Shall I drop the slave server and re-create it? How can I avoid this situation in the future?
Thanks,
Edson
On 2/22/20 9:25 AM, Edson Richter wrote:
How are you measuring the sizes?
Where is the space being taken up on disk?
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
How are you measuring the sizes?
This is the command:
du --max-depth 1 -h pgDbCluster
Production:
56M pgDbCluster/pg_log
444K pgDbCluster/global
4,0K pgDbCluster/pg_stat
4,0K pgDbCluster/pg_snapshots
16K pgDbCluster/pg_logical
20K pgDbCluster/pg_replslot
61M pgDbCluster/pg_subtrans
4,0K pgDbCluster/pg_commit_ts
465M pgDbCluster/pg_xlog
4,0K pgDbCluster/pg_twophase
12M pgDbCluster/pg_multixact
4,0K pgDbCluster/pg_serial
195G pgDbCluster/base
284K pgDbCluster/pg_stat_tmp
12M pgDbCluster/pg_clog
4,0K pgDbCluster/pg_dynshmem
12K pgDbCluster/pg_notify
4,0K pgDbCluster/pg_tblspc
196G pgDbCluster
Slave:
du -h --max-depth 1 pgDbCluster
403G pgDbCluster/pg_xlog
120K pgDbCluster/pg_log
424K pgDbCluster/global
0 pgDbCluster/pg_stat
0 pgDbCluster/pg_snapshots
4,0K pgDbCluster/pg_logical
8,0K pgDbCluster/pg_replslot
60M pgDbCluster/pg_subtrans
0 pgDbCluster/pg_commit_ts
0 pgDbCluster/pg_twophase
11M pgDbCluster/pg_multixact
0 pgDbCluster/pg_serial
195G pgDbCluster/base
12M pgDbCluster/pg_clog
0 pgDbCluster/pg_dynshmem
8,0K pgDbCluster/pg_notify
12K pgDbCluster/pg_stat_tmp
0 pgDbCluster/pg_tblspc
598G pgDbCluster
Edson
On 2/22/20 10:05 AM, Edson Richter wrote:
So the WAL logs are not being cleared.
What replication method is being used?
What are the settings for the replication?
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 15:50
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
So the WAL logs are not being cleared.
What replication method is being used?
What are the settings for the replication?
Streaming replication. Initiated via pg_basebackup.
Settings on master server:
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 2 # max number of walsender processes (change requires restart)
wal_keep_segments = 25 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
max_replication_slots = 2 # max number of replication slots (change requires restart)
#track_commit_timestamp = off # collect timestamp of transaction commit (change requires restart)
# - Master Server -
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep number of sync standbys and comma-separated list of application_name from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
Settings on slave server:
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on # "on" allows queries during recovery (change requires restart)
max_standby_archive_delay = -1 # max delay before canceling queries when reading WAL from archive; -1 allows indefinite delay
max_standby_streaming_delay = -1 # max delay before canceling queries when reading streaming WAL; -1 allows indefinite delay
wal_receiver_status_interval = 10s # send replies at least this often 0 disables
hot_standby_feedback = on # send info from standby to prevent query conflicts
wal_receiver_timeout = 0 # time that receiver waits for communication from master in milliseconds; 0 disables
wal_retrieve_retry_interval = 5s # time to wait before retrying to retrieve WAL after a failed attempt
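For scale, what wal_keep_segments = 25 alone would retain can be sketched with back-of-envelope arithmetic (a rough estimate only; replication slots and pending archiving can pin far more WAL than this):

```shell
#!/bin/sh
# Rough retention implied by wal_keep_segments: 25 segments x 16 MB each.
# This is nowhere near the 403 GB seen in the standby's pg_xlog, so
# something else (e.g. failed archiving) must be pinning the files.
segments=25
segment_mb=16
echo "wal_keep_segments alone retains at most $((segments * segment_mb)) MB"
```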
Regards,
Edson
On 2/22/20 11:03 AM, Edson Richter wrote:
What are the settings for archive_mode and archive_command on the standby?
Are the files in pg_xlog on the standby mostly from well in the past?
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 16:16
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
What are the settings for archive_mode and archive_command on the standby?
Are the files in pg_xlog on the standby mostly from well in the past?
Actually, the standby server is sending WALs to a backup (barman) server:
archive_mode = always # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
The files are about 7 months old.
Thanks,
Edson
On 2/22/20 11:23 AM, Edson Richter wrote:
Actually, standby server is sending wals to a backup (barman) server:
archive_mode = always # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
And the above is working, the files are showing up on the barman server?
The files are about 7 months old.
Are there newer files that would indicate that the streaming is working?
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 18:12
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
And the above is working, the files are showing up on the barman server?
Yes, it is working. The last xlog file is present on all three servers.
Also, comparing the last transaction number on master and slave shows that they are in sync.
Last but not least, select max(id) from a busy table shows the same id (when queried almost simultaneously using a simple test routine).
The files are about 7 months old.
Are there newer files that would indicate that the streaming is working?
Yes, streaming is working properly (as stated above).
Thanks,
Edson Richter
On 2/22/20 2:51 PM, Edson Richter wrote:
Well, something is keeping those WAL files around. You probably should analyze your complete setup to see what else is touching those servers.
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 20:34
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
Well, something is keeping those WAL files around. You probably should analyze your complete setup to see what else is touching those servers.
Is it safe to add "--remove-source-files" to my archive_command on my slave server, as follows,
archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
and remove the xlog file after copying to barman?
I mean, when the archive command starts, the WAL has already been processed by the slave server, so we don't need the files after copying them to the backup server, right?
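Whatever the cleanup policy, the archiver only considers a segment safely stored once archive_command exits with status 0 and keeps retrying otherwise. A runnable sketch of that contract (a local copy stands in for the rsync to barman, and all names here are hypothetical):

```shell
#!/bin/sh
# Sketch of the archive_command contract: exit 0 only when the segment is
# safely stored; on a non-zero exit PostgreSQL keeps the segment in
# pg_xlog and retries later.
ARCHIVE_DIR=$(mktemp -d)            # stands in for barman's incoming/ dir
archive_one() {                     # called as: archive_one %p %f
  cp "$1" "$ARCHIVE_DIR/$2" || return 1   # the real rsync would go here
}
seg_path=$(mktemp)                  # stands in for a pg_xlog segment (%p)
if archive_one "$seg_path" 000000010000000000000042; then
  echo "archived ok"
else
  echo "archive failed; segment stays in pg_xlog"
fi
```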
Regards,
Edson
--
Adrian Klaver
adrian.klaver@aklaver.com
On 2/23/20 8:04 AM, Edson Richter wrote:
I would say not. See:
https://www.postgresql.org/docs/12/wal-configuration.html
"Checkpoints are points in the sequence of transactions at which it is
guaranteed that the heap and index data files have been updated with all
information written before that checkpoint. At checkpoint time, all
dirty data pages are flushed to disk and a special checkpoint record is
written to the log file. (The change records were previously flushed to
the WAL files.) In the event of a crash, the crash recovery procedure
looks at the latest checkpoint record to determine the point in the log
(known as the redo record) from which it should start the REDO
operation. Any changes made to data files before that point are
guaranteed to be already on disk. Hence, after a checkpoint, log
segments preceding the one containing the redo record are no longer
needed and can be recycled or removed. (When WAL archiving is being
done, the log segments must be archived before being recycled or removed.)"
So there is a window where a WAL segment has been written but the data it represents has not yet been checkpointed, so it is still needed.
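That window can be sketched concretely: pg_controldata reports the latest checkpoint's REDO WAL file, and only segments that sort before it are past the window. The segment names below are made up for illustration:

```shell
#!/bin/sh
# Sketch of the checkpoint window: only segments that sort before the
# latest checkpoint's REDO segment are past the window, and even those
# must be archived before removal. The REDO segment name here is made
# up; the real one is reported by pg_controldata on the cluster directory.
redo_seg="000000010000002A00000003"
for seg in 000000010000002A00000001 000000010000002A00000005; do
  first=$(printf '%s\n%s\n' "$seg" "$redo_seg" | sort | head -n 1)
  if [ "$seg" != "$redo_seg" ] && [ "$first" = "$seg" ]; then
    echo "$seg predates the redo segment: recyclable once archived"
  else
    echo "$seg is inside the checkpoint window: keep it"
  fi
done
```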
--
Adrian Klaver
adrian.klaver@aklaver.com
________________________________
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, February 23, 2020 15:42
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
So there is a window where a WAL segment has been written but the data it represents has not yet been checkpointed, so it is still needed.
I see. Makes sense.
I suppose those long-lived xlog files are of no use then... I would expect PostgreSQL to delete them automatically.
Perhaps, since I have full backups every other day, I can create a "post backup command" in the barman script so it deletes files older than one week from the server it is backing up...
I understand there is no guarantee that these files have already been processed... but if they are needed, they can be recovered from the barman server...
Thanks,
Edson
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sat, 22 Feb 2020 19:23:05 +0000
Edson Richter <edsonrichter@hotmail.com> wrote:
[...]
Actually, standby server is sending wals to a backup (barman) server:
archive_mode = always # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
The files are about 7 months old.
Did you check the return code of your archive_command?
Did you check the logs produced by your archive_command and the postmaster?
How many files with a ".ready" extension are in "$PGDATA/pg_xlog/archive_status/"?
Can you confirm there is no missing WAL between the oldest one and the newest one in "$PGDATA/pg_xlog", in alphanumeric order?
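The archive_status check above can be sketched as a runnable script. This simulates the layout in a scratch directory so it can be executed anywhere; on a real standby, PGDATA would be the actual cluster directory:

```shell
#!/bin/sh
# Count segments the archiver has not yet confirmed: each pending segment
# has a ".ready" marker in pg_xlog/archive_status, and a large or growing
# count means archive_command is failing and pg_xlog will keep growing.
PGDATA=$(mktemp -d)                 # scratch stand-in for the real PGDATA
mkdir -p "$PGDATA/pg_xlog/archive_status"
# Pretend two segments are still waiting to be archived, one is done:
touch "$PGDATA/pg_xlog/archive_status/000000010000000000000001.ready"
touch "$PGDATA/pg_xlog/archive_status/000000010000000000000002.ready"
touch "$PGDATA/pg_xlog/archive_status/000000010000000000000003.done"
ready_count=$(ls "$PGDATA/pg_xlog/archive_status" | grep -c '\.ready$')
echo "segments waiting to be archived: $ready_count"
```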