Question about wal files / pg_xlogs

Started by Patrick Bover 9 years ago16 messagesgeneral
Jump to latest
#1Patrick B
patrickbakerbr@gmail.com

Hi all,

I'm currently using PostgreSQL 9.2. I noticed that the wal_files are being
generated by the master well, no problems. But on the slaves, it seems to
be a delay to the delivery of those wal_files.

I got two slaves using streaming replication and wal files shipment from
Master.

*On the master:*

ls -ltr /var/lib/pgsql/9.2/data/pg_xlogs/

Aug 4 02:18 000000020000159D000000D1
Aug 4 02:18 000000020000159D000000D2
Aug 4 02:18 000000020000159D000000D3

select * from pg_current_xlog_location();

159D/D6C8DAF8

So, seems to be ok.

*On the slave:*

ls -ltr /var/lib/pgsql/9.2/wal_archive:

Aug 4 00:58 000000020000159C00000071
Aug 4 00:58 000000020000159C00000072
Aug 4 00:58 000000020000159C00000073

See the time difference? 2 hours? It seems the files are being delivered
with 2 hours delay.

The streaming replication is working fine... But if that goes down, I'll
need the wal_files up to date to recover the database.

How can I see what's going on? What would be the steps? any tips?

Cheers
Patrick.

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#1)
Re: Question about wal files / pg_xlogs

On 08/03/2016 07:21 PM, Patrick B wrote:

Hi all,

I'm currently using PostgreSQL 9.2. I noticed that the wal_files are
being generated by the master well, no problems. But on the slaves, it
seems to be a delay to the delivery of those wal_files.

I got two slaves using streaming replication and wal files shipment from
Master.

*On the master:*

ls -ltr /var/lib/pgsql/9.2/data/pg_xlogs/
Aug 4 02:18 000000020000159D000000D1
Aug 4 02:18 000000020000159D000000D2
Aug 4 02:18 000000020000159D000000D3

select * from pg_current_xlog_location();
159D/D6C8DAF8

So, seems to be ok.

*On the slave:*

ls -ltr /var/lib/pgsql/9.2/wal_archive:
Aug 4 00:58 000000020000159C00000071
Aug 4 00:58 000000020000159C00000072
Aug 4 00:58 000000020000159C00000073

See the time difference? 2 hours? It seems the files are being delivered
with 2 hours delay.

Both machines have same timezone?

How fast are you generating WALs?

How are you shipping the WALs?

The streaming replication is working fine... But if that goes down, I'll
need the wal_files up to date to recover the database.

How can I see what's going on? What would be the steps? any tips?

Cheers
Patrick.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Patrick B
patrickbakerbr@gmail.com
In reply to: Adrian Klaver (#2)
Re: Question about wal files / pg_xlogs

Both machines have same timezone?

Yes! Shouldn't be showing 2 hours before.. I just checked and both server
has the same date / timezone

How fast are you generating WALs?

Check below please

checkpoint_segments = 64

checkpoint_timeout = 5min
checkpoint_completion_target = 0.6
checkpoint_warning = 30s
archive_timeout = 1800
max_wal_senders = 8
wal_keep_segments = 256

How are you shipping the WALs?

I use a bash script to ship them. The script hasn't being changed.... So it
isn't the problem.

*postgresql.conf:*

archive_command = 'exec nice -n 19 ionice -c 2 -n 7 archive_command.bash
"%p" slave01 slave02'

*archive_command.bash:*

Basically we use TAR to ship through ssh:

# we use tar over SSH as I don't fully trust scp's exit status. The added
benefit is that tar preserves all attributes
# the downside is that it's a little tricky to make the remote path
relative
tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}" "${WAL_SEGMENT##*/}" |
ssh -p ${NEXT_PORT} -C -o 'BatchMode=yes' -o 'CompressionLevel=3' "${USER}@${NEXT_HOST}"
"exec tar -x --no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'";
PS_CONCAT="${PIPESTATUS[*]}";

The script is complex, but as I said, nothing has been changed on it.

#4Venkata B Nagothi
nag1010@gmail.com
In reply to: Patrick B (#3)
Re: Question about wal files / pg_xlogs

I use a bash script to ship them. The script hasn't being changed.... So
it isn't the problem.

*postgresql.conf:*

archive_command = 'exec nice -n 19 ionice -c 2 -n 7 archive_command.bash
"%p" slave01 slave02'

*archive_command.bash:*

Basically we use TAR to ship through ssh:

# we use tar over SSH as I don't fully trust scp's exit status. The added
benefit is that tar preserves all attributes
# the downside is that it's a little tricky to make the remote path
relative
tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}" "${WAL_SEGMENT##*/}" |
ssh -p ${NEXT_PORT} -C -o 'BatchMode=yes' -o 'CompressionLevel=3' "${USER}@${NEXT_HOST}"
"exec tar -x --no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'";
PS_CONCAT="${PIPESTATUS[*]}";

The script is complex, but as I said, nothing has been changed on it.

Not sure why the script is so complex. Do you see any messages in the
postgresql log file on master ? and on slave ? which indicates the reason
for delayed shipping of WAL archives. Did you notice any network level
issues ?

Regards,
Venkata B N

Fujitsu Australia

#5John R Pierce
pierce@hogranch.com
In reply to: Patrick B (#3)
Re: Question about wal files / pg_xlogs

On 8/3/2016 7:59 PM, Patrick B wrote:

*postgresql.conf:*

archive_command = 'exec nice -n 19 ionice -c 2 -n 7
archive_command.bash "%p" slave01 slave02'

*archive_command.bash:*

Basically we use TAR to ship through ssh:

# we use tar over SSH as I don't fully trust scp's exit status.
The added benefit is that tar preserves all attributes
# the downside is that it's a little tricky to make the remote
path relative
tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}"
"${WAL_SEGMENT##*/}" | ssh -p ${NEXT_PORT} -C -o 'BatchMode=yes'
-o 'CompressionLevel=3' "${USER}@${NEXT_HOST}" "exec tar -x
--no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'";
PS_CONCAT="${PIPESTATUS[*]}";

normally, you would ship the archived wal files to a file server via
cp-over-nfs or scp, and have the slaves access them as needed via the
recovery.conf

--
john r pierce, recycling bits in santa cruz

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#3)
Re: Question about wal files / pg_xlogs

On 08/03/2016 07:59 PM, Patrick B wrote:

Both machines have same timezone?

Yes! Shouldn't be showing 2 hours before.. I just checked and both
server has the same date / timezone

How fast are you generating WALs?

Check below please

checkpoint_segments = 64
checkpoint_timeout = 5min
checkpoint_completion_target = 0.6
checkpoint_warning = 30s
archive_timeout = 1800
max_wal_senders = 8
wal_keep_segments = 256

How are you shipping the WALs?

I use a bash script to ship them. The script hasn't being changed.... So
it isn't the problem.

*postgresql.conf:*

archive_command = 'exec nice -n 19 ionice -c 2 -n 7
archive_command.bash "%p" slave01 slave02'

Seems to me the settings for nice and ionice above would, on a busy
machine, slow down the transfer. Has there always been a notable time
difference in the transfer or has it gotten worse over time?

*archive_command.bash:*

Basically we use TAR to ship through ssh:

# we use tar over SSH as I don't fully trust scp's exit status. The
added benefit is that tar preserves all attributes
# the downside is that it's a little tricky to make the remote path
relative
tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}"
"${WAL_SEGMENT##*/}" | ssh -p ${NEXT_PORT} -C -o 'BatchMode=yes' -o
'CompressionLevel=3' "${USER}@${NEXT_HOST}" "exec tar -x
--no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'";
PS_CONCAT="${PIPESTATUS[*]}";

The script is complex, but as I said, nothing has been changed on it.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Patrick B
patrickbakerbr@gmail.com
In reply to: Adrian Klaver (#6)
Re: Question about wal files / pg_xlogs

@Adrian,

Seems to me the settings for nice and ionice above would, on a busy
machine, slow down the transfer. Has there always been a notable time
difference in the transfer or has it gotten worse over time?

Yep... I also thought about that. Specially because the master is

constantly getting 100% of IO (we use SATA disks still)...

I'm thinking about removing that `ionice` command... I don't need to
restart Postgres eh?? Just reload the confs?

@John R Pierce,

normally, you would ship the archived wal files to a file server via

cp-over-nfs or scp, and have the slaves access them as needed via the
recovery.conf

What if the NFS server goes down? Networking goes down? We have had that
kind of problem in the past, that's why I'm shipping the wal_files to each
slave, separately. Also, to have an extra copy of them.

@Venkata Balaji N,

Not sure why the script is so complex. Do you see any messages in the

postgresql log file on master ? and on slave ? which indicates the reason
for delayed shipping of WAL archives. Did you notice any network level
issues ?

Yes the script is complex.. I've hidden almost all of it for privacy
purpose.. sorry....

I don't see any messages on the log files... not on the master and not on
the slaves as well. I just see the message of the wal_files
being successfully shipped to the slaves.

Also, no networking level issues.. because I got four slaves with streaming
replication and all of them are working fine... also, my backup server has
never failed... so no networking issues.

Thanks,

Patrick

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#7)
Re: Question about wal files / pg_xlogs

On 08/04/2016 12:55 PM, Patrick B wrote:

@Adrian,

Seems to me the settings for nice and ionice above would, on a busy
machine, slow down the transfer. Has there always been a notable
time difference in the transfer or has it gotten worse over time?

Yep... I also thought about that. Specially because the master is
constantly getting 100% of IO (we use SATA disks still)...

I'm thinking about removing that `ionice` command... I don't need to
restart Postgres eh?? Just reload the confs?

https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL

"However, archive_command can be changed with a configuration file reload."

@John R Pierce,

normally, you would ship the archived wal files to a file server via
cp-over-nfs or scp, and have the slaves access them as needed via
the recovery.conf

What if the NFS server goes down? Networking goes down? We have had that
kind of problem in the past, that's why I'm shipping the wal_files to
each slave, separately. Also, to have an extra copy of them.

@Venkata Balaji N,

Not sure why the script is so complex. Do you see any messages in
the postgresql log file on master ? and on slave ? which indicates
the reason for delayed shipping of WAL archives. Did you notice any
network level issues ?

Yes the script is complex.. I've hidden almost all of it for privacy
purpose.. sorry....

I don't see any messages on the log files... not on the master and not
on the slaves as well. I just see the message of the wal_files
being successfully shipped to the slaves.

Also, no networking level issues.. because I got four slaves with
streaming replication and all of them are working fine... also, my
backup server has never failed... so no networking issues.

Thanks,

Patrick

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Patrick B
patrickbakerbr@gmail.com
In reply to: Adrian Klaver (#8)
Re: Question about wal files / pg_xlogs

https://www.postgresql.org/docs/9.5/static/continuous-archiv
ing.html#BACKUP-ARCHIVING-WAL

"However, archive_command can be changed with a configuration file reload."

<adrian.klaver@aklaver.com>

Cheers... I removed the IONICE command from the archive_command. However,
did not see any difference.

Any idea?

#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#9)
Re: Question about wal files / pg_xlogs

On 08/04/2016 01:16 PM, Patrick B wrote:

https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
<https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL&gt;

"However, archive_command can be changed with a configuration file
reload."

<mailto:adrian.klaver@aklaver.com>

Cheers... I removed the IONICE command from the archive_command.
However, did not see any difference.

Well you just did it, so how would it be possible to notice whether it
took two hours or not?

Any idea?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Melvin Davidson
melvin6925@gmail.com
In reply to: Patrick B (#9)
Re: Question about wal files / pg_xlogs

On Thu, Aug 4, 2016 at 4:16 PM, Patrick B <patrickbakerbr@gmail.com> wrote:

https://www.postgresql.org/docs/9.5/static/continuous-archiv
ing.html#BACKUP-ARCHIVING-WAL

"However, archive_command can be changed with a configuration file
reload."

<adrian.klaver@aklaver.com>

Cheers... I removed the IONICE command from the archive_command. However,
did not see any difference.

Any idea?

Just out of curiostity, are the slaves in the same physical location, or by
some chance are they on a remote site?

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#12Patrick B
patrickbakerbr@gmail.com
In reply to: Melvin Davidson (#11)
Re: Question about wal files / pg_xlogs

Just out of curiostity, are the slaves in the same physical location, or
by some chance are they on a remote site?

two of them in the same physical location, and the other two different
country.

#13Melvin Davidson
melvin6925@gmail.com
In reply to: Patrick B (#12)
Re: Question about wal files / pg_xlogs

On Thu, Aug 4, 2016 at 4:32 PM, Patrick B <patrickbakerbr@gmail.com> wrote:

Just out of curiostity, are the slaves in the same physical location, or
by some chance are they on a remote site?

two of them in the same physical location, and the other two different
country.

two of them in the same physical location, and the other two different

country.

And the time difference is on ALL slaves, or just the two in a different
country?

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#14Patrick B
patrickbakerbr@gmail.com
In reply to: Melvin Davidson (#13)
Re: Question about wal files / pg_xlogs

And the time difference is on ALL slaves, or just the two in a different
country?

All of them! :(

#15Alex Ignatov
a.ignatov@postgrespro.ru
In reply to: Patrick B (#1)
Re: Question about wal files / pg_xlogs

Hello!

As I can see files is delivered not with delay but with timeshift.

1. Can you show me restore_command on slave?

2. Also can you check archived WAL creation time on slaves in archive
location after you copied them with archive_command? Is in near WAL
creation time in pg_xlogs? Or different?

3. How do you check timezone equivalence between master and slave? What

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Show quoted text

On 04.08.2016 05:21, Patrick B wrote:

Hi all,

I'm currently using PostgreSQL 9.2. I noticed that the wal_files are
being generated by the master well, no problems. But on the slaves, it
seems to be a delay to the delivery of those wal_files.

I got two slaves using streaming replication and wal files shipment
from Master.

*On the master:*

ls -ltr /var/lib/pgsql/9.2/data/pg_xlogs/
Aug 4 02:18 000000020000159D000000D1
Aug 4 02:18 000000020000159D000000D2
Aug 4 02:18 000000020000159D000000D3

select * from pg_current_xlog_location();
159D/D6C8DAF8

So, seems to be ok.

*On the slave:*

ls -ltr /var/lib/pgsql/9.2/wal_archive:
Aug 4 00:58 000000020000159C00000071
Aug 4 00:58 000000020000159C00000072
Aug 4 00:58 000000020000159C00000073

See the time difference? 2 hours? It seems the files are being
delivered with 2 hours delay.

The streaming replication is working fine... But if that goes down,
I'll need the wal_files up to date to recover the database.

How can I see what's going on? What would be the steps? any tips?

Cheers
Patrick.

#16Patrick B
patrickbakerbr@gmail.com
In reply to: Alex Ignatov (#15)
Re: Question about wal files / pg_xlogs

Hi guys,

I can now confirm that by taking off the IONICE command solved my problem.

Thanks a lot !
Patrick