Weird WAL problem - 9.0.3

Started by Rafael Martinezabout 15 years ago4 messagesgeneral
Jump to latest
#1Rafael Martinez
r.m.guerrero@usit.uio.no

Hello

Yesterday we had a weird problem with the pg_xlog partition in one of
our servers:

- The amount of WAL files was much higher than (2*checkpoint_segments)+1
(over 360 WAL files)

- The WAL files were not created/recycle time-ordered. Here is an
example:

.....................
16777216 Apr 12 17:49 000000010000000D0000001C
16777216 Apr 12 17:49 000000010000000D0000001D
16777216 Apr 12 17:49 000000010000000D0000001E
16777216 Apr 12 17:52 000000010000000D0000001F
16777216 Apr 12 17:50 000000010000000D00000020
16777216 Apr 12 17:51 000000010000000D00000021
16777216 Apr 12 17:49 000000010000000D00000022
16777216 Apr 12 17:49 000000010000000D00000023
16777216 Apr 12 17:49 000000010000000D00000024
16777216 Apr 12 17:51 000000010000000D00000025
.....................

This is the first time I see this behavior with the result of a full
pg_xlog partition.

This happened when testing an upgrade procedure and moving on the fly
with "pg_dumpall | psql" around 30 databases (ca.140GB) from a 8.3
server to a 9.0 one.

Is this normal? If it is, how can we find out the max.number of WAL
files a 9.0 system can generate in the worst case scenario?

Some relevant information about this system:
------------------------------------------------
PostgreSQL 9.0.3 - ext4 - RHEL5.6 - 2.6.18-238.5.1.el5 - x86_64

checkpoint_segments: 128
wal_buffers: 512kB
wal_level: archive
wal_sync_method: fdatasync
shared_buffers: 10GB
------------------------------------------------

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rafael Martinez (#1)
Re: Weird WAL problem - 9.0.3

On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote:

Hello

Yesterday we had a weird problem with the pg_xlog partition in one of
our servers:

- The amount of WAL files was much higher than (2*checkpoint_segments)+1
(over 360 WAL files)

Might want to take a look at:
http://www.postgresql.org/docs/9.0/interactive/wal-configuration.html
In particular:

"There will always be at least one WAL segment file, and will normally not be
more files than the higher of wal_keep_segments or (2 +
checkpoint_completion_target) * checkpoint_segments + 1. Each segment file is
normally 16 MB (though this size can be altered when building the server). You
can use this to estimate space requirements for WAL. Ordinarily, when old log
segment files are no longer needed, they are recycled (renamed to become the next
segments in the numbered sequence). If, due to a short-term peak of log output
rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded
segment files will be deleted instead of recycled until the system gets back
under this limit."

--
Adrian Klaver
adrian.klaver@gmail.com

#3Rafael Martinez
r.m.guerrero@usit.uio.no
In reply to: Adrian Klaver (#2)
Re: Weird WAL problem - 9.0.3

On Wed, 2011-04-13 at 06:28 -0700, Adrian Klaver wrote:

On Wednesday, April 13, 2011 6:09:25 am Rafael Martinez wrote:

Might want to take a look at:

[......]

sequence). If, due to a short-term peak of log output rate, there are
more than 3 * checkpoint_segments + 1 segment files, the unneeded
segment files will be deleted instead of recycled until the system
gets back under this limit."

Thank you.

This explains the number of WAL files generated, I was not aware of the
(3*checkpoint_segments + 1) limit (I will RTBM better next time)

But this doesn't explain the WAL files not been created/recycled
time-ordered. I wonder if this happened because the partition got full
while the WALs were created/recycled?

regards,
--
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rafael Martinez (#3)
Re: Weird WAL problem - 9.0.3

Rafael Martinez <r.m.guerrero@usit.uio.no> writes:

But this doesn't explain the WAL files not been created/recycled
time-ordered. I wonder if this happened because the partition got full
while the WALs were created/recycled?

When a checkpoint finishes, it scans the pg_xlog directory to find WAL
files that are no longer needed (because they're before the checkpoint's
WAL replay point). It will either rename them "forward" to become ready
for future use, or delete them if there are already enough future WAL
files present (as determined by checkpoint_segments). The order in
which old segments get renamed to be future ones is basically chance,
because it's determined by the order in which readdir() visits them.
So there's no reason to think that their file timestamps will be in
order.

I would expect WAL files that are *behind* the current write point to
have increasing write timestamps. But not those ahead.

regards, tom lane