recovering from "too many failures" wal error

Started by CS DBA | over 11 years ago | 4 messages | general

#1 CS DBA
cs_dba@consistentstate.com

All,

We have a PostgreSQL 9.2 cluster set up to do continuous WAL archiving.
We were archiving to a mount point that went offline. As a result, the db
could not archive the WAL files, and we ended up with many errors in
the logs indicating that a file could not be archived:

WARNING: transaction log file "0000000100000FB100000050" could not be archived: too many failures

We caught the issue before the file system filled up and fixed the mount
point, and I now see WAL files being added to the target WAL archive
directory. However, the pg_xlog directory does not seem to be shrinking:
there are currently 27,546 files in the pg_xlog directory, and that
number has not changed in some time (since we fixed the mount point).

I assume the db will at some point remove the backed-up files in the
pg_xlog dir. Is this true, or do I need to intervene?

Thanks in advance

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Andy Colson
andy@squeakycode.net
In reply to: CS DBA (#1)
Re: recovering from "too many failures" wal error

On 11/29/2014 3:37 PM, CS DBA wrote:

> I assume the db will at some point remove the backed-up files in the
> pg_xlog dir. Is this true, or do I need to intervene?

From what I recall from this list, you should never play in pg_xlog.
You'll probably do more damage than good. PG should take care of itself.

Are you still getting error messages? Looks like it's been a few days,
has it shrunk yet?

-Andy


#3 Andres Freund
andres@anarazel.de
In reply to: CS DBA (#1)
Re: recovering from "too many failures" wal error

On 2014-11-29 14:37:56 -0700, CS DBA wrote:

> I assume the db will at some point remove the backed-up files in the pg_xlog
> dir. Is this true, or do I need to intervene?

The archiver retries archiving either after a timeout has passed (60s?)
or when a new file becomes ready to be archived, so there should be no
need to intervene after fixing archiving. Are files being archived
again? Specifically, ones that previously failed?

WAL files are only removed around checkpoints. You could force one
by manually issuing a 'CHECKPOINT;' statement.
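
One rough way to answer the "are files being archived again?" question is to look at the archiver's status files rather than the WAL segments themselves. The sketch below is read-only and assumes the 9.2 layout, where pg_xlog/archive_status holds a `<segment>.ready` file for each segment still waiting to be archived and a `<segment>.done` file for each one already archived. The function names are made up for illustration.

```python
import os

def archive_status_counts(status_dir):
    """Count segments still waiting (.ready) and already archived (.done)."""
    counts = {"ready": 0, "done": 0}
    for name in os.listdir(status_dir):
        # Status files are named <wal file name>.ready or <wal file name>.done
        suffix = os.path.splitext(name)[1].lstrip(".")
        if suffix in counts:
            counts[suffix] += 1
    return counts

def is_archived(status_dir, segment):
    """True if the archiver has marked this segment as done."""
    return os.path.exists(os.path.join(status_dir, segment + ".done"))
```

Pointing `archive_status_counts` at the data directory's pg_xlog/archive_status a few minutes apart should show the "ready" count dropping if the backlog is draining, and `is_archived(status_dir, "0000000100000FB100000050")` would show whether the segment from the warning has gone through. Nothing here touches or deletes anything in pg_xlog.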

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4 Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Andy Colson (#2)
Re: recovering from "too many failures" wal error

On Mon, Dec 1, 2014 at 10:59 PM, Andy Colson <andy@squeakycode.net> wrote:

>> I assume the db will at some point remove the backed-up files in the
>> pg_xlog dir. Is this true, or do I need to intervene?
>
> From what I recall from this list, you should never play in pg_xlog.
> You'll probably do more damage than good. PG should take care of itself.

+1

Don't ever do that.

> Are you still getting error messages? Looks like it's been a few days, has
> it shrunk yet?

If you are still getting errors, try to force an archival with:

set synchronous_commit=off; -- needed only if you are replicating to synchronous slaves
select pg_switch_xlog(); -- though not necessary
select pg_start_backup('test');
select pg_stop_backup();

I use these commands to test whether archival is working, or to force
archival.
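
To see whether commands like these actually pushed a segment out, one option is to compare the newest segment name in the archive directory before and after running them. This is only a sketch: it assumes archive_command copies each segment into the archive directory under its original 24-character name (no compression suffix), and `newest_segment` is a made-up helper name.

```python
import os
import re

# A WAL segment file name is 24 hexadecimal characters:
# 8 for the timeline, 8 for the log id, 8 for the segment number.
SEGMENT_RE = re.compile(r"[0-9A-F]{24}")

def newest_segment(archive_dir):
    """Return the highest-sorting WAL segment name in archive_dir, or None."""
    segments = [n for n in os.listdir(archive_dir) if SEGMENT_RE.fullmatch(n)]
    # Names are fixed-width uppercase hex, so string order matches WAL order.
    return max(segments) if segments else None
```

If the value returned after the commands sorts higher than the one returned before, at least one new segment reached the archive.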

Best Regards,

Sameer Kumar | Database Consultant

ASHNIK PTE. LTD.
