pg_basebackup, requested WAL has already been removed

Started by Sergey Koposovalmost 13 years ago5 messagesgeneral

koposov@ast.cam.ac.uk

almost 13 years ago

Hi,

I've recently started to use pg_basebackup --xlog-method=stream to backup
my multi-Tb database.
Before I did the backup when there was not much activity in the DB and it
went perfectly fine, but today, I've started the backup and it failed
twice almost at the same time as the
CREATE INDEX (and another time CLUSTER) commands were finished.

Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup --xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL:
requested WAL segment 00000001000023B1000000FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace
(/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms statement: CREATE INDEX usno_cle
an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segmen
ts".
::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segmen
ts".
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment 00000001000023B1000000FE has already been removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms statement: CLUSTER usno_clean_q3c_idx ON usno_clean;

One the previous occasion when it happened the CREATE INDEX() was being executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION ALL SELECT 'all') ss WHERE substring(name,1,4)='rand'
LIMIT 1000
::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration parameter "checkpoint_segments".
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL segment 000000010000239900000040 has already been removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);

The .configuration
PG 9.2.4, Debian 7.0, amd64

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level=archive
archive_mode = off

Does it look like a bug or am I missing something ?

Thanks,
Sergey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Lonni J Friedman

netllama@gmail.com

almost 13 years ago

In reply to: Sergey Koposov (#1)

Re: pg_basebackup, requested WAL has already been removed

Its definitely not a bug. You need to set/increase wal_keep_segments
to a value that ensures that they aren't recycled faster than the time
required to complete the base backup (plus some buffer).

On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:

Hi,

I've recently started to use pg_basebackup --xlog-method=stream to backup my
multi-Tb database.
Before I did the backup when there was not much activity in the DB and it
went perfectly fine, but today, I've started the backup and it failed twice
almost at the same time as the
CREATE INDEX (and another time CLUSTER) commands were finished.

Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
--xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL:
requested WAL segment 00000001000023B1000000FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace
(/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms
statement: CREATE INDEX usno_cle
an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segmen
ts".
::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segmen
ts".
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment
00000001000023B1000000FE has already been removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms
statement: CLUSTER usno_clean_q3c_idx ON usno_clean;

One the previous occasion when it happened the CREATE INDEX() was being
executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms
statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM
pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION
ALL SELECT 'all') ss WHERE substring(name,1,4)='rand'
LIMIT 1000
::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too
frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segments".
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL
segment 000000010000239900000040 has already been removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms
statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);

The .configuration
PG 9.2.4, Debian 7.0, amd64

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level=archive
archive_mode = off

Does it look like a bug or am I missing something ?

Thanks,
Sergey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Sergey Koposov

koposov@ast.cam.ac.uk

almost 13 years ago

In reply to: Lonni J Friedman (#2)

Re: pg_basebackup, requested WAL has already been removed

On Fri, 10 May 2013, Lonni J Friedman wrote:

Its definitely not a bug. You need to set/increase wal_keep_segments
to a value that ensures that they aren't recycled faster than the time
required to complete the base backup (plus some buffer).

But I thought that wal_keep_segments is not needed for the streaming
regime ( "--xlog-method=stream") And the documentation
http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
only mentions wal_keep_segments when talking about --xlog-method=fetch.

On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov

<koposov@ast.cam.ac.uk>
wrote: >> Hi,

I've recently started to use pg_basebackup --xlog-method=stream to backup my
multi-Tb database.
Before I did the backup when there was not much activity in the DB and it
went perfectly fine, but today, I've started the backup and it failed twice
almost at the same time as the
CREATE INDEX (and another time CLUSTER) commands were finished.

Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
--xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL:
requested WAL segment 00000001000023B1000000FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace
(/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms
statement: CREATE INDEX usno_cle
an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segmen
ts".
::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segmen
ts".
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL segment
00000001000023B1000000FE has already been removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms
statement: CLUSTER usno_clean_q3c_idx ON usno_clean;

One the previous occasion when it happened the CREATE INDEX() was being
executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms
statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM
pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION
ALL SELECT 'all') ss WHERE substring(name,1,4)='rand'
LIMIT 1000
::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too
frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the configuration
parameter "checkpoint_segments".
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL
segment 000000010000239900000040 has already been removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms
statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);

The .configuration
PG 9.2.4, Debian 7.0, amd64

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level=archive
archive_mode = off

Does it look like a bug or am I missing something ?

Thanks,
Sergey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Lonni J Friedman

netllama@gmail.com

almost 13 years ago

In reply to: Sergey Koposov (#3)

Re: pg_basebackup, requested WAL has already been removed

That's a good point. Then i dunno, perhaps it is a bug, but I'd be
surprised if this wasn't working, as its not really a corner case that
could be missed in testing, as long as all the options were exercised.
Hopefully someone else can weigh in.

On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:

On Fri, 10 May 2013, Lonni J Friedman wrote:

Its definitely not a bug. You need to set/increase wal_keep_segments
to a value that ensures that they aren't recycled faster than the time
required to complete the base backup (plus some buffer).

But I thought that wal_keep_segments is not needed for the streaming regime
( "--xlog-method=stream") And the documentation
http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
only mentions wal_keep_segments when talking about --xlog-method=fetch.

On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov

<koposov@ast.cam.ac.uk> wrote: >> Hi,

I've recently started to use pg_basebackup --xlog-method=stream to backup
my
multi-Tb database.
Before I did the backup when there was not much activity in the DB and it
went perfectly fine, but today, I've started the backup and it failed
twice
almost at the same time as the
CREATE INDEX (and another time CLUSTER) commands were finished.

Here:

postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
--xlog-method=stream --progress --verbose --pg
transaction log start point: 23AE/BD003E70
pg_basebackup: starting background WAL receiver
pg_basebackup: unexpected termination of replication stream: FATAL:
requested WAL segment 00000001000023B1000000FE has already been removed
4819820/16816887078 kB (4%), 0/1 tablespace
(/mnt/backup/wsdb_130510/base/1)

And the logs from around that time contained:

some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG: duration: 40128.163 ms
statement: CREATE INDEX usno_cle
an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
::2013-05-10 14:35:43 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:43 BST:25529HINT: Consider increasing the
configuration
parameter "checkpoint_segmen
ts".
::2013-05-10 14:35:51 BST:25529LOG: checkpoints are occurring too
frequently (8 seconds apart)
::2013-05-10 14:35:51 BST:25529HINT: Consider increasing the
configuration
parameter "checkpoint_segmen
ts".
postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL: requested WAL
segment
00000001000023B1000000FE has already been removed
some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG: duration: 78378.194 ms
statement: CLUSTER usno_clean_q3c_idx ON usno_clean;

One the previous occasion when it happened the CREATE INDEX() was being
executed:

some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG: duration: 67.680 ms
statement: SELECT name FROM (SELECT pg_catalog.lower(name) AS name FROM
pg_catalog.pg_settings UNION ALL SELECT 'session authorization' UNION
ALL SELECT 'all') ss WHERE substring(name,1,4)='rand'
LIMIT 1000
::2013-05-10 09:22:47 BST:25529LOG: checkpoints are occurring too
frequently (18 seconds apart)
::2013-05-10 09:22:47 BST:25529HINT: Consider increasing the
configuration
parameter "checkpoint_segments".
postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL: requested WAL
segment 000000010000239900000040 has already been removed
some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG: duration: 542955.262 ms
statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);

The .configuration
PG 9.2.4, Debian 7.0, amd64

shared_buffers = 10GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_io_concurrency = 5
synchronous_commit = off
checkpoint_segments = 32
max_wal_senders = 2
effective_cache_size = 30GB
autovacuum_max_workers = 3
wal_level=archive
archive_mode = off

Does it look like a bug or am I missing something ?

Thanks,
Sergey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman netllama@gmail.com
LlamaLand https://netllama.linux-sxs.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Magnus Hagander

magnus@hagander.net

almost 13 years ago

In reply to: Sergey Koposov (#3)

Re: pg_basebackup, requested WAL has already been removed

On Fri, May 10, 2013 at 7:00 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:

On Fri, 10 May 2013, Lonni J Friedman wrote:

Its definitely not a bug. You need to set/increase wal_keep_segments
to a value that ensures that they aren't recycled faster than the time
required to complete the base backup (plus some buffer).

But I thought that wal_keep_segments is not needed for the streaming regime
( "--xlog-method=stream") And the documentation
http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
only mentions wal_keep_segments when talking about --xlog-method=fetch.

It's not a bug in the software - this will happen if the background
stream in pg_basebackup cannot keep up, and that's normal. It works
the same way as a regular standby in that if it's unable to keep up it
will eventually fall so far behind that it can't recover.

It may definitely be something that needs to be cleared up in the
documentation though.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general