Stuck trying to backup large database - best practice?
Hi,
We have a postgres 9.3.x box, with 1.3TB free space, and our database of
around 1.8TB. Unfortunately, we're struggling to back it up.
When we try a compressed backup with the following command:
pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Ft -Z9 -P -U "$DBUSER" -w
we get error:
pg_basebackup: could not get transaction log end position from server:
ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
removed
This attempted backup reached 430GB before failing.
We were advised on IRC to try -Xs, but that only works with a plain
(uncompressed) backup, and as you'll note from above, we don't have enough
disk space for this.
Is there anything else we can do apart from getting a bigger disk (not trivial
at the moment)? Any best practice?
I suspect that setting up WAL archiving and / or playing with the
wal_keep_segments setting might help, but as you can probably gather, I'd
like to be sure that I'm doing something sane before I dive in.
Happy to give more detail if required.
Antony
On 01/12/2015 07:20 AM, Antony Gelberg wrote:
pg_basebackup: could not get transaction log end position from server:
ERROR: requested WAL segment 0000000400002B9F000000B4 has already been
removed
This attempted backup reached 430GB before failing.
It fails because the WAL file it needs has been removed from under it.
We were advised on IRC to try -Xs, but that only works with a plain
(uncompressed) backup, and as you'll note from above, we don't have enough
disk space for this.
Is there anything else we can do apart from getting a bigger disk (not
trivial at the moment)? Any best practice?
What is the purpose of the backup?
In other words, do you really want the data and the WALs together, or do you
just want the data?
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
<some snippage>
On Mon, Jan 12, 2015 at 5:31 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
It fails because the WAL file it needs has been removed from under it.
Okay. We simply understood that it took too long. Clearly we have a
lot to learn about WAL and its intricacies.
What is the purpose of the backup?
In other words, do you really want the data and the WALs together, or do you
just want the data?
No, we just want to be able to restore our data at a later point. (As
a secondary point, it's not that clear to me why it would be useful
to have both; I'd be interested in some insight.)
Antony
On 01/12/2015 08:10 AM, Antony Gelberg wrote:
Okay. We simply understood that it took too long. Clearly we have a
lot to learn about WAL and its intricacies.
See here:
http://www.postgresql.org/docs/9.4/interactive/wal.html
No, we just want to be able to restore our data at a later point. (As
a secondary point, it's not that clear to me why it would be useful
to have both; I'd be interested in some insight.)
Seems you may be better served by pg_dump:
http://www.postgresql.org/docs/9.4/interactive/app-pgdump.html
pg_basebackup has additional features which in your case are creating
issues. pg_dump on the other hand is pretty much a straightforward data
dump, and if you use -Fc you get compression.
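A minimal invocation along those lines might look like the following sketch; the variables mirror the ones in the original pg_basebackup script, and the database name is a placeholder:

```shell
# Custom-format dump (-Fc): compressed by default, restorable selectively
# with pg_restore. $BACKUP_PATH, $TIMESTAMP and $DBUSER are assumed to be
# set as in the original script; "yourdb" is a placeholder database name.
pg_dump -Fc -U "$DBUSER" -w \
    -f "$BACKUP_PATH/$TIMESTAMP.dump" \
    yourdb
```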
Something I failed to ask in my previous post, how are you determining
the size of the database?
In addition are you talking about a single database or the Postgres
database cluster?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mon, Jan 12, 2015 at 6:23 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
See here:
http://www.postgresql.org/docs/9.4/interactive/wal.html
Of course we read the docs before asking here, but really learning
about a subject comes with time. :)
Seems you may be better served by pg_dump:
http://www.postgresql.org/docs/9.4/interactive/app-pgdump.html
pg_basebackup has additional features which in your case are creating
issues. pg_dump on the other hand is pretty much a straightforward data
dump, and if you use -Fc you get compression.
So I should clarify - we want to be able to get back to the same point
as we would once the WAL was applied. If we were to use pg_dump,
would we lose out in any way? We'd appreciate insight as to how
pg_basebackup is scuppering things.
Something I failed to ask in my previous post, how are you determining the
size of the database?
It's a managed server - the hosting company told us it was 1.8TB. I
just ran the query at
http://stackoverflow.com/questions/2596624/how-do-you-find-the-disk-size-of-a-postgres-postgresql-table-and-its-indexes,
and I don't have the total, but I'd say the actual table data is less,
nearer 1TB at a quick glance.
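For reference, the totals can also be read straight from the server; a sketch (connection options elided, and the cast is needed because 9.3's pg_size_pretty() only takes bigint):

```shell
# Size of the current database vs. the whole cluster.
psql -U "$DBUSER" -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
psql -U "$DBUSER" -c "SELECT pg_size_pretty(sum(pg_database_size(datname))::bigint) FROM pg_database;"
```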
In addition are you talking about a single database or the Postgres database
cluster?
We only have one database in the cluster, so it's the same thing.
Antony
On 01/12/2015 08:40 AM, Antony Gelberg wrote:
So I should clarify - we want to be able to get back to the same point
as we would once the WAL was applied. If we were to use pg_dump,
would we lose out in any way?
pg_dump does not save WALs, so it would not work for that purpose.
Appreciate insight as to how
pg_basebackup is scuppering things.
From the original post it is not entirely clear whether you are using the
-X or -x options. The command you show does not have them, but you
mention -Xs. In any case it seems wal_keep_segments will need to be
bumped up to keep WAL segments around that are being recycled during the
backup process. How much will depend on how fast Postgres is
using/recycling log segments. Looking at the turnover in the pg_xlog
directory would be a start.
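As a rough sketch of that estimate (the path and the numbers here are illustrative assumptions, not measurements from your system):

```shell
# Each WAL segment is 16MB by default. To see turnover, count segments
# touched in the last hour, e.g.:
#   find /var/lib/postgresql/9.3/main/pg_xlog -maxdepth 1 -type f -mmin -60 | wc -l
# (that path is one typical layout; yours may differ)

# Suppose that shows ~120 segments/hour and a full backup takes ~10 hours;
# keep a 2x safety margin:
SEGMENTS_PER_HOUR=120
BACKUP_HOURS=10
MARGIN=2
KEEP=$(( SEGMENTS_PER_HOUR * BACKUP_HOURS * MARGIN ))
echo "wal_keep_segments = $KEEP"   # 2400 segments is roughly 37GB of retained WAL
```

The trade-off is disk: whatever value you pick pins that much WAL on the primary, so it has to fit in your remaining free space.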
--
Adrian Klaver
adrian.klaver@aklaver.com
On 01/12/15 08:40, Antony Gelberg wrote:
We only have one database in the cluster, so it's the same thing.
Not exactly. For example, with pg_dump you would also need to dump the
'global' context (roles, tablespaces) separately. Or you could use
pg_dumpall for both the context and the database, with all of that then
piped into gzip or bzip2 (since pg_dumpall only outputs SQL text).
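Something like this, with placeholder paths and credentials:

```shell
# pg_dumpall emits plain SQL, so compress it in the pipe...
pg_dumpall -U "$DBUSER" | gzip > "$BACKUP_PATH/$TIMESTAMP-all.sql.gz"

# ...or dump only the globals (roles, tablespaces) to pair with a
# per-database pg_dump -Fc:
pg_dumpall -U "$DBUSER" --globals-only > "$BACKUP_PATH/$TIMESTAMP-globals.sql"
```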
HTH,
Bosco.
On Mon, Jan 12, 2015 at 7:08 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
From the original post it is not entirely clear whether you are using the
-X or -x options. The command you show does not have them, but you mention
-Xs. In any case it seems wal_keep_segments will need to be bumped up to
keep WAL segments around that are being recycled during the backup process.
The original script used -xs, but that didn't make sense, so we used
-Xs in the end, but then we cancelled the backup as we assumed that we
wouldn't have enough space for it uncompressed. Did we miss
something?
I think your suggestion of looking in pg_xlog and tweaking
wal_keep_segments is interesting; we'll take a look, and I'll report
back with findings.
Thanks for your very detailed help.
Antony
I apologize if it has already been suggested. I already deleted the
previous emails in this chain.
Have you looked into Barman? My current database is just a tad over 1TB. I
have one master, two slaves, and another machine running Barman. The slaves
are there for redundancy purposes. Master fails, a slave gets promoted. The
backups are all done by Barman. This allows for PITR. I do not run any
backup software on the database server, but I do on the Barman server. In
Barman I keep a 7 day retention policy, then I have Bacula backing that up
with a 1 month retention policy. So theoretically I could do a PITR up to a
month in the past.
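As a sketch, the day-to-day Barman commands look something like this ("main" is whatever you named the server in Barman's configuration, and the restore path is a placeholder):

```shell
barman backup main        # take a base backup onto the Barman host
barman list-backup main   # list the backups currently retained

# Restore the latest backup to a destination directory;
# add --target-time for point-in-time recovery.
barman recover main latest /var/lib/postgresql/9.3/main
```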
Thanks,
-Joseph Kregloh
On 01/12/2015 02:16 PM, Antony Gelberg wrote:
The original script used -xs, but that didn't make sense, so we used
-Xs in the end, but then we cancelled the backup as we assumed that we
wouldn't have enough space for it uncompressed. Did we miss
something?
Not sure you missed anything so much as didn't fully understand. :) When
you use either -x or -X you are telling pg_basebackup that you care that
the WAL files in the backup directory are up to date to the point the
backup completed. If you use -Xf, which is the same as -x, then you are
saying: wait till the rest of the backup is finished, then collect and
copy over all the relevant WAL files. This is where wal_keep_segments
comes into play. It needs to be set high enough that the relevant WAL
files in place at the beginning of the backup are still there when the
backup completes, in order to have a complete set. If you use -Xs, then a
parallel process is started to copy over the WAL files while the other
data files are being copied over. Though as the docs say:
http://www.postgresql.org/docs/9.4/interactive/app-pgbasebackup.html
"As long as the client can keep up with transaction log received, using
this mode requires no extra transaction logs to be saved on the master."
So it is possible for the client to fall behind and have a WAL file be
recycled before it can be transferred. If you are experiencing this then
again wal_keep_segments is a way of forcing Postgres to keep WAL files
around. The basic concept is that by default WAL files are recycled when
they fall out of scope on the primary, so you have to 'catch' them
before they do, or force them to hang around.
Compression is a separate operation and applies only in the tar format
case and should not be affected by the -x(X) options.
If it were me I would start looking at another 'machine' to offload the
backup to. Otherwise you will be looking at increasingly convoluted
methods of getting two bodies to occupy one space.
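Putting that together, one sketch of the retune-and-retry approach (the 2400 is purely illustrative; derive your own figure from the pg_xlog turnover):

```shell
# In postgresql.conf on the primary (then reload):
#   wal_keep_segments = 2400

# Re-run the compressed tar backup, fetching WAL at the end: -Xf works
# with -Ft, unlike -Xs on 9.3. Variables as in the original script.
pg_basebackup -D "$BACKUP_PATH/$TIMESTAMP" -Ft -Z9 -Xf -P -U "$DBUSER" -w
```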
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Jan 13, 2015 at 7:04 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
If you use -Xs, then a parallel process is started to copy over the WAL
files while the other data files are being copied over. Though as the
docs say:
http://www.postgresql.org/docs/9.4/interactive/app-pgbasebackup.html
"As long as the client can keep up with transaction log received, using
this mode requires no extra transaction logs to be saved on the master."
Wouldn't streaming the WAL files during the backup (in a parallel
process, which is what the -Xs / stream option in pg_basebackup does)
resolve the original issue he faced with WAL being removed?
Compression is a separate operation and applies only in the tar format
case and should not be affected by the -x(X) options.
But I guess if you take the backup on a 2nd machine, the WAL or data
files are not compressed before being sent over the network.
If it were me I would start looking at another 'machine' to offload the
backup to. Otherwise you will be looking at increasingly convoluted
methods of getting two bodies to occupy one space.
+1
I agree...
Best Regards,
*Sameer Kumar | Database Consultant*
*ASHNIK PTE. LTD.*
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com
This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).