Add function to return backup_label and tablespace_map

Started by Fujii Masaoalmost 4 years ago14 messageshackers
Jump to latest
#1Fujii Masao
masao.fujii@gmail.com

Hi,

Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_map files from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue when doing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop() returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run some OS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which would prevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'm afraid.

To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map? I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop() finishes. For example, this function allows us to take a backup using the following psql script file.

------------------------------
SELECT * FROM pg_backup_start('test');
\! cp -a $PGDATA /backup
SELECT * FROM pg_backup_stop();

\pset tuples_only on
\pset format unaligned

\o /backup/data/backup_label
SELECT labelfile FROM pg_backup_label();

\o /backup/data/tablespace_map
SELECT spcmapfile FROM pg_backup_label();
------------------------------

Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfully reach the consensus for adding the function, I will update the patch.

Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

pg_backup_label_v1.patchtext/plain; charset=UTF-8; name=pg_backup_label_v1.patchDownload+50-13
#2Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Fujii Masao (#1)
Re: Add function to return backup_label and tablespace_map

On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Hi,

Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_map files from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue when doing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop() returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run some OS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which would prevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'm afraid.

To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map? I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop() finishes. For example, this function allows us to take a backup using the following psql script file.

------------------------------
SELECT * FROM pg_backup_start('test');
\! cp -a $PGDATA /backup
SELECT * FROM pg_backup_stop();

\pset tuples_only on
\pset format unaligned

\o /backup/data/backup_label
SELECT labelfile FROM pg_backup_label();

\o /backup/data/tablespace_map
SELECT spcmapfile FROM pg_backup_label();
------------------------------

Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfully reach the consensus for adding the function, I will update the patch.

Thought?

+1 for making it easy for the user to create backup_label and
tablespace_map files. With the patch, the label_file and
tblspc_map_file contents are preserved until the lifecycle of the
session or the next run of pg_backup_start, I'm not sure if we need to
worry more about it.

Why can't we have functions like pg_create_backup_label() and
pg_create_tablespace_map() which create the 'backup_label' and
'tablespace_map' files respectively in the data directory and also
return the contents as output columns?

Also, we can let users run these create functions only once (perhaps
after the pg_backup_stop is called which is when the contents will be
consistent). If we allow these functions to read the label_file or
tblspc_map_file contents during the backup before stop backup, they
may not be consistent. We can have a new sessionBackupState something
like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and
after the new function calls sessionBackupState goes to
SESSION_BACKUP_NONE) and the contents of label_file and
tblspc_map_file are freed up.

In the docs, it's good if we can clearly specify the steps to use all
of these functions.

Regards,
Bharath Rupireddy.

#3Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#2)
Re: Add function to return backup_label and tablespace_map

On Fri, Jul 8, 2022 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Hi,

Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_map files from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue when doing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop() returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run some OS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which would prevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'm afraid.

To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map? I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop() finishes. For example, this function allows us to take a backup using the following psql script file.

------------------------------
SELECT * FROM pg_backup_start('test');
\! cp -a $PGDATA /backup
SELECT * FROM pg_backup_stop();

\pset tuples_only on
\pset format unaligned

\o /backup/data/backup_label
SELECT labelfile FROM pg_backup_label();

\o /backup/data/tablespace_map
SELECT spcmapfile FROM pg_backup_label();
------------------------------

Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfully reach the consensus for adding the function, I will update the patch.

Thought?

+1 for making it easy for the user to create backup_label and
tablespace_map files. With the patch, the label_file and
tblspc_map_file contents are preserved until the lifecycle of the
session or the next run of pg_backup_start, I'm not sure if we need to
worry more about it.

Why can't we have functions like pg_create_backup_label() and
pg_create_tablespace_map() which create the 'backup_label' and
'tablespace_map' files respectively in the data directory and also
return the contents as output columns?

Also, we can let users run these create functions only once (perhaps
after the pg_backup_stop is called which is when the contents will be
consistent). If we allow these functions to read the label_file or
tblspc_map_file contents during the backup before stop backup, they
may not be consistent. We can have a new sessionBackupState something
like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and
after the new function calls sessionBackupState goes to
SESSION_BACKUP_NONE) and the contents of label_file and
tblspc_map_file are freed up.

In the docs, it's good if we can clearly specify the steps to use all
of these functions.

Forgot to mention a comment on the v1 patch: we'll need to revoke
permissions from the public for pg_backup_label (or whatever the new
function(s) that'll be introduced) as well similar to pg_backup_start
and pg_backup_stop.

Regards,
Bharath Rupireddy.

#4David Steele
david@pgmasters.net
In reply to: Fujii Masao (#1)
Re: Add function to return backup_label and tablespace_map

On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or later, we
need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that; there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probaby we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.

Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes

This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label
in PGDATA for a running cluster causes problems and is the major reason
we deprecated and then removed the exclusive method. In addition, what
little protection we had from this condition has been removed.

Regards,
-David

#5Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: David Steele (#4)
Re: Add function to return backup_label and tablespace_map

On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote:

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes

This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label
in PGDATA for a running cluster causes problems and is the major reason
we deprecated and then removed the exclusive method. In addition, what
little protection we had from this condition has been removed.

IIUC, with the new mechanism, we don't need a backup_label file to be
present in the data directory after pg_backup_stop? If yes, where will
the postgres recover from if it crashes after pg_backup_stop before
the next checkpoint? I'm trying to understand the significance of the
backup_label and tablespace_map contents after the removal of
exclusive backup.

Also, do we need the read_backup_label part of the code [1]if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired, &backupFromStandby)) { List *tablespaces = NIL;?

[1]: if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired, &backupFromStandby)) { List *tablespaces = NIL;
if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired,
&backupFromStandby))
{
List *tablespaces = NIL;

/*
* Archive recovery was requested, and thanks to the backup label
* file, we know how far we need to replay to reach consistency. Enter
* archive recovery directly.
*/
InArchiveRecovery = true;
if (StandbyModeRequested)
StandbyMode = true;

/*
* When a backup_label file is present, we want to roll forward from
* the checkpoint it identifies, rather than using pg_control.
*/

Regards,
Bharath Rupireddy.

#6David Steele
david@pgmasters.net
In reply to: Bharath Rupireddy (#5)
Re: Add function to return backup_label and tablespace_map

On 7/8/22 07:53, Bharath Rupireddy wrote:

On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote:

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes

This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label
in PGDATA for a running cluster causes problems and is the major reason
we deprecated and then removed the exclusive method. In addition, what
little protection we had from this condition has been removed.

IIUC, with the new mechanism, we don't need a backup_label file to be
present in the data directory after pg_backup_stop? If yes, where will
the postgres recover from if it crashes after pg_backup_stop before
the next checkpoint? I'm trying to understand the significance of the
backup_label and tablespace_map contents after the removal of
exclusive backup.

backup_label should be written directly into the backup and should be
present when the backup is restored and before recovery begins. It
should not be present in a normally operating cluster or it will cause
problems after crashes and restarts.

Also, do we need the read_backup_label part of the code [1]?

Yes, since the backup_label is required for recovery.

Regards,
-David

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: David Steele (#4)
Re: Add function to return backup_label and tablespace_map

On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:

On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or later, we
need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that; there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probaby we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.

Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

That wouldn't help people making backups from standby servers.

#8David Steele
david@pgmasters.net
In reply to: Julien Rouhaud (#7)
Re: Add function to return backup_label and tablespace_map

On 7/8/22 08:22, Julien Rouhaud wrote:

On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:

On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or later, we
need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that; there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probaby we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.

Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

That wouldn't help people making backups from standby servers.

Ah, yes, good point. This should work on a standby, though:

select quote_literal(labelfile) as backup_label from pg_backup_stop(...)
\gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :backup_label;

Regards,
-David

#9Christoph Berg
myon@debian.org
In reply to: David Steele (#4)
Re: Add function to return backup_label and tablespace_map

Re: David Steele

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes

I was just wondering: Why is "labelfile" only returned by
pg_backup_stop()? All the info in there is already available at
pg_backup_start() time. Having the output available earlier would
allow writing the backup_label into the backup directory, or store it
along some filesystem snapshot that is already immutable by the time
pg_backup_stop is called.

If we rename all functions anyway for PG15, we could move the info
from stop to start.

This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label in
PGDATA for a running cluster causes problems and is the major reason we
deprecated and then removed the exclusive method. In addition, what little
protection we had from this condition has been removed.

Is that really an argument for making the life of everyone else
harder?

Christoph

#10David Steele
david@pgmasters.net
In reply to: Christoph Berg (#9)
Re: Add function to return backup_label and tablespace_map

On 7/8/22 09:10, Christoph Berg wrote:

Re: David Steele

To enable us to do that more easily, how about adding the
pg_backup_label() function that returns backup_label and tablespace_map?
I'm thinking to make this function available just after
pg_backup_start() finishes

I was just wondering: Why is "labelfile" only returned by
pg_backup_stop()? All the info in there is already available at
pg_backup_start() time.

Not sure exactly why this decision was made in 9.6 (might be because
tablespace_map does need to be generated at stop time), but I'm planning
to add data to this file in PG16 that is only available at stop time. In
particular, backup software would like to know the earliest possible
time that can be used for PITR and right now this needs to be
approximated. Would be good to have that in backup_label along with
start time. Min recovery xid would also be very useful.

Having the output available earlier would
allow writing the backup_label into the backup directory, or store it
along some filesystem snapshot that is already immutable by the time
pg_backup_stop is called.

What is precluded by getting the backup label after pg_backup_stop()?
Perhaps a more detailed example here would be helpful.

If we rename all functions anyway for PG15, we could move the info
from stop to start.

This makes me nervous as I'm sure users will immediately start writing
backup_label into PGDATA to make their lives easier. Having backup_label in
PGDATA for a running cluster causes problems and is the major reason we
deprecated and then removed the exclusive method. In addition, what little
protection we had from this condition has been removed.

Is that really an argument for making the life of everyone else
harder?

I don't see how anyone's life is made harder unless the plan is to write
backup_label into PGDATA, which should not be done.

As we've noted before, there's no point in pretending that doing backup
correctly is easy because it is definitely not.

Regards,
-David

#11David Steele
david@pgmasters.net
In reply to: David Steele (#8)
Re: Add function to return backup_label and tablespace_map

On 7/8/22 09:09, David Steele wrote:

On 7/8/22 08:22, Julien Rouhaud wrote:

On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:

On 7/7/22 12:43, Fujii Masao wrote:

Since an exclusive backup method was dropped in v15, in v15 or
later, we
need to create backup_label and tablespace_map files from the result of
pg_backup_stop() when taking a base backup using low level backup API.
One issue when doing this is that; there is no simple way to create
those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop() returns if we execute it via psql. Probaby we need to
store those columns in a temporary file and run some OS commands or
script to separate that file into backup_label and tablespace_map.

Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

That wouldn't help people making backups from standby servers.

Ah, yes, good point. This should work on a standby, though:

select quote_literal(labelfile) as backup_label from pg_backup_stop(...)
\gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :backup_label;

Looks like I made that more complicated than it needed to be:

select * from pg_backup_stop(...) \gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :'labelfile';

Regards,
-David

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#1)
Re: Add function to return backup_label and tablespace_map

At Fri, 8 Jul 2022 01:43:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

finishes. For example, this function allows us to take a backup using
the following psql script file.

------------------------------
SELECT * FROM pg_backup_start('test');
\! cp -a $PGDATA /backup
SELECT * FROM pg_backup_stop();

\pset tuples_only on
\pset format unaligned

\o /backup/data/backup_label
SELECT labelfile FROM pg_backup_label();

\o /backup/data/tablespace_map
SELECT spcmapfile FROM pg_backup_label();
------------------------------

As David mentioned, we can do the same thing now by using \gset, when
we want to save the files on the client side. (File copy is done on
the server side by the steps, though.)

Thinking about another scenario of generating those files server-side
(this is safer than the client-side method regarding to
line-separators and the pset settings, I think). We can do that by
using admingpack instead, with simpler steps.

SELECT lsn, labelfile, spcmapfile
pg_file_write('/tmp/backup_label', labelfile, false),
pg_file_write('/tmp/tablespace_map', spcmapfile, false)
FROM pg_backup_stop();

However, if pg_file_write() fails, the data are gone. But \gset also
works here.

select pg_backup_start('s1');
SELECT * FROM pg_backup_stop() \gset
SELECT pg_file_write('/tmp/backup_label', :'labelfile', false);
SELECT pg_file_write('/tmp/tablespace_map', :'spcmapfile', false);

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13Fujii Masao
masao.fujii@gmail.com
In reply to: David Steele (#11)
Re: Add function to return backup_label and tablespace_map

On 2022/07/08 23:11, David Steele wrote:

Looks like I made that more complicated than it needed to be:

select * from pg_backup_stop(...) \gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :'labelfile';

Thanks! I had completely forgotten \gset command.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#14Stephen Frost
sfrost@snowman.net
In reply to: Fujii Masao (#13)
Re: Add function to return backup_label and tablespace_map

Greetings,

* Fujii Masao (masao.fujii@oss.nttdata.com) wrote:

On 2022/07/08 23:11, David Steele wrote:

Looks like I made that more complicated than it needed to be:

select * from pg_backup_stop(...) \gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :'labelfile';

Thanks! I had completely forgotten \gset command.

Seems like it might make sense to consider using a better format for
these files and also to allow us to more easily add things in the future
(ending LSN, ending time, etc) for backup tools to be able to leverage.
Perhaps we should change this to having just a single file returned,
instead of two, and use JSON for it, as a more general and extensible
format that we've already got code to work with..?

Thanks,

Stephen