Add exclusive backup deprecation notes to documentation
Hackers,
It has been noted on multiple threads, such as [1]/messages/by-id/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net, that it would be
good to have additional notes in the documentation to explain why
exclusive backups have been deprecated and why they should be avoided
when possible.
This patch attempts to document the limitations of the exclusive mode.
--
-David
david@pgmasters.net
[1]: /messages/by-id/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net
/messages/by-id/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net
Attachments:
exclusive-backup-deprecation-doc-v01.patchtext/plain; charset=UTF-8; name=exclusive-backup-deprecation-doc-v01.patch; x-mac-creator=0; x-mac-type=0Download
From 68beedac7e529a81ebf58ff40d30b28e404b30f5 Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Thu, 28 Feb 2019 16:55:42 +0200
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.
Update the exclusive backup documentation to explain the limitations of the exclusive backup mode and make it clear that the feature is deprecated.
---
doc/src/sgml/backup.sgml | 42 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..de1a8a5826 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,48 @@ SELECT * FROM pg_stop_backup(false, true);
</sect3>
<sect3 id="backup-lowlevel-base-backup-exclusive">
<title>Making an exclusive low level backup</title>
+
+ <note>
+ <para>
+ The exclusive backup method is deprecated and should be avoided in favor
+ of the non-exclusive backup method or
+ <application>pg_basebackup</application>.
+ </para>
+
+ <para>
+ The primary issue with the exclusive method is that the
+ <filename>backup_label</filename> file is written into the data directory
+ when <function>pg_start_backup</function> is called and remains until
+ <function>pg_stop_backup</function> is called. If
+ <productname>PostgreSQL</productname> or the host terminates abnormally
+ then <filename>backup_label</filename> will be left in the data directory
+ and <productname>PostgreSQL</productname> will not start. A log message
+ recommends that <filename>backup_label</filename> be removed if not
+ restoring from a backup.
+ </para>
+
+ <para>
+ However, if <filename>backup_label</filename> is present because a restore
+ is actually in progress, then removing it will result in corruption. For
+ this reason it is not recommended to automate the removal of
+ <filename>backup_label</filename>.
+ </para>
+
+ <para>
+ Another issue with exclusive backup mode is that it will continue until
+ <function>pg_stop_backup</function> is called, even if the calling process
+ is no longer performing the backup. The next time
+ <function>pg_start_backup</function> is called it will fail unless
+ <function>pg_stop_backup</function> is called manually first.
+ </para>
+
+ <para>
+ Finally, only one exclusive backup may be run at a time. However, it is
+ possible to run an exclusive backup at the same time as any number of
+ non-exclusive backups.
+ </para>
+ </note>
+
<para>
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of backup
--
2.17.2 (Apple Git-113)
David Steele wrote:
This patch attempts to document the limitations of the exclusive mode.
Thanks!
+ <para> + The primary issue with the exclusive method is that the + <filename>backup_label</filename> file is written into the data directory + when <function>pg_start_backup</function> is called and remains until + <function>pg_stop_backup</function> is called. If + <productname>PostgreSQL</productname> or the host terminates abnormally
There should be a comma at the end of this line.
+ then <filename>backup_label</filename> will be left in the data directory + and <productname>PostgreSQL</productname> will not start. A log message
You should say "*may* not start", because it will if the WAL segment is still there.
+ recommends that <filename>backup_label</filename> be removed if not + restoring from a backup. + </para>
Yours,
Laurenz Albe
On 2/28/19 6:08 PM, Laurenz Albe wrote:
David Steele wrote:
This patch attempts to document the limitations of the exclusive mode.
Thanks!
+ <para> + The primary issue with the exclusive method is that the + <filename>backup_label</filename> file is written into the data directory + when <function>pg_start_backup</function> is called and remains until + <function>pg_stop_backup</function> is called. If + <productname>PostgreSQL</productname> or the host terminates abnormallyThere should be a comma at the end of this line.
Fixed.
+ then <filename>backup_label</filename> will be left in the data directory + and <productname>PostgreSQL</productname> will not start. A log messageYou should say "*may* not start", because it will if the WAL segment is still there.
You are correct. It's still pretty likely though so I went with "probably".
I added some extra language to the warning that gets emitted in the log.
Users are more like to see that than the documentation.
I also addressed a comment from another thread by adding pg_basebackup
as .e.g. rather than or.
Thanks,
--
-David
david@pgmasters.net
Attachments:
exclusive-backup-deprecation-doc-v02.patchtext/plain; charset=UTF-8; name=exclusive-backup-deprecation-doc-v02.patch; x-mac-creator=0; x-mac-type=0Download
From fa2391c9f843358d4e2264b62297eb10bbffcc5a Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Fri, 1 Mar 2019 11:14:41 +0200
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.
Update the exclusive backup documentation to explain the limitations of the
exclusive backup mode and make it clear that the feature is deprecated.
Update the log message when the backup_label is found but recovery
cannot proceed.
---
doc/src/sgml/backup.sgml | 42 +++++++++++++++++++++++++++++++
src/backend/access/transam/xlog.c | 10 ++++++--
2 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..1d09808b0d 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,48 @@ SELECT * FROM pg_stop_backup(false, true);
</sect3>
<sect3 id="backup-lowlevel-base-backup-exclusive">
<title>Making an exclusive low level backup</title>
+
+ <note>
+ <para>
+ The exclusive backup method is deprecated and should be avoided in favor
+ of the non-exclusive backup method, e.g.
+ <application>pg_basebackup</application>.
+ </para>
+
+ <para>
+ The primary issue with the exclusive method is that the
+ <filename>backup_label</filename> file is written into the data directory
+ when <function>pg_start_backup</function> is called and remains until
+ <function>pg_stop_backup</function> is called. If
+ <productname>PostgreSQL</productname> or the host terminates abnormally,
+ then <filename>backup_label</filename> will be left in the data directory
+ and <productname>PostgreSQL</productname> probably will not start. A log
+ message recommends that <filename>backup_label</filename> be removed if not
+ restoring from a backup.
+ </para>
+
+ <para>
+ However, if <filename>backup_label</filename> is present because a restore
+ is actually in progress, then removing it will result in corruption. For
+ this reason it is not recommended to automate the removal of
+ <filename>backup_label</filename>.
+ </para>
+
+ <para>
+ Another issue with exclusive backup mode is that it will continue until
+ <function>pg_stop_backup</function> is called, even if the calling process
+ is no longer performing the backup. The next time
+ <function>pg_start_backup</function> is called it will fail unless
+ <function>pg_stop_backup</function> is called manually first.
+ </para>
+
+ <para>
+ Finally, only one exclusive backup may be run at a time. However, it is
+ possible to run an exclusive backup at the same time as any number of
+ non-exclusive backups.
+ </para>
+ </note>
+
<para>
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of backup
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..e12a04414b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6405,14 +6405,20 @@ StartupXLOG(void)
if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false))
ereport(FATAL,
(errmsg("could not find redo location referenced by checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir, DataDir)));
}
}
else
{
ereport(FATAL,
(errmsg("could not locate required checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir, DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
--
2.17.2 (Apple Git-113)
David Steele wrote:
I added some extra language to the warning that gets emitted in the log.
Users are more like to see that than the documentation.I also addressed a comment from another thread by adding pg_basebackup
as .e.g. rather than or.
Looks good to me.
Yours,
Laurenz Albe
El vie., 1 de mar. de 2019 a la(s) 06:21, David Steele
(david@pgmasters.net) escribió:
I also addressed a comment from another thread by adding pg_basebackup
as .e.g. rather than or.
Thanks David,
This looks very good!
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
El vie., 1 de mar. de 2019 a la(s) 06:21, David Steele
(david@pgmasters.net) escribió:
I added some extra language to the warning that gets emitted in the log.
Users are more like to see that than the documentation.I also addressed a comment from another thread by adding pg_basebackup
as .e.g. rather than or.
More awake, I gave this last patch a second read. Wording is good now.
No objections there at all.
I do think that paragraph 2 and 3 should be merged as it seems the
idea on the third is a continuation of what's in the second.
But even without that change, I believe this patch is good for commit.
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Please follow the 1-space indentation in the documentation files.
I think the style of mentioning all the problems in a note before the
actual description is a bit backwards.
The layout of the section should be:
- This is what it does.
- Here are some comparisons with other methods.
- For this or that reason, it's deprecated.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/1/19 1:13 PM, Peter Eisentraut wrote:
Please follow the 1-space indentation in the documentation files.
Whoops. Will fix.
I think the style of mentioning all the problems in a note before the
actual description is a bit backwards.
In the case of an important note like this I think it should be right at
the top where people will see it. Not everyone reads to the end.
--
-David
david@pgmasters.net
On Fri, Mar 1, 2019 at 2:07 PM David Steele <david@pgmasters.net> wrote:
On 3/1/19 1:13 PM, Peter Eisentraut wrote:
Please follow the 1-space indentation in the documentation files.
Whoops. Will fix.
I think the style of mentioning all the problems in a note before the
actual description is a bit backwards.In the case of an important note like this I think it should be right at
the top where people will see it. Not everyone reads to the end
Maybe have the first note say "This method is deprecated bceause it has
serious risks (see bellow)" and then list the actual risks at the end?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Magnus Hagander wrote:
Maybe have the first note say "This method is deprecated bceause it has serious
risks (see bellow)" and then list the actual risks at the end?
Good idea. That may attract the attention of the dogs among the readers.
Yours,
Laurenz Albe
On 3/1/19 3:14 PM, Laurenz Albe wrote:
Magnus Hagander wrote:
Maybe have the first note say "This method is deprecated bceause it has serious
risks (see bellow)" and then list the actual risks at the end?Good idea. That may attract the attention of the dogs among the readers.
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.
Thanks,
--
-David
david@pgmasters.net
Attachments:
exclusive-backup-deprecation-doc-v03.patchtext/plain; charset=UTF-8; name=exclusive-backup-deprecation-doc-v03.patch; x-mac-creator=0; x-mac-type=0Download
From f2dafe53a5da8916c3b0390b9ebf657b931013f5 Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Thu, 7 Mar 2019 11:30:19 +0200
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.
Update the exclusive backup documentation to explain the limitations of
the exclusive backup mode and make it clear that the feature is
deprecated.
Update the log message when the backup_label is found but recovery
cannot proceed.
---
doc/src/sgml/backup.sgml | 46 +++++++++++++++++++++++++++++++
src/backend/access/transam/xlog.c | 10 +++++--
2 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..ab5c81d382 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,17 @@ SELECT * FROM pg_stop_backup(false, true);
</sect3>
<sect3 id="backup-lowlevel-base-backup-exclusive">
<title>Making an exclusive low level backup</title>
+
+ <note>
+ <para>
+ The exclusive backup method is deprecated and should be avoided in favor
+ of the non-exclusive backup method, e.g.
+ <application>pg_basebackup</application>. This method is deprecated
+ because it has serious risks which are enumerated at the end of this
+ section.
+ </para>
+ </note>
+
<para>
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of backup
@@ -1054,6 +1065,41 @@ SELECT pg_stop_backup();
</listitem>
</orderedlist>
</para>
+
+ <note>
+ <para>
+ The primary issue with the exclusive method is that the
+ <filename>backup_label</filename> file is written into the data directory
+ when <function>pg_start_backup</function> is called and remains until
+ <function>pg_stop_backup</function> is called. If
+ <productname>PostgreSQL</productname> or the host terminates abnormally,
+ then <filename>backup_label</filename> will be left in the data directory
+ and <productname>PostgreSQL</productname> probably will not start. A log
+ message recommends that <filename>backup_label</filename> be removed if
+ not restoring from a backup.
+ </para>
+
+ <para>
+ However, if <filename>backup_label</filename> is present because a restore
+ is actually in progress, then removing it will result in corruption. For
+ this reason it is not recommended to automate the removal of
+ <filename>backup_label</filename>.
+ </para>
+
+ <para>
+ Another issue with exclusive backup mode is that it will continue until
+ <function>pg_stop_backup</function> is called, even if the calling process
+ is no longer performing the backup. The next time
+ <function>pg_start_backup</function> is called it will fail unless
+ <function>pg_stop_backup</function> is called manually first.
+ </para>
+
+ <para>
+ Finally, only one exclusive backup may be run at a time. However, it is
+ possible to run an exclusive backup at the same time as any number of
+ non-exclusive backups.
+ </para>
+ </note>
</sect3>
<sect3 id="backup-lowlevel-base-backup-data">
<title>Backing up the data directory</title>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecd12fc53a..e12a04414b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6405,14 +6405,20 @@ StartupXLOG(void)
if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false))
ereport(FATAL,
(errmsg("could not find redo location referenced by checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir, DataDir)));
}
}
else
{
ereport(FATAL,
(errmsg("could not locate required checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir, DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
--
2.17.2 (Apple Git-113)
On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote:
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.
The documentation part looks fine to me. Just one nit regarding the
error hint.
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir))); + errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add recovery options to \"%s/postgresql.auto.conf\".\n"
Here do we really want to recommend adding options to
postgresql.auto.conf? This depends a lot on the solution integration
so I think that this hint could actually confuse some users because it
implies that they kind of *have* to do so, which is not correct. I
would recommend to be a bit more generic and just use "and add
necessary recovery configuration".
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n" + "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
Fine for these two ones.
+ DataDir, DataDir, DataDir, DataDir)));
:)
--
Michael
On Thu, Mar 7, 2019 at 5:35 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote:
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.The documentation part looks fine to me. Just one nit regarding the
error hint.- errhint("If you are not restoring from a backup, try removing the
file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch
\"%s/recovery.signal\" and add recovery options to
\"%s/postgresql.auto.conf\".\n"Here do we really want to recommend adding options to
postgresql.auto.conf? This depends a lot on the solution integration
so I think that this hint could actually confuse some users because it
implies that they kind of *have* to do so, which is not correct. I
would recommend to be a bit more generic and just use "and add
necessary recovery configuration".
Agreed, I think we should never tell people to "add recovery options to
postgresql.auto.conf". Becuase they should never do that manually. If we
want to suggest people use postgresql.auto.conf surely they should be using
ALTER SYSTEM SET. Which of course doesn't work in this case, since
postgrsql isn't running yet.
So yeah either that, or say "add to postgresql.conf" without the auto?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 2019-03-07 10:33, David Steele wrote:
On 3/1/19 3:14 PM, Laurenz Albe wrote:
Magnus Hagander wrote:
Maybe have the first note say "This method is deprecated bceause it has serious
risks (see bellow)" and then list the actual risks at the end?Good idea. That may attract the attention of the dogs among the readers.
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.
The documentation changes appear to continue the theme from the other
thread that the exclusive backup mode is terrible and everyone should
feel bad about it. I don't think there is consensus about that.
I do welcome a more precise description of the handling of backup_label
and a better hint in the error message. I think we haven't gotten to
the final shape there yet, especially for the latter. I suggest to
focus on that.
The other changes repeat points already made in nearby documentation.
I think it would be helpful to frame the documentation in a way to
suggest that the nonexclusive mode is more for automation and more
sophisticated tools and the exclusive mode is more for manual or simple
scripted use.
If we do think that the exclusive mode will be removed in PG13, then I
don't think we need further documentation changes. It already says it's
deprecated, and we don't need to justify that at length. But again, I'm
not convinced that that will happen.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 18, 2019 at 8:33 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
The documentation changes appear to continue the theme from the other
thread that the exclusive backup mode is terrible and everyone should
feel bad about it. I don't think there is consensus about that.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
On 2019-03-07 10:33, David Steele wrote:
On 3/1/19 3:14 PM, Laurenz Albe wrote:
Magnus Hagander wrote:
Maybe have the first note say "This method is deprecated bceause it has serious
risks (see bellow)" and then list the actual risks at the end?Good idea. That may attract the attention of the dogs among the readers.
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.The documentation changes appear to continue the theme from the other
thread that the exclusive backup mode is terrible and everyone should
feel bad about it. I don't think there is consensus about that.
I don't view it as up for much debate. The exclusive backup mode is
quite bad.
I do welcome a more precise description of the handling of backup_label
and a better hint in the error message. I think we haven't gotten to
the final shape there yet, especially for the latter. I suggest to
focus on that.
There isn't a way to handle the backup_label in a sane way when it's
created by the server in the data directory, which is why the
non-exclusive mode explicitly doesn't do that.
I think it would be helpful to frame the documentation in a way to
suggest that the nonexclusive mode is more for automation and more
sophisticated tools and the exclusive mode is more for manual or simple
scripted use.
I don't agree with this at all, that's not the reason the two exist nor
were they ever developed with the intent that one is for the 'simple'
case and one is for the 'automated' case. Trying to wedge them into
that framework strikes me as simply trying to sweep the serious issues
under the rug and I don't agree with that- if we are going to continue
to have this, we need to make it clear what the issues are. Sadly, we
will still have users who don't actually read the docs that carefully
and get bit by the exclusive backup mode because they didn't appreciate
the issues, but we will continue to have that until we finally remove
the exclusive mode.
If we do think that the exclusive mode will be removed in PG13, then I
don't think we need further documentation changes. It already says it's
deprecated, and we don't need to justify that at length. But again, I'm
not convinced that that will happen.
There were at least a few comments made on this thread that it wasn't
made clear enough in the documentation that it's deprecated. Saying
that we already deprecated it doesn't change that and doesn't do
anything to actually address those concerns. Continuing to carry
forward these two modes makes further progress in this area difficult
and unlikely to happen, and that's disappointing.
Thanks!
Stephen
On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote:
I don't view it as up for much debate.
In other words, you're not willing to listen to what other people
think about this issue.
I can't say I haven't noticed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote:
I don't view it as up for much debate.
In other words, you're not willing to listen to what other people
think about this issue.
I have listened, but unfortunately the discussion just revolves around
"oh, it isn't actually all that bad", which, no, isn't something that's
going to sway my opinion.
I suppose some might view that as being principled.
Thanks!
Stephen
Hi,
On 2019-03-18 23:35:07 -0400, Stephen Frost wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote:
I don't view it as up for much debate.
In other words, you're not willing to listen to what other people
think about this issue.I have listened, but unfortunately the discussion just revolves around
"oh, it isn't actually all that bad", which, no, isn't something that's
going to sway my opinion.
I think you're right about the original issue. But at some point you
just gotta settle for not everyone agreeing with you (and me). Just
pushing forward hard won't achieve anything. Nobody says you need to
agree with people saying "it's not all that bad", but that doesn't mean
you should just try to push forward at full speed.
Greetings,
Andres Freund
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
On 2019-03-18 23:35:07 -0400, Stephen Frost wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
On Mon, Mar 18, 2019 at 11:13 PM Stephen Frost <sfrost@snowman.net> wrote:
I don't view it as up for much debate.
In other words, you're not willing to listen to what other people
think about this issue.I have listened, but unfortunately the discussion just revolves around
"oh, it isn't actually all that bad", which, no, isn't something that's
going to sway my opinion.I think you're right about the original issue. But at some point you
just gotta settle for not everyone agreeing with you (and me). Just
pushing forward hard won't achieve anything. Nobody says you need to
agree with people saying "it's not all that bad", but that doesn't mean
you should just try to push forward at full speed.
This thread didn't exactly pop out of nowhere.. It's a result of the
prior discussion, with the goal of at least improving the situation by
documenting the issues and trying to make it clearer that the exclusive
mode is deprecated to hopefully provide a chance that we'll actually
remove it at some point in the future as being dangerous.
That's a pretty large step back from "let's rip it out for v12", and yet
people are now pushing back against even that, which is definitely
frustrating, and saying that I'm still pushing forward 'at full speed'
definitely comes across as not really considering that this started out
with "rip it out for v12" and has now regressed back to a documentation
patch.
Thanks!
Stephen
On 3/18/19 4:33 PM, Peter Eisentraut wrote:
On 2019-03-07 10:33, David Steele wrote:
On 3/1/19 3:14 PM, Laurenz Albe wrote:
Magnus Hagander wrote:
Maybe have the first note say "This method is deprecated bceause it has serious
risks (see bellow)" and then list the actual risks at the end?Good idea. That may attract the attention of the dogs among the readers.
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.The documentation changes appear to continue the theme from the other
thread that the exclusive backup mode is terrible and everyone should
feel bad about it. I don't think there is consensus about that.
I wouldn't characterize documenting the limitations of the method as
making people feel bad about it. If you feel my language implies that
then please let me know where you see it.
I do welcome a more precise description of the handling of backup_label
and a better hint in the error message. I think we haven't gotten to
the final shape there yet, especially for the latter. I suggest to
focus on that.
I was planning to update the error message hint as Magnus and Michael
have suggested.
I'm not a fan of normalizing the documentation around backup_label, i.e.
making it seem like a perfectly normal thing that you need to manually
delete a file to get the cluster to start. This may still lead users to
script their way around the problem, with possible corruption as the result.
The other changes repeat points already made in nearby documentation.
Granted, but in this sense they are meant to concisely describe why the
feature is deprecated, rather than being instructions in a user guide.
I think it would be helpful to frame the documentation in a way to
suggest that the nonexclusive mode is more for automation and more
sophisticated tools and the exclusive mode is more for manual or simple
scripted use.
I don't think the features were developed with this in mind and I
wouldn't want to characterize them this way now. Non-exclusive mode was
developed to address the shortcomings of exclusive mode, not as an
"automate-able" version of it.
Describing any backup method as primarily "manual" in nature seems
counter-intuitive to me.
If we do think that the exclusive mode will be removed in PG13, then I
don't think we need further documentation changes. It already says it's
deprecated, and we don't need to justify that at length. But again, I'm
not convinced that that will happen.
I think we should remove it entirely in PG13, but I'm not sure if there
is enough support. I'll propose it again in the first CF and see where
it goes.
This patch was intended to be a compromise based on discussion in the
thread about removing the feature.
If this is now a bridge too far then I'm at a bit of a loss as to how to
proceed. If we water it down and normalize it then we are not achieving
the goals of the patch as I see them -- to steer users away from this
method when possible and to make it less of a shock if it goes away.
Regards,
--
-David
david@pgmasters.net
On 3/8/19 6:08 AM, Magnus Hagander wrote:
On Thu, Mar 7, 2019 at 5:35 PM Michael Paquier <michael@paquier.xyz
<mailto:michael@paquier.xyz>> wrote:On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote:
OK, here's a new version that splits the deprecation notes from the
discussion of risks. I also fixed the indentation.The documentation part looks fine to me. Just one nit regarding the
error hint.- errhint("If you are not restoring from a backup, try
removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch
\"%s/recovery.signal\" and add recovery options to
\"%s/postgresql.auto.conf\".\n"Here do we really want to recommend adding options to
postgresql.auto.conf? This depends a lot on the solution integration
so I think that this hint could actually confuse some users because it
implies that they kind of *have* to do so, which is not correct. I
would recommend to be a bit more generic and just use "and add
necessary recovery configuration".Agreed, I think we should never tell people to "add recovery options to
postgresql.auto.conf". Becuase they should never do that manually. If we
want to suggest people use postgresql.auto.conf surely they should be
using ALTER SYSTEM SET. Which of course doesn't work in this case, since
postgrsql isn't running yet.So yeah either that, or say "add to postgresql.conf" without the auto?
I went with Michael's suggestion. Attached is a new patch.
I also think we should set a flag and throw the error below this if/else
block. This is a rather large message and maintaining two copies of it
is not ideal.
Please note that there have been objections to the patch later in this
thread by Peter and Robert. I'm not very interested in watering down
the documentation changes as Peter suggests, but I think at the very
least we should commit the added hints in the error message. For many
users this error will be their first point of contact with the
backup_label issue/behavior.
Regards,
--
-David
david@pgmasters.net
Attachments:
exclusive-backup-deprecation-doc-v04.patchtext/plain; charset=UTF-8; name=exclusive-backup-deprecation-doc-v04.patch; x-mac-creator=0; x-mac-type=0Download
From 87a53d0dc0e2118c553110813209eba55174bb1e Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Wed, 20 Mar 2019 16:16:37 +0400
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.
Update the exclusive backup documentation to explain the limitations of
the exclusive backup mode and make it clear that the feature is
deprecated.
Update the log message when the backup_label is found but recovery
cannot proceed.
---
doc/src/sgml/backup.sgml | 46 +++++++++++++++++++++++++++++++
src/backend/access/transam/xlog.c | 10 +++++--
2 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..ab5c81d382 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,17 @@ SELECT * FROM pg_stop_backup(false, true);
</sect3>
<sect3 id="backup-lowlevel-base-backup-exclusive">
<title>Making an exclusive low level backup</title>
+
+ <note>
+ <para>
+ The exclusive backup method is deprecated and should be avoided in favor
+ of the non-exclusive backup method, e.g.
+ <application>pg_basebackup</application>. This method is deprecated
+ because it has serious risks which are enumerated at the end of this
+ section.
+ </para>
+ </note>
+
<para>
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of backup
@@ -1054,6 +1065,41 @@ SELECT pg_stop_backup();
</listitem>
</orderedlist>
</para>
+
+ <note>
+ <para>
+ The primary issue with the exclusive method is that the
+ <filename>backup_label</filename> file is written into the data directory
+ when <function>pg_start_backup</function> is called and remains until
+ <function>pg_stop_backup</function> is called. If
+ <productname>PostgreSQL</productname> or the host terminates abnormally,
+ then <filename>backup_label</filename> will be left in the data directory
+ and <productname>PostgreSQL</productname> probably will not start. A log
+ message recommends that <filename>backup_label</filename> be removed if
+ not restoring from a backup.
+ </para>
+
+ <para>
+ However, if <filename>backup_label</filename> is present because a restore
+ is actually in progress, then removing it will result in corruption. For
+ this reason it is not recommended to automate the removal of
+ <filename>backup_label</filename>.
+ </para>
+
+ <para>
+ Another issue with exclusive backup mode is that it will continue until
+ <function>pg_stop_backup</function> is called, even if the calling process
+ is no longer performing the backup. The next time
+ <function>pg_start_backup</function> is called it will fail unless
+ <function>pg_stop_backup</function> is called manually first.
+ </para>
+
+ <para>
+ Finally, only one exclusive backup may be run at a time. However, it is
+ possible to run an exclusive backup at the same time as any number of
+ non-exclusive backups.
+ </para>
+ </note>
</sect3>
<sect3 id="backup-lowlevel-base-backup-data">
<title>Backing up the data directory</title>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ad12ebc426..3bb1e406e1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6365,14 +6365,20 @@ StartupXLOG(void)
if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false))
ereport(FATAL,
(errmsg("could not find redo location referenced by checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir)));
}
}
else
{
ereport(FATAL,
(errmsg("could not locate required checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
--
2.17.2 (Apple Git-113)
On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
Please note that there have been objections to the patch later in this
thread by Peter and Robert. I'm not very interested in watering down the
documentation changes as Peter suggests, but I think at the very least we
should commit the added hints in the error message. For many users this
error will be their first point of contact with the backup_label
issue/behavior.
The updates of the log message do not imply anything negative as I
read them as they mention to not remove the backup_label file. So
while we don't have an agreement about the docs, the log messages may
be able to be committed? Peter? Robert?
"will result in a corruptED cluster" is more correct?
--
Michael
On Mon, Mar 18, 2019 at 1:33 PM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-07 10:33, David Steele wrote:
On 3/1/19 3:14 PM, Laurenz Albe wrote:
I think it would be helpful to frame the documentation in a way to
suggest that the nonexclusive mode is more for automation and more
sophisticated tools and the exclusive mode is more for manual or simple
scripted use.
But that would be factually incorrect and backwards, so it seems like a
terrible idea, at least when it comes to manual. If you are doing it
manually, it's a lot *easier* to do it right with the non-exclusive mode,
because you can easily keep one psql and one shell open. And that's safe.
The only real use case that has been put forward for the exclusive backup
mode is when the backups are done through a script, and that script is
limited to only use something like bash (and can't use a scripting language
like perl or python or powershell or other more advanced scripting
languages).
And I don't think exclusive mode should be suggested for "simple scripts"
either, since it's anything but -- scripts using the exclusive mode
correctly will be anything but simple. A better term there would be to
single out shellscripts, I'd suggest, if we want to single something out.
Or more generic, for "scripting languages incapable of keeping a connection
open across multiple lines" or something?
We can certainly keep it, but let's not tell people something is simple
when it's not.
If we do think that the exclusive mode will be removed in PG13, then I
don't think we need further documentation changes. It already says it's
deprecated, and we don't need to justify that at length. But again, I'm
not convinced that that will happen.
But the complaints before was that the deprecation currently in the
documentation was not enough to remove it....
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
Please note that there have been objections to the patch later in this
thread by Peter and Robert. I'm not very interested in watering down the
documentation changes as Peter suggests, but I think at the very least we
should commit the added hints in the error message. For many users this
error will be their first point of contact with the backup_label
issue/behavior.The updates of the log message do not imply anything negative as I
read them as they mention to not remove the backup_label file. So
while we don't have an agreement about the docs, the log messages may
be able to be committed? Peter? Robert?"will result in a corruptED cluster" is more correct?
I really like the proposed changes to the ereport() text. I think the
"Be careful" hint is a really helpful way of phrasing it. I think
"corrupt" as the patch has it is slightly better than "corrupted".
Obviously, we have to make the updates for recovery.signal no matter
what, and you could argue that part should be its own commit, but I
like all of the changes.
I'm not too impressed with the documentation changes. A lot of the
information being added is already present somewhere in that very same
section. It's reasonable to revise the section so that the dangers
stand out more clearly, but it doesn't seem good to revise it in a way
that ends up duplicating the existing information. Here's my
suggestion -- ditch the note at the end of the section and make the
one at the beginning read like this:
The exclusive backup method is deprecated and should be avoided.
Prior to <productname>PostgreSQL</productname> 9.6, this was the only
low-level method available, but it is now recommended that all users
upgrade their scripts to use non-exclusive backups.
Then, revise the first paragraph like this:
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of
backup can only be taken on a primary and does not allow concurrent
backups. Moreover, because it writes a backup_label file on the
master, it can cause the master to fail to restart automatically after
a crash. On the other hand, the erroneous removal of a backup_label
file from a backup or standby is a common mistake which can can result
in serious data corruption. If it is necessary to use this method,
the following steps may be used.
Later, where it says:
Note that if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal>
file has been
manually deleted from the <envar>PGDATA</envar> directory.
Change it to read:
As noted above, if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal> file has
been manually deleted from the <envar>PGDATA</envar> directory. Note
that it is very important to never to remove the
<literal>backup_label</literal> file when restoring a backup, because
this will result in corruption. Confusion about the circumstances
under which it is appropriate to remove this file is a common cause of
data corruption when using this method; be very certain that you
remove the file only on an existing master and never when building a
standby or restoring a backup, even if you are building a standby that
will subsequently be promoted to a new master.
I also think we should revise this thoroughly terrible advice:
If you wish to place a time limit on the execution of
<function>pg_stop_backup</function>, set an appropriate
<varname>statement_timeout</varname> value, but make note that if
<function>pg_stop_backup</function> terminates because of this your backup
may not be valid.
That seems awful, not only because it encourages people to do that
particular thing and end up leaving the server in backup mode, but
also because it doesn't clearly articulate the extreme importance of
making sure that the server is not left in backup mode. So I would
propose that we strike that text entirely and replace it with
something like:
When using exclusive backup mode, it is absolutely imperative to make
sure that <function>pg_stop_backup</function> completes successfully
at the end of the backup. Even if the backup itself fails, for
example due to lack of disk space, failure to call
<function>pg_stop_backup</function> will leave the server in backup
mode indefinitely, causing future backups to fail and increasing the
risk of a restart during a time when <literal>backup_label</literal>
exists.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert,
On 3/20/19 6:31 PM, Robert Haas wrote:
On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
Please note that there have been objections to the patch later in this
thread by Peter and Robert. I'm not very interested in watering down the
documentation changes as Peter suggests, but I think at the very least we
should commit the added hints in the error message. For many users this
error will be their first point of contact with the backup_label
issue/behavior.The updates of the log message do not imply anything negative as I
read them as they mention to not remove the backup_label file. So
while we don't have an agreement about the docs, the log messages may
be able to be committed? Peter? Robert?"will result in a corruptED cluster" is more correct?
I really like the proposed changes to the ereport() text. I think the
"Be careful" hint is a really helpful way of phrasing it. I think
"corrupt" as the patch has it is slightly better than "corrupted".
Obviously, we have to make the updates for recovery.signal no matter
what, and you could argue that part should be its own commit, but I
like all of the changes.
Cool.
I'm not too impressed with the documentation changes. A lot of the
information being added is already present somewhere in that very same
section. It's reasonable to revise the section so that the dangers
stand out more clearly, but it doesn't seem good to revise it in a way
that ends up duplicating the existing information.
OK.
Here's my
suggestion -- ditch the note at the end of the section and make the
one at the beginning read like this:The exclusive backup method is deprecated and should be avoided.
Prior to <productname>PostgreSQL</productname> 9.6, this was the only
low-level method available, but it is now recommended that all users
upgrade their scripts to use non-exclusive backups.Then, revise the first paragraph like this:
The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of
backup can only be taken on a primary and does not allow concurrent
backups. Moreover, because it writes a backup_label file on the
master, it can cause the master to fail to restart automatically after
a crash. On the other hand, the erroneous removal of a backup_label
file from a backup or standby is a common mistake which can can result
in serious data corruption. If it is necessary to use this method,
the following steps may be used.
This works for me as I feel like the cautions here (and below) are still
strongly worded. Peter?
Later, where it says:
Note that if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal>
file has been
manually deleted from the <envar>PGDATA</envar> directory.Change it to read:
As noted above, if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal> file has
been manually deleted from the <envar>PGDATA</envar> directory. Note
that it is very important to never to remove the
<literal>backup_label</literal> file when restoring a backup, because
this will result in corruption. Confusion about the circumstances
under which it is appropriate to remove this file is a common cause of
data corruption when using this method; be very certain that you
remove the file only on an existing master and never when building a
standby or restoring a backup, even if you are building a standby that
will subsequently be promoted to a new master.
Technically we are into repetition here, but I'm certainly OK with it as
this point bears repeating.
I also think we should revise this thoroughly terrible advice:
If you wish to place a time limit on the execution of
<function>pg_stop_backup</function>, set an appropriate
<varname>statement_timeout</varname> value, but make note that if
<function>pg_stop_backup</function> terminates because of this your backup
may not be valid.That seems awful, not only because it encourages people to do that
particular thing and end up leaving the server in backup mode, but
also because it doesn't clearly articulate the extreme importance of
making sure that the server is not left in backup mode. So I would
propose that we strike that text entirely and replace it with
something like:When using exclusive backup mode, it is absolutely imperative to make
sure that <function>pg_stop_backup</function> completes successfully
at the end of the backup. Even if the backup itself fails, for
example due to lack of disk space, failure to call
<function>pg_stop_backup</function> will leave the server in backup
mode indefinitely, causing future backups to fail and increasing the
risk of a restart during a time when <literal>backup_label</literal>
exists.
It's also pretty important that exclusive backups complete successfully
since backup_label is returned from pg_stop_backup() -- the backup will
definitely be corrupt without that if there was a checkpoint during the
backup. But, yeah, leaving a backup_label around for a long time
increases the restart risks a lot.
I'll revise the patch if Peter thinks this approach looks reasonable.
Thanks,
--
-David
david@pgmasters.net
Hi Robert,
On 3/20/19 5:08 PM, David Steele wrote:
I'll revise the patch if Peter thinks this approach looks reasonable.
Hopefully Peter's silence can be interpreted as consent. Probably just
busy, though.
I used your suggestions with minor editing. After some reflection, I
agree that the inline warnings are likely to be more effective than
something at the end, at least for those working on a new implementation.
Regards,
--
-David
david@pgmasters.net
Attachments:
exclusive-backup-deprecation-doc-v05.patchtext/plain; charset=UTF-8; name=exclusive-backup-deprecation-doc-v05.patch; x-mac-creator=0; x-mac-type=0Download
From bd1754696ca503795ca69386d1d29a7347a77276 Mon Sep 17 00:00:00 2001
From: David Steele <david@pgmasters.net>
Date: Fri, 29 Mar 2019 10:12:09 +0000
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.
Update the exclusive backup documentation to explain the limitations of
the exclusive backup mode and make it clear that the feature is
deprecated.
Update the log message when the backup_label is found but recovery
cannot proceed.
---
doc/src/sgml/backup.sgml | 52 +++++++++++++++++++++++--------
src/backend/access/transam/xlog.c | 10 ++++--
2 files changed, 47 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..b67da8916a 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,13 +948,26 @@ SELECT * FROM pg_stop_backup(false, true);
</sect3>
<sect3 id="backup-lowlevel-base-backup-exclusive">
<title>Making an exclusive low level backup</title>
+
+ <note>
+ <para>
+ The exclusive backup method is deprecated and should be avoided.
+ Prior to <productname>PostgreSQL</productname> 9.6, this was the only
+ low-level method available, but it is now recommended that all users
+ upgrade their scripts to use non-exclusive backups.
+ </para>
+ </note>
+
<para>
The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. This type of backup
- can only be taken on a primary and does not allow concurrent backups.
- Prior to <productname>PostgreSQL</productname> 9.6, this
- was the only low-level method available, but it is now recommended that
- all users upgrade their scripts to use non-exclusive backups if possible.
+ non-exclusive one, but it differs in a few key steps. This type of
+ backup can only be taken on a primary and does not allow concurrent
+ backups. Moreover, because it writes a backup_label file on the
+ master, it can cause the master to fail to restart automatically after
+ a crash. On the other hand, the erroneous removal of a backup_label
+ file from a backup or standby is a common mistake which can can result
+ in serious data corruption. If it is necessary to use this method,
+ the following steps may be used.
</para>
<para>
<orderedlist>
@@ -1011,9 +1024,17 @@ SELECT pg_start_backup('label', true);
consider during this backup.
</para>
<para>
- Note that if the server crashes during the backup it may not be
- possible to restart until the <literal>backup_label</literal> file has been
- manually deleted from the <envar>PGDATA</envar> directory.
+ As noted above, if the server crashes during the backup it may not be
+ possible to restart until the <literal>backup_label</literal> file has
+ been manually deleted from the <envar>PGDATA</envar> directory. Note
+ that it is very important to never remove the
+ <literal>backup_label</literal> file when restoring a backup, because
+ this will result in corruption. Confusion about when it is appropriate
+ to remove this file is a common cause of data corruption when using this
+ method; be very certain that you remove the file only on an existing
+ master and never when building a standby or restoring a backup, even if
+ you are building a standby that will subsequently be promoted to a new
+ master.
</para>
</listitem>
<listitem>
@@ -1045,11 +1066,16 @@ SELECT pg_stop_backup();
If the archive process has fallen behind
because of failures of the archive command, it will keep retrying
until the archive succeeds and the backup is complete.
- If you wish to place a time limit on the execution of
- <function>pg_stop_backup</function>, set an appropriate
- <varname>statement_timeout</varname> value, but make note that if
- <function>pg_stop_backup</function> terminates because of this your backup
- may not be valid.
+ </para>
+
+ <para>
+ When using exclusive backup mode, it is absolutely imperative to ensure
+ that <function>pg_stop_backup</function> completes successfully at the
+ end of the backup. Even if the backup itself fails, for example due to
+ lack of disk space, failure to call <function>pg_stop_backup</function>
+ will leave the server in backup mode indefinitely, causing future backups
+ to fail and increasing the risk of a restart failure during the time that
+ <literal>backup_label</literal> exists.
</para>
</listitem>
</orderedlist>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 19d7911ec5..9840a55c10 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6364,14 +6364,20 @@ StartupXLOG(void)
if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false))
ereport(FATAL,
(errmsg("could not find redo location referenced by checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir)));
}
}
else
{
ereport(FATAL,
(errmsg("could not locate required checkpoint record"),
- errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
+ errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
+ "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
+ "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
+ DataDir, DataDir, DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
--
2.17.2 (Apple Git-113)
On 2019-03-20 14:42, Magnus Hagander wrote:
But that would be factually incorrect and backwards, so it seems like a
terrible idea, at least when it comes to manual. If you are doing it
manually, it's a lot *easier* to do it right with the non-exclusive
mode, because you can easily keep one psql and one shell open. And
that's safe.
The scenario I have in mind is, a poorly maintained server, nothing
installed, can't install anything (no internet connection, license
expired), flaky network, you fear it's going to fail soon, you need to
take a backup. The simplest procedure would appear to be: start backup
mode, copy files away, stop backup mode. Anything else that involves
holding a session open over there for the whole time is way more fragile
unless proper preparations have been made (and even then). So I don't
know what you want to call that scenario, but I would feel more
comfortable having these basic tools available in a bind.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net> wrote:
I used your suggestions with minor editing. After some reflection, I
agree that the inline warnings are likely to be more effective than
something at the end, at least for those working on a new implementation.
I'm glad we could agree on something. Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 29, 2019 at 1:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net> wrote:
I used your suggestions with minor editing. After some reflection, I
agree that the inline warnings are likely to be more effective than
something at the end, at least for those working on a new implementation.I'm glad we could agree on something. Committed.
+1, thanks.
Minor nitpick:
+ backup can only be taken on a primary and does not allow concurrent
+ backups. Moreover, because it writes a backup_label file on the
+ master, it can cause the master to fail to restart automatically after
Let's be consistent in if we call it a primary or a master, at least within
the same paragraph :)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 2019-03-29 11:27, David Steele wrote:
I'll revise the patch if Peter thinks this approach looks reasonable.
Hopefully Peter's silence can be interpreted as consent. Probably just
busy, though.I used your suggestions with minor editing. After some reflection, I
agree that the inline warnings are likely to be more effective than
something at the end, at least for those working on a new implementation.
It looks very sensible now, I think. Thanks.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/29/19 11:58 AM, Peter Eisentraut wrote:
On 2019-03-20 14:42, Magnus Hagander wrote:
But that would be factually incorrect and backwards, so it seems like a
terrible idea, at least when it comes to manual. If you are doing it
manually, it's a lot *easier* to do it right with the non-exclusive
mode, because you can easily keep one psql and one shell open. And
that's safe.The scenario I have in mind is, a poorly maintained server, nothing
installed, can't install anything (no internet connection, license
expired), flaky network, you fear it's going to fail soon, you need to
take a backup. The simplest procedure would appear to be: start backup
mode, copy files away, stop backup mode. Anything else that involves
holding a session open over there for the whole time is way more fragile
unless proper preparations have been made (and even then). So I don't
know what you want to call that scenario, but I would feel more
comfortable having these basic tools available in a bind.
I would argue the best thing in this scenario is to use pg_basebackup.
It's a solid tool and likely far better than any script the user might
cook up on the spot.
Regards,
--
-David
david@pgmasters.net
On 3/29/19 12:25 PM, Magnus Hagander wrote:
On Fri, Mar 29, 2019 at 1:19 PM Robert Haas <robertmhaas@gmail.com
<mailto:robertmhaas@gmail.com>> wrote:On Fri, Mar 29, 2019 at 6:27 AM David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:I used your suggestions with minor editing. After some reflection, I
agree that the inline warnings are likely to be more effective than
something at the end, at least for those working on a newimplementation.
I'm glad we could agree on something. Committed.
Me, too. Thanks!
Minor nitpick: + backup can only be taken on a primary and does not allow concurrent + backups. Moreover, because it writes a backup_label file on the + master, it can cause the master to fail to restart automatically afterLet's be consistent in if we call it a primary or a master, at least
within the same paragraph :)
Agreed, let's stick with "primary".
Are we planning to back-patch this? The deprecation was added to the
docs in 9.6 -- I think these clarifications would be helpful.
Thanks,
--
-David
david@pgmasters.net
On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net> wrote:
Are we planning to back-patch this? The deprecation was added to the
docs in 9.6 -- I think these clarifications would be helpful.
I wasn't planning too, but I guess we could consider it. I'd be more
inclined to back-patch the documentation changes than the message text
changes, but what do other people think?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 3/29/19 12:46 PM, Robert Haas wrote:
On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net> wrote:
Are we planning to back-patch this? The deprecation was added to the
docs in 9.6 -- I think these clarifications would be helpful.I wasn't planning too, but I guess we could consider it. I'd be more
inclined to back-patch the documentation changes than the message text
changes, but what do other people think?
I think we should definitely do the docs, I'm 50% on the message. You
could argue that is is a behavioral change, but it is pretty important info.
Regards,
--
-David
david@pgmasters.net
On Fri, Mar 29, 2019 at 1:49 PM David Steele <david@pgmasters.net> wrote:
On 3/29/19 12:46 PM, Robert Haas wrote:
On Fri, Mar 29, 2019 at 8:45 AM David Steele <david@pgmasters.net>
wrote:
Are we planning to back-patch this? The deprecation was added to the
docs in 9.6 -- I think these clarifications would be helpful.I wasn't planning too, but I guess we could consider it. I'd be more
inclined to back-patch the documentation changes than the message text
changes, but what do other people think?I think we should definitely do the docs, I'm 50% on the message. You
could argue that is is a behavioral change, but it is pretty important
info.
+1 on the docs.
Is the changes to the messages going to cause issues or weirdness for
translators? That would be a reason not to backpatch it. Without that, I'm
leaning towards backpatching it.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 2019-03-29 13:54, Magnus Hagander wrote:
Is the changes to the messages going to cause issues or weirdness for
translators? That would be a reason not to backpatch it. Without that,
I'm leaning towards backpatching it.
Note that the messages refer to recovery.signal, so a backpatch would
have to rephrase the message.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services