WAL Archiving and base backup
Hello all,
I've been upgrading some of our PostgreSQL instances from 9.x to 12.x
and we're using a master/slave setup based on continuous archiving.
I can read on
https://www.postgresql.org/docs/12/continuous-archiving.html 25.3.3.2.
Making an Exclusive Low-Level Backup that the exclusive method is
deprecated.
But then this leads me to the following problem:
- Non-Exclusive Low-Level Backup requires that the functions
pg_start_backup() and pg_stop_backup() be called in the *same*
connection to the database
- we use scripts to automate the creation of the backup and it's
deployment on the slave, after a server maintenance for example
- the example given in 25.3.6.1. Standalone Hot Backups still uses the
exclusive (deprecated) method
So I have this question, how to script the making of base backup for
transfer on the slave server when the two SQL functions must be called
in the same connection, in Bash for example; is this doable?
Thanks for any pointers.
Regards
--
Issa
On Wed, Jan 12, 2022 at 1:22 PM Issa Gorissen <issa-gorissen@usa.net> wrote:
So I have this question, how to script the making of base backup for
transfer on the slave server when the two SQL functions must be called
in the same connection, in Bash for example; is this doable?
Not sure if I understand the problem correctly but if you are asking how to
open a connection to the database and keep it open while doing something
else in bash, then you could use "coproc" for instance. I sometimes use
this function:
coproc psql -XAtF $'\t' service="$srv"
pg () {
local sql exp
sql="$1"
[ "$2" ] && { sql="$2"; exp="$1"; }
echo "$sql" >&${COPROC[1]}
read -u ${COPROC[0]} || return
[ "$exp" ] || return 0
[ "$REPLY" = "$exp" ] || return 64
return 0
}
And here is some usage
local TMOUT=1
pg BEGIN 'BEGIN;'
This sends a BEGIN command and expects the word BEGIN as reply.
if pg 'LOCK TABLE' '
LOCK TABLE some_table
IN SHARE ROW EXCLUSIVE MODE NOWAIT;
';
then
:
elif (( $? > 128 )); then # read timeout exceeded
die "Cannot lock some_table";
else
die "Unexpected error while locking some_table";
fi
In the example above a table is locked with NOWAIT. Bash's read timeout is
set to 1sec. If that's exceeded because the lock is not obtained, read
comes back with status>128.
unset TMOUT
pg '
SELECT coalesce(min(id), -1)
, coalesce(max(id), -1)
FROM some_table'\;
Now we want to read some data. So, TMOUT is unset. The REPLY variable will
have the answer.
IFS=$'\t' read mn mx <<<"$REPLY"
And this is how to split the reply into 2 bash variables, mn and mx.
At the end of the transaction then
pg 'COMMIT' 'COMMIT;'
And send \q to finish psql. If "set -e" mode is active, make sure to negate
the result.
# expecting read to fail after \q. Hence the negation.
! pg '\q'
In simpler cases, when you just want to push commands to psql, you can also
use this:
exec {PSQL}> >(psql ...)
Note there is a blank between the 2 >. This is important.
Then
echo >&$PSQL 'create table tf ();'
echo >&$PSQL 'drop table tf;'
Does this help?
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusive
backup - but this this is deprecated...).
I will test your option with the simpler version and post it back to it
can maybe land in PostgreSQL documentation.
Regards
Greetings,
* Issa Gorissen (issa-gorissen@usa.net) wrote:
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusive backup
- but this this is deprecated...).I will test your option with the simpler version and post it back to it can
maybe land in PostgreSQL documentation.
The PG docs show how the command works and that's it. The commands
in the docs aren't intended to be actually used in production
environments. Writing a full solution involves having a good
understanding of the PG code and how WAL archiving, backups, et al, are
done. If you're not familiar with this portion of the PG code base, I'd
strongly suggest you look at using solutions written and maintained by
folks who are.
Trying to write documentation on how to develop a complete solution
would be quite an effort and would certainly go beyond bash scripting
and likely wouldn't end up getting used anyway- those who are developing
such solutions are already reading through the actual code.
Thanks,
Stephen
On 1/14/22 12:31 PM, Stephen Frost wrote:
Greetings,
* Issa Gorissen (issa-gorissen@usa.net) wrote:
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusive backup
- but this this is deprecated...).I will test your option with the simpler version and post it back to it can
maybe land in PostgreSQL documentation.The PG docs show how the command works and that's it. The commands
in the docs aren't intended to be actually used in production
environments. Writing a full solution involves having a good
understanding of the PG code and how WAL archiving, backups, et al, are
done. If you're not familiar with this portion of the PG code base, I'd
strongly suggest you look at using solutions written and maintained by
folks who are.
Needing to read the PG source code to write a workable PITR recovery
solution is a serious flaw in PG documentation (and why I use PgBackRest).
The documentation of two other RDBMSs that I've worked with (Rdb/VMS and SQL
Server) are perfectly clear on how to do such backups and restores with
relatively small amounts of scripting.
Trying to write documentation on how to develop a complete solution
would be quite an effort and would certainly go beyond bash scripting
and likely wouldn't end up getting used anyway- those who are developing
such solutions are already reading through the actual code.Thanks,
Stephen
--
Angular momentum makes the world go 'round.
Greetings,
* Ron (ronljohnsonjr@gmail.com) wrote:
On 1/14/22 12:31 PM, Stephen Frost wrote:
* Issa Gorissen (issa-gorissen@usa.net) wrote:
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusive backup
- but this this is deprecated...).I will test your option with the simpler version and post it back to it can
maybe land in PostgreSQL documentation.The PG docs show how the command works and that's it. The commands
in the docs aren't intended to be actually used in production
environments. Writing a full solution involves having a good
understanding of the PG code and how WAL archiving, backups, et al, are
done. If you're not familiar with this portion of the PG code base, I'd
strongly suggest you look at using solutions written and maintained by
folks who are.Needing to read the PG source code to write a workable PITR recovery
solution is a serious flaw in PG documentation (and why I use PgBackRest).
I disagree that it's a flaw in the documentation- it's an unfortunate
reality of the current core code. We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.
The documentation of two other RDBMSs that I've worked with (Rdb/VMS and SQL
Server) are perfectly clear on how to do such backups and restores with
relatively small amounts of scripting.
... using tools which are purpose built to the task, no?
Thanks,
Stephen
On 1/14/22 1:40 PM, Stephen Frost wrote:
[snip]
We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.
That's an excellent solution to this problem.
--
Angular momentum makes the world go 'round.
On Fri, Jan 14, 2022 at 1:48 PM Ron <ronljohnsonjr@gmail.com> wrote:
On 1/14/22 1:40 PM, Stephen Frost wrote:
[snip]We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.That's an excellent solution to this problem.
I still don't really understand what is so great about it. About its only
redeeming feature is a declaration that "it is in core" and that newcomers
can just default to it without thinking. I'd rather just play favorites
and write "use pgbackrest" in our documentation. Or some hybrid approach
where we don't just pick one but instead guide people to the community
solutions that are out there. I don't think I really want the people
responsible for core to spend time on writing end-user backup tooling.
Their time is much more valuably spent working on the core product.
David J.
On 1/14/22 1:40 PM, Stephen Frost wrote:
snip]We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.
That's an excellent solution to this problem.
While I know the reasons for not having something like pgbackrest in core, I think this is a major drawback for the project. People coming from commercial systems are used to have that tooling included by default.
+1 for having that in core
Regards
Daniel
On 1/14/22 13:04, Daniel Westermann (DWE) wrote:
On 1/14/22 1:40 PM, Stephen Frost wrote:
snip]We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.That's an excellent solution to this problem.
While I know the reasons for not having something like pgbackrest in core, I think this is a major drawback for the project. People coming from commercial systems are used to have that tooling included by default.
They are also used to paying a big whopping bill, which is why they go
looking for alternatives. So like most things is it a tradeoff, sweat
equity vs hard cash.
+1 for having that in core
Regards
Daniel
--
Adrian Klaver
adrian.klaver@aklaver.com
On 1/14/22 3:42 PM, Adrian Klaver wrote:
On 1/14/22 13:04, Daniel Westermann (DWE) wrote:
On 1/14/22 1:40 PM, Stephen Frost wrote:
snip]We shouldn't be trying to provide
documentation around how to write a tool like pgbackrest, we should,
instead, have a tool like pgbackrest in core with its own documentation,
as most other RDBMS's do.That's an excellent solution to this problem.
While I know the reasons for not having something like pgbackrest in
core, I think this is a major drawback for the project. People coming
from commercial systems are used to have that tooling included by default.They are also used to paying a big whopping bill, which is why they go
looking for alternatives. So like most things is it a tradeoff, sweat
equity vs hard cash.
David Johnston's comment about the docs saying "use pgbackrest or barman,
etc" is a good compromise.
--
Angular momentum makes the world go 'round.
On 1/14/22 16:00, David G. Johnston wrote:
I still don't really understand what is so great about it. About its
only redeeming feature is a declaration that "it is in core" and that
newcomers can just default to it without thinking. I'd rather just
play favorites and write "use pgbackrest" in our documentation. Or
some hybrid approach where we don't just pick one but instead guide
people to the community solutions that are out there. I don't think I
really want the people responsible for core to spend time on writing
end-user backup tooling. Their time is much more valuably spent
working on the core product.David J.
Well, the "without thinking" part of your post can be rephrased as "ease
of use". Do database administrators really need to think about which
backup software to use? What kind of knowledge will such an evaluation
provide? All commercial databases have some form of backup software
included into the core database. After all, backup and restore are
extremely important functions which IMHO should be provided along with
the database software.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On 1/15/22 16:23, Mladen Gogala wrote:
On 1/14/22 16:00, David G. Johnston wrote:
I still don't really understand what is so great about it. About its
only redeeming feature is a declaration that "it is in core" and that
newcomers can just default to it without thinking. I'd rather just
play favorites and write "use pgbackrest" in our documentation. Or
some hybrid approach where we don't just pick one but instead guide
people to the community solutions that are out there. I don't think I
really want the people responsible for core to spend time on writing
end-user backup tooling. Their time is much more valuably spent
working on the core product.David J.
Well, the "without thinking" part of your post can be rephrased as "ease
of use". Do database administrators really need to think about which
backup software to use? What kind of knowledge will such an evaluation
provide? All commercial databases have some form of backup software
included into the core database. After all, backup and restore are
extremely important functions which IMHO should be provided along with
the database software.
That is easy to say the reality is:
Previous CommitFest:
https://commitfest.postgresql.org/35/
Note the number of 'Moved to next CF'
Current CommitFest ending 1/31/2022:
https://commitfest.postgresql.org/36/
Note number of 'Needs review'
This is for the current Core software. Adding more to that will just add
to the above 'Moved to next CF' and 'Needs review'.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sat, Jan 15, 2022 at 5:23 PM Mladen Gogala <gogala.mladen@gmail.com>
wrote:
On 1/14/22 16:00, David G. Johnston wrote:
I still don't really understand what is so great about it. About its only
redeeming feature is a declaration that "it is in core" and that newcomers
can just default to it without thinking. I'd rather just play favorites
and write "use pgbackrest" in our documentation. Or some hybrid approach
where we don't just pick one but instead guide people to the community
solutions that are out there. I don't think I really want the people
responsible for core to spend time on writing end-user backup tooling.
Their time is much more valuably spent working on the core product.David J.
Well, the "without thinking" part of your post can be rephrased as "ease
of use". Do database administrators really need to think about which backup
software to use? What kind of knowledge will such an evaluation provide?
All commercial databases have some form of backup software included into
the core database. After all, backup and restore are extremely important
functions which IMHO should be provided along with the database software.
I suppose I'm being a bit pessimistic here since if we didn't provide some
features that users wanted, and an external tool did, they would probably
think about the lack and realize that their needs would be better served by
a third-party tool.
In fact we already offer pg_dump/pg_restore and pg_basebackup (which seems
like it should have a matching "pg_baserestore" command...) in core. These
easy-to-use tools give the DBA the ability to backup their system. So the
claim that we lack such tooling is simply incorrect.
An additional consideration is that this kind of application would not
benefit from having the same release policies as the core server. It
probably shouldn't integrate with our existing commitfest site, buildfarm,
etc... It is probably best done as a GUI tool which the project has never
produced. It already has serious competition which means our corporate
sponsors probably won't be using it for their clients and thus will not be
incentivized to contribute to its development. I could go on, but what's
the point?
Spending a tiny fraction of the time it would take to develop an in-core
interactive backup and restore application gathering up user questions and
complaints and proposing patches to the documentation and Wiki would be a
better use of time in service to our end users. People make these things
work so it isn't a lack of solutions but a lack of knowledge for the
inexperienced and part-time DBA crowd. We can and should work toward
improving the learning curve experience - and such an endeavor can be done
with very little knowledge of PostgreSQL internals and thus is an excellent
way for people who don't code C and aren't in a position to code and review
core patches to contribute to the project and the broader community.
David J.
David J.
On Fri, Jan 14, 2022 at 11:31 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Issa Gorissen (issa-gorissen@usa.net) wrote:
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusivebackup
- but this this is deprecated...).
I will test your option with the simpler version and post it back to it
can
maybe land in PostgreSQL documentation.
The PG docs show how the command works and that's it. The commands
in the docs aren't intended to be actually used in production
environments. Writing a full solution involves having a good
understanding of the PG code and how WAL archiving, backups, et al, are
done.
For all my suggestions of "use third-party where possible" I do think that
we should have, probably as part of pg_basebackup, a mode that performs a
filesystem copy of WAL to an archive location and verifies that the archive
is valid otherwise it write to the PostgreSQL log that there was a
problem. pg_basebackup should either be able to install that command (with
maybe some CLI prompts for settings or something) using ALTER SYSTEM. Our
documentation can then demonstrate the example usage of the archive_command
parameters as being a generic form of that command.
i.e., pg_backbackup --wal-archive-location /path/to/dir/ --wal_to_archive $1
David J.
On 1/16/22 01:23, Mladen Gogala wrote:
On 1/14/22 16:00, David G. Johnston wrote:
I still don't really understand what is so great about it. About its
only redeeming feature is a declaration that "it is in core" and that
newcomers can just default to it without thinking. I'd rather just
play favorites and write "use pgbackrest" in our documentation. Or
some hybrid approach where we don't just pick one but instead guide
people to the community solutions that are out there. I don't think I
really want the people responsible for core to spend time on writing
end-user backup tooling. Their time is much more valuably spent
working on the core product.David J.
Well, the "without thinking" part of your post can be rephrased as "ease
of use". Do database administrators really need to think about which
backup software to use? What kind of knowledge will such an evaluation
provide? All commercial databases have some form of backup software
included into the core database. After all, backup and restore are
extremely important functions which IMHO should be provided along with
the database software.--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
Just to avoid any misunderstanding. I am perfectly happy using the
backup/restore with pg_dump and we use it for at least a decade with
success for our need of backups.
My question is really in the context of WAL archiving and preparing a
slave instance with a base 'non-exclusive' backup from a script (to be
precise, on reboot of the server running the slave instance) since the
exclusive way is deprecated.
Thx for all your valuable comments
--
Issa
On 1/16/22 13:12, Issa Gorissen wrote:
Just to avoid any misunderstanding. I am perfectly happy using the
backup/restore with pg_dump and we use it for at least a decade with
success for our need of backups.
I am using pgbackrest. I used to use pg_rman, being well acquainted with
rman backup tool belonging to another database system, but pg_rman was
removing WAL archives before they were delivered to replica. I have
replication slots from the OLTP database to the DW database with
partitions, hash and bloom indexes and massive amount of processors, for
parallel query. I need the replication to work. In addition to that,
pgbackrest can employ parallelism while pg_rman cannot do that. I find
pgbackrest by far the best Postgres backup system.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
Greetings,
* David G. Johnston (david.g.johnston@gmail.com) wrote:
On Fri, Jan 14, 2022 at 11:31 AM Stephen Frost <sfrost@snowman.net> wrote:
* Issa Gorissen (issa-gorissen@usa.net) wrote:
Thx a lot. I thought about it but was not so sure about having a complex
script (compared to the very simple version when using the exclusivebackup
- but this this is deprecated...).
I will test your option with the simpler version and post it back to it
can
maybe land in PostgreSQL documentation.
The PG docs show how the command works and that's it. The commands
in the docs aren't intended to be actually used in production
environments. Writing a full solution involves having a good
understanding of the PG code and how WAL archiving, backups, et al, are
done.For all my suggestions of "use third-party where possible" I do think that
we should have, probably as part of pg_basebackup, a mode that performs a
filesystem copy of WAL to an archive location and verifies that the archive
is valid otherwise it write to the PostgreSQL log that there was a
problem. pg_basebackup should either be able to install that command (with
maybe some CLI prompts for settings or something) using ALTER SYSTEM. Our
documentation can then demonstrate the example usage of the archive_command
parameters as being a generic form of that command.
We already have pg_receivewal, which is part of pg_basebackup, and is
able to use a slot and such. I'm not sure that making pg_basebackup
somehow also work as an archive command makes much sense- it's really
intended to be a tool that's used remotely and that isn't going to work
when being called out of archive_command.
That is, if I'm on system B and doing pg_basebackup against system A,
all of which happens using the PG replication protocol, how would
pg_basebackup called out of archive_command be able to get the WAL over
to system B..?
Thanks,
Stephen
On Tue, Jan 18, 2022 at 10:53 AM Stephen Frost <sfrost@snowman.net> wrote:
We already have pg_receivewal, which is part of pg_basebackup, and is
able to use a slot and such. I'm not sure that making pg_basebackup
somehow also work as an archive command makes much sense
I suppose my proposal should have been:
Create and document a new "PostgreSQL Server Application" [1] and name it:
pg_archive_wal
Advise people to set their archive_command to "pg_archive_wal
--path=/location/of/archive %p/%f
1. https://www.postgresql.org/docs/current/reference-server.html
Having pg_basebackup still prompt for permission to add that command to the
system via ALTER SYSTEM (probably will some other logic) seems doable.
Having created a base backup one still must decide on a wal archiving
strategy. There appear to be two options, though as far as I can tell if
one simply reads the documentation regarding backups they will not discover
the pg_receivewal option. I, not knowing of that option myself, have been
operating under the assumption that if one uses pg_basebackup that one
would be required to setup an archive_command as well.
The superior option is having a persistently running pg_receivewal command
on a server. As noted above, the documentation does not do this option
justice.
The alternative option is to set archive_command; which at present is also
poorly documented. My proposal above simply tries to improve on this. And
while that is a good and easy starting point if there is agreement on
pg_receivewal being a superior archiving option (leaving archive_command
unset) reworking the documentation to guide the inexperience PostgreSQL DBA
toward a "minimal but effective" backup procedure is needed.
David J.
Greetings,
* David G. Johnston (david.g.johnston@gmail.com) wrote:
On Tue, Jan 18, 2022 at 10:53 AM Stephen Frost <sfrost@snowman.net> wrote:
We already have pg_receivewal, which is part of pg_basebackup, and is
able to use a slot and such. I'm not sure that making pg_basebackup
somehow also work as an archive command makes much senseI suppose my proposal should have been:
Create and document a new "PostgreSQL Server Application" [1] and name it:
pg_archive_wal
Advise people to set their archive_command to "pg_archive_wal
--path=/location/of/archive %p/%f1. https://www.postgresql.org/docs/current/reference-server.html
Or just link to a proper solution that already exists and includes
support for object stores, multiple repositories, checksums all files
and stores them in a manifest to validate them ...
Having pg_basebackup still prompt for permission to add that command to the
system via ALTER SYSTEM (probably will some other logic) seems doable.
It doesn't seem to follow, to me anyway, that if you're using
pg_basebackup that you'd want to use this proposed pg_archive_wal. Much
more likely would be that you'd want to use pg_receivewal, which doesn't
need any archive command to be set. Seems like a feature in need of a
use case is what I'm getting at here.
Having created a base backup one still must decide on a wal archiving
strategy. There appear to be two options, though as far as I can tell if
one simply reads the documentation regarding backups they will not discover
the pg_receivewal option. I, not knowing of that option myself, have been
operating under the assumption that if one uses pg_basebackup that one
would be required to setup an archive_command as well.
Adding a reference to pg_receivewal under the continuous archiving
section would probably make sense, similar to how pg_basebackup is
referenced from the base backup section. Might even make sense to have
a 'low level API' section for WAL archiving which mentions
archive_command ... or maybe not and rip out the existing 'low level
API' section as it really isn't nearly detailed enough for someone to be
able to write a proper tool and having it there implies that it does
provide that.
The superior option is having a persistently running pg_receivewal command
on a server. As noted above, the documentation does not do this option
justice.
Not sure that I'd say that it's the superior option, but it depends on
the options that are being considered and if you're limiting those to
"just what exists in core."
The alternative option is to set archive_command; which at present is also
poorly documented. My proposal above simply tries to improve on this. And
while that is a good and easy starting point if there is agreement on
pg_receivewal being a superior archiving option (leaving archive_command
unset) reworking the documentation to guide the inexperience PostgreSQL DBA
toward a "minimal but effective" backup procedure is needed.
Adding a new PG server application isn't exactly what I'd call just a
simple improvement to the documentation ...
The existing documentation for base backups does, in fact, link to
pg_basebackup to guide the new DBA to a solution for base backups that's
minimal but effective. Doing PITR is getting beyond just the minimal,
but even so, linking to pg_receivewal in a similar manner probably does
make sense since we link to pg_basebackup, so I think I agree with you
about that specific change. Hopefully, we can remove the long
deprecated exclusive backup option and further simplify the backup
documentation.
Thanks,
Stephen