12.2: Why do my Redo Logs disappear??
Hi all,
this is a 12.2 Release on FreeBSD 11.3.
I am doing RedoLog Archiving according to Docs Chapter 25.1.
During the last week I have lost 4 distinct Redo Logs; they are
not in the backup.
Losing a RedoLog is very bad, because there is no redundancy:
losing a single one of them makes the timeline disappear, and it
will only reappear after another Base Backup. Very, very bad.
So I analyzed the matter.
There are three ways to restart a Berkeley machine:
1. Cowboy it (aka pull-the-plug). This makes sure that everything is
certainly dead immediately, and, given that all hardware and software is
well designed, nothing bad should happen.
2. Shut it down. This is the official means, and it takes very long,
because each and every application is signalled and given time to
bring up whatever objections it may have.
In this case "pg_ctl stop" will be invoked with whatever options the
sysop has configured, and postgres will copy out a full log into
archive before terminating.
3. Halt/Reboot it, like this:
https://www.freebsd.org/cgi/man.cgi?query=reboot&sektion=8&manpath=FreeBSD+11.3-RELEASE
This is considered safer than pull-the-plug, and is still fast.
Applications are killed without much waiting, but all disk buffers
are flushed to permanent storage and filesystems closed.
In this case, it seems, Postgres will delete the current log
without archiving it. :(
What precisely happens (according to the OS sources) during reboot
is this: processes will be sent SIGTERM, and after some 2-10
seconds followed by SIGKILL.
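For context, the setup from docs section 25.3.1 ("Setting Up WAL
Archiving") amounts to a postgresql.conf fragment of this shape. The
RedoLog.bck arguments match the syslog lines further down in this post,
but the exact configuration on this box is an assumption:

```shell
# postgresql.conf fragment (illustrative; actual settings on this box unknown)
archive_mode = on
# %p = relative path of the finished WAL segment, %f = its bare file name
archive_command = 'RedoLog.bck %p %f'
```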
--------------------------------------
Let's have a closer look:
We did a regular shutdown at 17:09, and then we did a reboot at 19:24.
Here is the content of the staging area (where the logs are
copied to and accumulated until it is worth running a backup job):
# dir arch/
total 240
drwx------ 2 postgres postgres 5 Jun 8 17:09 .
drwxr-xr-x 6 postgres postgres 7 Jun 8 17:09 ..
-rw------- 1 postgres postgres 16777216 Jun 8 09:38 0000000100000017000000FC.ok
-rw------- 1 postgres postgres 16777216 Jun 8 10:48 0000000100000017000000FD.ok
-rw------- 1 postgres postgres 16777216 Jun 8 17:09 0000000100000017000000FE.ok
And here is the pg_wal directory:
# dir data12/pg_wal/
total 89256
drwx------ 3 postgres postgres 10 Jun 8 19:28 .
drwx------ 19 postgres postgres 23 Jun 8 19:28 ..
-rw------- 1 postgres postgres 335 Jun 7 07:36 0000000100000017000000EF.00000060.backup
-rw------- 1 postgres postgres 16777216 Jun 8 19:38 000000010000001800000000
-rw------- 1 postgres postgres 16777216 Jun 7 07:17 000000010000001800000001
-rw------- 1 postgres postgres 16777216 Jun 7 07:17 000000010000001800000002
-rw------- 1 postgres postgres 16777216 Jun 7 07:17 000000010000001800000003
-rw------- 1 postgres postgres 16777216 Jun 7 07:17 000000010000001800000004
-rw------- 1 postgres postgres 16777216 Jun 7 07:36 000000010000001800000005
drwx------ 2 postgres postgres 3 Jun 8 17:09 archive_status
# dir data12/pg_wal/archive_status
total 23
drwx------ 2 postgres postgres 3 Jun 8 17:09 .
drwx------ 3 postgres postgres 10 Jun 8 19:28 ..
-rw------- 1 postgres postgres 0 Jun 7 07:36 0000000100000017000000EF.00000060.backup.done
Now where the hell is my "FF" log ???
Let's check syslog - this was the full shutdown at 17:09:
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [10-1] :[] LOG: 00000: received fast shutdown request
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [10-2] :[] LOCATION: pmdie, postmaster.c:2780
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [11-1] :[] LOG: 00000: aborting any active transactions
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [11-2] :[] LOCATION: pmdie, postmaster.c:2813
Jun 8 17:09:38 <local0.debug> admn pg-bck[73549]: [10-1] :[] DEBUG: 00000: logical replication launcher shutting down
Jun 8 17:09:38 <local0.debug> admn pg-bck[73549]: [10-2] :[] LOCATION: ProcessInterrupts, postgres.c:2981
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [12-1] :[] LOG: 00000: background worker "logical replication launcher" (PID 73549) exited with exit code 1
Jun 8 17:09:38 <local0.info> admn pg-bck[73534]: [12-2] :[] LOCATION: LogChildExit, postmaster.c:3657
Jun 8 17:09:38 <local0.info> admn pg-bck[73544]: [13-1] :[] LOG: 00000: shutting down
Jun 8 17:09:38 <local0.info> admn pg-bck[73544]: [13-2] :[] LOCATION: ShutdownXLOG, xlog.c:8321
Jun 8 17:09:45 <local0.info> admn pg-bck[82223]: RedoLog.bck invoked pg_wal/0000000100000017000000FE 0000000100000017000000FE
Jun 8 17:09:45 <local0.info> admn pg-bck[82223]: RedoLog.bck pg_wal/0000000100000017000000FE 0000000100000017000000FE returns 0
Jun 8 17:09:45 <local0.debug> admn pg-bck[73547]: [8-1] :[] DEBUG: 00000: archived write-ahead log file "0000000100000017000000FE"
Jun 8 17:09:45 <local0.debug> admn pg-bck[73547]: [8-2] :[] LOCATION: pgarch_archiveXlog, pgarch.c:675
Jun 8 17:09:55 <local0.info> admn pg-bck[73544]: [14-1] :[] LOG: 00000: checkpoint starting: shutdown immediate
Jun 8 17:09:55 <local0.info> admn pg-bck[73544]: [14-2] :[] LOCATION: LogCheckpointStart, xlog.c:8362
Jun 8 17:09:55 <local0.debug> admn pg-bck[73544]: [15-1] :[] DEBUG: 00000: performing replication slot checkpoint
Jun 8 17:09:55 <local0.debug> admn pg-bck[73544]: [15-2] :[] LOCATION: CheckPointReplicationSlots, slot.c:1078
Jun 8 17:09:55 <local0.info> admn pg-bck[73544]: [16-1] :[] LOG: 00000: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=10.131 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=16383 kB
Jun 8 17:09:55 <local0.info> admn pg-bck[73544]: [16-2] :[] LOCATION: LogCheckpointEnd, xlog.c:8435
Jun 8 17:09:55 <local0.info> admn pg-bck[73534]: [13-1] :[] LOG: 00000: database system is shut down
Jun 8 17:09:55 <local0.info> admn pg-bck[73534]: [13-2] :[] LOCATION: UnlinkLockFiles, miscinit.c:859
So, "FE" was copied out okay, and we have it in the staging area.
And here is the relevant stuff from the following startup - here we
consequently are within the "FF" log:
Jun 8 17:15:38 <local0.info> admn pg-bck[6366]: [8-1] :[] LOG: 00000: database system was shut down at 2020-06-08 17:09:55 CEST
Jun 8 17:15:38 <local0.info> admn pg-bck[6366]: [8-2] :[] LOCATION: StartupXLOG, xlog.c:6242
Jun 8 17:15:38 <local0.debug> admn pg-bck[6366]: [9-1] :[] DEBUG: 00000: checkpoint record is at 17/FF000024
Jun 8 17:15:38 <local0.debug> admn pg-bck[6366]: [9-2] :[] LOCATION: StartupXLOG, xlog.c:6532
Jun 8 17:15:38 <local0.debug> admn pg-bck[6366]: [10-1] :[] DEBUG: 00000: redo record is at 17/FF000024; shutdown true
----------------------------------------
From the fast reboot @ 19:24 nothing at all is logged.
At the following startup, we see that we are still within the "FF"
log:
Jun 8 19:28:24 <local0.debug> admn pg-bck[6465]: [1-1] :[] DEBUG: 00000: registering background worker "logical replication launcher"
Jun 8 19:28:24 <local0.debug> admn pg-bck[6465]: [1-2] :[] LOCATION: RegisterBackgroundWorker, bgworker.c:854
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [2-1] :[] LOG: 00000: starting PostgreSQL 12.2 on i386-portbld-freebsd11.3, compiled by gcc9 (FreeBSD Ports Collection) 9.3.0, 32-bit
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [2-2] :[] LOCATION: PostmasterMain, postmaster.c:997
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [3-1] :[] LOG: 00000: listening on IPv4 address "0.0.0.0", port 5433
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [3-2] :[] LOCATION: StreamServerPort, pqcomm.c:590
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [4-1] :[] LOG: 00000: listening on Unix socket "/tmp/.s.PGSQL.5433"
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [4-2] :[] LOCATION: StreamServerPort, pqcomm.c:585
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [5-1] :[] LOG: 00000: ending log output to stderr
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [5-2] :[] HINT: Future log output will go to log destination "syslog".
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [5-3] :[] LOCATION: PostmasterMain, postmaster.c:1297
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [6-1] :[] LOG: XX000: could not send test message on socket for statistics collector: Permission denied
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [6-2] :[] LOCATION: pgstat_init, pgstat.c:486
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [7-1] :[] LOG: 00000: trying another address for the statistics collector
Jun 8 19:28:24 <local0.info> admn pg-bck[6465]: [7-2] :[] LOCATION: pgstat_init, pgstat.c:418
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [8-1] :[] LOG: 00000: database system was interrupted; last known up at 2020-06-08 17:25:38 CEST
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [8-2] :[] LOCATION: StartupXLOG, xlog.c:6267
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [9-1] :[] DEBUG: 00000: checkpoint record is at 17/FF01BFE8
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [9-2] :[] LOCATION: StartupXLOG, xlog.c:6532
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [10-1] :[] DEBUG: 00000: redo record is at 17/FF01BFB4; shutdown false
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [10-2] :[] LOCATION: StartupXLOG, xlog.c:6609
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [11-1] :[] DEBUG: 00000: next transaction ID: 18955154; next OID: 145913
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [11-2] :[] LOCATION: StartupXLOG, xlog.c:6613
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [12-1] :[] DEBUG: 00000: next MultiXactId: 1; next MultiXactOffset: 0
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [12-2] :[] LOCATION: StartupXLOG, xlog.c:6617
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [13-1] :[] DEBUG: 00000: oldest unfrozen transaction ID: 479, in database 13777
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [13-2] :[] LOCATION: StartupXLOG, xlog.c:6620
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [14-1] :[] DEBUG: 00000: oldest MultiXactId: 1, in database 1
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [14-2] :[] LOCATION: StartupXLOG, xlog.c:6623
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [15-1] :[] DEBUG: 00000: commit timestamp Xid oldest/newest: 0/0
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [15-2] :[] LOCATION: StartupXLOG, xlog.c:6626
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [16-1] :[] DEBUG: 00000: transaction ID wrap limit is 2147484126, limited by database with OID 13777
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [16-2] :[] LOCATION: SetTransactionIdLimit, varsup.c:408
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [17-1] :[] DEBUG: 00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [17-2] :[] LOCATION: SetMultiXactIdLimit, multixact.c:2267
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [18-1] :[] DEBUG: 00000: starting up replication slots
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [18-2] :[] LOCATION: StartupReplicationSlots, slot.c:1114
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [19-1] :[] LOG: 00000: database system was not properly shut down; automatic recovery in progress
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [19-2] :[] LOCATION: StartupXLOG, xlog.c:6764
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [20-1] :[] DEBUG: 00000: resetting unlogged relations: cleanup 1 init 0
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [20-2] :[] LOCATION: ResetUnloggedRelations, reinit.c:55
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [21-1] :[] LOG: 00000: redo starts at 17/FF01BFB4
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [21-2] :[] LOCATION: StartupXLOG, xlog.c:7035
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [22-1] :[] LOG: 00000: redo done at 17/FF01C098
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [22-2] :[] LOCATION: StartupXLOG, xlog.c:7297
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [23-1] :[] DEBUG: 00000: resetting unlogged relations: cleanup 0 init 1
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [23-2] :[] LOCATION: ResetUnloggedRelations, reinit.c:55
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [24-1] :[] LOG: 00000: checkpoint starting: end-of-recovery immediate
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [24-2] :[] LOCATION: LogCheckpointStart, xlog.c:8362
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [25-1] :[] DEBUG: 00000: performing replication slot checkpoint
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [25-2] :[] LOCATION: CheckPointReplicationSlots, slot.c:1078
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [26-1] :[] LOG: 00000: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.007 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16272 kB, estimate=16272 kB
Jun 8 19:28:24 <local0.info> admn pg-bck[6467]: [26-2] :[] LOCATION: LogCheckpointEnd, xlog.c:8435
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [27-1] :[] DEBUG: 00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [27-2] :[] LOCATION: SetMultiXactIdLimit, multixact.c:2267
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [28-1] :[] DEBUG: 00000: MultiXact member stop limit is now 4294914944 based on MultiXact 1
Jun 8 19:28:24 <local0.debug> admn pg-bck[6467]: [28-2] :[] LOCATION: SetOffsetVacuumLimit, multixact.c:2630
There is no further information about anything concerning the logs,
but at this point in time "FF" has disappeared. It was NOT copied out,
we can see the timestamp on the pg_wal/archive_status still being at
17:09.
Nothing except postgres is supposed to write/delete anything within
the data tree, and the RedoLog.bck script doesn't delete anything at all.
Another cute question would be: the "FE" log was successfully copied
out at 17:09:45. The checkpoint at 17:09:55 then says "1 removed" -
this should be the "FE", because the one before was written seven
hours earlier (see the "ls" above) and should be long gone.
But then the checkpoint at 19:28:24 again says "1 removed". What was
removed there? It is unlikely to be the "FE", and the "FF" would be
currently in use - and the archive_status directory was not written
since 17:09. But the "FF" has disappeared. So what is going on here?
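Given the listings above, a gap like the missing "FF" can also be
detected mechanically by mapping segment names to a linear number
(log id * 256 + segment id, for the default 16 MB segment size). This
is a sketch, not part of the original setup:

```shell
#!/bin/sh
# seg_no: map a 24-hex-digit WAL segment name to a linear number
# (log id * 256 + segment id; valid for the default 16 MB segment size).
seg_no() {
  log=$(printf '%d' "0x$(echo "$1" | cut -c9-16)")
  seg=$(printf '%d' "0x$(echo "$1" | cut -c17-24)")
  echo $((log * 256 + seg))
}

# check_gaps: read sorted segment names on stdin, report any holes.
check_gaps() {
  prev=""
  while read -r f; do
    if [ -n "$prev" ]; then
      if [ $(( $(seg_no "$f") - $(seg_no "$prev") )) -gt 1 ]; then
        echo "gap after $prev (next is $f)"
      fi
    fi
    prev="$f"
  done
}

# The two segments around the hole in this thread: FE was archived,
# FF vanished, and 18/00 is the next one present in pg_wal.
printf '%s\n' 0000000100000017000000FE 000000010000001800000000 | check_gaps
# prints: gap after 0000000100000017000000FE (next is 000000010000001800000000)
```

Running something of this shape over the staging area after each backup
cycle would have flagged the loss immediately.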
cheerio,
PMc
On 6/8/20 5:02 PM, Peter wrote:
> I am doing RedoLog Archiving according to Docs Chapter 25.1.
There is no ReDo logging, there is WAL logging.
Which docs? Section 25.1 in the Postgres docs is:
https://www.postgresql.org/docs/12/backup-dump.html
25.1. SQL Dump
> What precisely happens (according to the OS sources) during reboot
> is this: processes will be sent SIGTERM, and after some 2-10
> seconds followed by SIGKILL.
https://www.postgresql.org/docs/12/server-shutdown.html
"
Important
It is best not to use SIGKILL to shut down the server. Doing so will
prevent the server from releasing shared memory and semaphores.
Furthermore, SIGKILL kills the postgres process without letting it relay
the signal to its subprocesses, so it might be necessary to kill the
individual subprocesses by hand as well.
To terminate an individual session while allowing other sessions to
continue, use pg_terminate_backend() (see Table 9.83) or send a SIGTERM
signal to the child process associated with the session."
What is RedoLog.bck?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mon, Jun 8, 2020 at 5:17 PM Peter <pmc@citylink.dinoex.sub.org> wrote:
> Loosing a RedoLog is very bad, because there is no redundancy,
> loosing a single one of them makes the timeline disappear and it
> will only reappear after another Base Backup. Very very bad.
> In this case, it seems, Postgres will delete the current log
> without archiving it. :(
I strongly suspect that you were hit by the bug fixed in commit
4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:
"Avoid premature recycling of WAL segments during crash recovery
(Jehan-Guillaume de Rorthais)
WAL segments that become ready to be archived during crash recovery
were potentially recycled without being archived."
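In case it helps anyone auditing their own installations: the presence
of the fix can be judged from the version string alone, since it shipped
with 12.3. A small sketch, assuming a plain 12.x version string:

```shell
#!/bin/sh
# has_recycle_fix: decide from a 12.x version string whether the server
# carries the fix from commit 4e87c483 (released with 12.3).
# Pure string check; no server connection needed.
has_recycle_fix() {
  case "$1" in
    12.*) minor=${1#12.}
          [ "${minor%%[!0-9]*}" -ge 3 ] ;;
    *)    return 1 ;;  # other major versions are out of scope here
  esac
}

# Against a live server this could be fed (port 5433 as in the logs):
#   has_recycle_fix "$(psql -p 5433 -At -c 'show server_version')"
has_recycle_fix "12.2" && echo "12.2: fixed" || echo "12.2: affected"
has_recycle_fix "12.3" && echo "12.3: fixed" || echo "12.3: affected"
```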
Sorry that you were affected by this bug -- it really sucks.
--
Peter Geoghegan
On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:
!
! I strongly suspect that you were hit by the bug fixed in commit
! 4e87c483. You should upgrade to Postgres 12.3 ASAP, to get that fix:
!
! "Avoid premature recycling of WAL segments during crash recovery
! (Jehan-Guillaume de Rorthais)
!
! WAL segments that become ready to be archived during crash recovery
! were potentially recycled without being archived."
Ahh, thank You so much; this is good news: if it is an already known
bug, I can close all efforts, remove the debug-switches again, stuff
the sources back into their box and relax. ;)
I did check whether I would find something about lost archive logs, but
didn't - and then it is always possible that it's just a mistake of
mine, as I know I do make mistakes.
And now for the nitpicking part :)
On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:
! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
!
! There is no ReDo logging, there is WAL logging.
Yes I know - and I don't care. Technically they're no longer WAL when
they're going to be archived. Their purpose then becomes to redo the
transactions, and even the messages say so:
! > LOG: 00000: redo done at 17/FF01C098
! What docs, because section 25.1 in the Postgres docs is :
!
! https://www.postgresql.org/docs/12/backup-dump.html
!
! 25.1. SQL Dump
Oops, mistake of mine. It should be 25.3.1.
! https://www.postgresql.org/docs/12/server-shutdown.html
! "
! Important
!
! It is best not to use SIGKILL to shut down the server. Doing so will prevent
! the server from releasing shared memory and semaphores. Furthermore, SIGKILL
! kills the postgres process without letting it relay the signal to its
! subprocesses, so it might be necessary to kill the individual subprocesses
! by hand as well.
And which of these would be of any concern if the machine is rebooted
anyway?
I had to install new hardware, and currently I'm trying to identify
a memory exhaustion issue. This makes it necessary to reboot the full
system quite often, and I neither want to wait for orderly termination
of dozens of subsystems, nor do I want to need fsck at restart. This
would make SIGKILL/reboot the method of choice.
! What is RedoLog.bck?
The script which one has to write according to the docs' section which
would be correctly numbered 25.3.1.
cheerio,
PMc
On 6/8/20 6:38 PM, Peter wrote:
> On Mon, Jun 08, 2020 at 05:40:20PM -0700, Peter Geoghegan wrote:
> And now for the nitpicking part :)
> On Mon, Jun 08, 2020 at 05:35:40PM -0700, Adrian Klaver wrote:
> ! > I am doing RedoLog Archiving according to Docs Chapter 25.1.
> !
> ! There is no ReDo logging, there is WAL logging.
> Yes I know - and I don't care. Technically they're no longer WAL when
> they're going to be archived. Their purpose then becomes to redo the
Not according to the section you are referring to:
25.3.1. Setting Up WAL Archiving
"Redoing is the process of replaying the WAL logs."
> Oops, my mistake. Should be 25.3.1.
> And which of these would be of any concern if the machine is rebooted
> anyway?
> I had to install new hardware, and currently I'm trying to identify
> a memory exhaustion issue. This makes it necessary to reboot the full
> system quite often, and I neither want to wait for orderly termination
> of dozens of subsystems, nor do I want to need fsck at restart. This
> would make SIGKILL/reboot the method of choice.
That is your prerogative, just not sure it is conducive to the health of
your system.
> ! What is RedoLog.bck?
> The script which one has to write according to the docs' section which
> would be correctly numbered 25.3.1.
> cheerio,
> PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
Actually, the affair had some good side: as usual I was checking
my own designs first and looking for flaws, and indeed I found one:
If you do copy out the archive logs not directly to tape, but to
some disk area for further processing, then there is an issue with
possible loss. If you do it like the docs say, with a command like
this:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
That "cp" is usually not synchronous. So there is the possibility
that this command terminates successfully, and reports exitcode zero
back to the Postgres, and then the Postgres will consider that log
being safely away.
But the target of the copy may not yet have been written to disk. If
at that point a power loss happens, the log may become missing/damaged/
incomplete, while the database may or may not consider it done
when restarting.
Therefore, mounting such a target filesystem in all-synchronous mode
might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")
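To make the idea concrete, here is a minimal sketch of my own (not from the docs, names illustrative) of a copy step that does not report success until the data has been pushed out; the sync(8) call is a blunt instrument that flushes all dirty buffers, but it is what plain shell offers portably:

```shell
# archive_wal SRC DSTDIR FNAME -- sketch of a crash-safer archive step:
# refuse to overwrite, copy to a temporary name, flush, then rename
# into place so a half-written file never carries the final name.
archive_wal() {
    src="$1"; dst="$2/$3"
    test ! -f "$dst" || return 1        # never overwrite an archived segment
    cp "$src" "$dst.tmp" || return 1
    sync                                # flush buffers before claiming success
    mv "$dst.tmp" "$dst" || return 1
    sync                                # flush the directory entry as well
}
```

A wrapper script around this could then be named from postgresql.conf, e.g. archive_command = '/usr/local/bin/archive-wal.sh %p %f' (script name hypothetical).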
cheerio,
PMc
On 6/8/20 7:33 PM, Peter wrote:
> Actually, the affair had some good side: as usual I was checking
> my own designs first and looking for flaws, and indeed I found one:
> If you do copy out the archive logs not directly to tape, but to
> some disk area for further processing, then there is an issue with
> possible loss. If you do it like the docs say, with a command like
> this:
> archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
> That "cp" is usually not synchronous. So there is the possibility
> that this command terminates successfully, and reports exitcode zero
> back to the Postgres, and then the Postgres will consider that log
> being safely away.
Which is why just following the above command in the docs is:
"(This is an example, not a recommendation, and might not work on all
platforms.) "
Generally for peace of mind folks use third party tools like:
pg_backrest(https://pgbackrest.org/),
pg_probackup(https://postgrespro.com/products/extensions/pg_probackup)
or Barman(https://www.pgbarman.org/).
as they offer safety checks for your backups.
I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
Not sure about pg_probackup.
Barman is a Python package:
http://docs.pgbarman.org/release/2.10/#installation-from-sources
> But the target of the copy may not yet have been written to disk. If
> at that point a power loss happens, the log may become missing/damaged/
> incomplete, while the database may or may not consider it done
> when restarting.
> Therefore, mounting such a target filesystem in all-synchronous mode
> might be a good idea. (UFS: "-o sync", ZFS: "set sync=always")
> cheerio,
> PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 6/8/20 7:33 PM, Peter wrote:
>> That "cp" is usually not synchronous. So there is the possibility
>> that this command terminates successfully, and reports exitcode zero
>> back to the Postgres, and then the Postgres will consider that log
>> being safely away.
> Which is why just following the above command in the docs is:
> "(This is an example, not a recommendation, and might not work on all
> platforms.) "
Yeah. There have been discussions about changing that disclaimer to be
more forceful, because in point of fact a plain "cp" is generally not safe
enough. You need to fsync the written file, and on many filesystems you
also have to fsync the directory it's in.
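One way the file-level fsync could be expressed in a single archive_command, sketched here as an illustration only (the archivedir path is from the docs' example; dd's conv=fsync is supported by both FreeBSD and GNU dd):

```shell
# postgresql.conf fragment - a sketch, not a recommendation.
# conv=fsync makes dd fsync the written file before exiting, so a zero
# exit status means the segment itself reached stable storage; the
# directory entry is still not fsynced, which a plain one-liner
# cannot express, so this only narrows the window Tom describes.
archive_command = 'test ! -f /mnt/server/archivedir/%f && dd if=%p of=/mnt/server/archivedir/%f bs=1m conv=fsync'
```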
Generally for peace of mind folks use third party tools like:
+1. Rolling your own archive script is seldom advisable.
regards, tom lane
Greetings,
* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
I use pg_backrest, but it does not look promising for running on BSD:
https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
That's an unfortunately ancient post, really, considering that
pgbackrest has now been fully rewritten into C, and Luca as recently as
September 2019 was saying he has it working on FreeBSD.
If folks do run into issues with pgbackrest on FreeBSD, please let us
know.
Thanks,
Stephen
On 6/9/20 4:15 AM, Stephen Frost wrote:
> Greetings,
> * Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> > I use pg_backrest, but it does not look promising for running on BSD:
> > https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> That's an unfortunately ancient post, really, considering that
> pgbackrest has now been fully rewritten into C, and Luca as recently as
> September 2019 was saying he has it working on FreeBSD.
Yeah, but this:
https://github.com/pgbackrest/pgbackrest/issues/686
is not clear on whether a user can do that w/o a certain amount of hand
holding.
> If folks do run into issues with pgbackrest on FreeBSD, please let us
> know.
> Thanks,
> Stephen
--
Adrian Klaver
adrian.klaver@aklaver.com
Greetings,
* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 6/9/20 4:15 AM, Stephen Frost wrote:
> > * Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> > > I use pg_backrest, but it does not look promising for running on BSD:
> > > https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> > That's an unfortunately ancient post, really, considering that
> > pgbackrest has now been fully rewritten into C, and Luca as recently as
> > September 2019 was saying he has it working on FreeBSD.
> Yeah, but this:
> https://github.com/pgbackrest/pgbackrest/issues/686
> is not clear on whether a user can do that w/o a certain amount of hand
> holding.
I've asked Luca to update his blog post and/or re-test on FreeBSD and
he's said he would. We've moved to using autoconf and friends, and it's
all in C now, so it really shouldn't be as much of an issue these days.
I recall someone else building on FreeBSD not long ago, but not finding
a reference to it offhand.
> If folks do run into issues with pgbackrest on FreeBSD, please let us
> know.
... again, this.
Thanks,
Stephen
On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
!
! On 6/8/20 7:33 PM, Peter wrote:
! >
! > Actually, the affair had some good side: as usual I was checking
! > my own designs first and looking for flaws, and indeed I found one:
! > If you do copy out the archive logs not directly to tape, but to
! > some disk area for further processing, then there is an issue with
! > possible loss. If you do it like the docs say, with a command like
! > this:
! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p
! > +/mnt/server/archivedir/%f' # Unix
! > That "cp" is usually not synchronous. So there is the possibility
! > that this command terminates successfully, and reports exitcode zero
! > back to the Postgres, and then the Postgres will consider that log
! > being safely away.
!
! Which is why just following the above command in the docs is:
!
! "(This is an example, not a recommendation, and might not work on all
! platforms.) "
So, what You are basically saying is: my worries are justified and
correctly founded, and this is indeed a matter that needs to be taken
care of.
Thank You.
! Generally for peace of mind folks use third party tools like:
!
! pg_backrest(https://pgbackrest.org/),
! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
! Barman(https://www.pgbarman.org/).
Hmja. We may on occasion have a look into these...
! I use pg_backrest, but it does not look promising for running on BSD:
! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
That looks mostly like the usual things which can be fixed.
Now, for the facts: I am already using a professional backup
solution. (It is actually a "dual-use/commercial" solution, of the
kind which you can either fetch from github and use without support,
or buy with a contract or whatever and get support.)
With this professional backup solution I have already identified 28
(spell: twenty-eight) bugs, and fixed/worked around these, until I got it
properly working.
This professional backup solution also offers support for postgres.
Sadly, it only covers postgres up to Rel.9, and that piece of software
wasn't touched in the last 6 or 7 years.
But the bigger issue is that this backup solution needs its
own postgres database as its backend - and it cannot back up the
database it is using. Looks quite pointless to me, then.
So I just did it all with shell (and it wasn't many lines).
So now, as I've been thru identifying and handling all the issues in
that one backup solution, and since it is supposed to handle *all*
backup demands (and not only postgres), I will certainly not start
and go thru the same process again with one of these supposed
solutions, where 90% of the code tries to solve the same things
redundantly again, but then only for PG.
Actually, I am getting very tired of reading that something which can
easily be done within 20 lines of shell scripting, would need special
"solutions", solutions that need to be compiled, solutions that would
bring along their own fashion of interpreter, solutions that have a
lot of their own dependencies and introduce mainly one thing: new bugs.
Does nobody know anymore how to do proper systems management
scripting? Using just the basic system tools which have proven to
work for more than 50 years now!?
! Not sure about pg_probackup.
Okay, I had a -very short- look into these. Just scanning the
introductory pages.
The only really interesting thing there is the pg_probackup. These
folks seem to have found a way to do row-level incremental backups.
And pgbarman seems to have an impressive understanding of ITIL (in
case anybody bothers about that).
All these tools do only cover PG, but do that in any possible regards.
This is fine as long as you do not run any computers, and the only
application you are using is Postgres.
But, if you have other applications as well, or have computers, then
you will need a different backup solution, something that will cover
your site-wide backup demands, in a consistent fashion (think
something in the style of ADSM, or nowadays called Spectrum Protect).
And then 90% of the things offered here become superfluous, because
they are already handled site-wide. And then you will have to
consider integration of both pieces - and that will most likely be
more work and more error-prone than just writing a few adapters in
shell.
cheerio,
PMc
On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
! Adrian Klaver <adrian.klaver@aklaver.com> writes:
! > On 6/8/20 7:33 PM, Peter wrote:
! >> That "cp" is usually not synchronous. So there is the possibility
! >> that this command terminates successfully, and reports exitcode zero
! >> back to the Postgres, and then the Postgres will consider that log
! >> being safely away.
!
! > Which is why just following the above command in the docs is:
! > "(This is an example, not a recommendation, and might not work on all
! > platforms.) "
!
! Yeah. There have been discussions about changing that disclaimer to be
! more forceful, because in point of fact a plain "cp" is generally not safe
! enough. You need to fsync the written file, and on many filesystems you
! also have to fsync the directory it's in.
It certainly does not need to be "more forceful" - because this is not
about behavioural education, like training dogs, horses, or monkeys,
and neither do we entertain a BDSM studio.
What it needs instead is mention of the magic word "fsync". Because,
we already know that - we just need a reminder at the proper place.
Systems integrators are professional people. They are not in need of
more beating (spell: forceful education), only of substantial
technical hints and information.
! > Generally for peace of mind folks use third party tools like:
!
! +1. Rolling your own archive script is seldom advisable.
Well then, using commercial solutions brings its own problems. E.g.,
the people I happened to work for often had problems with torsion,
which happens when the solution gets longer than, say, twenty meters,
and these are walked at high speeds.
They didn't have a problem with scripting - rather the opposite, they
were happy with it and paid good money for.
cheerio,
PMc
On 6/9/20 10:55 AM, Peter wrote:
> On Mon, Jun 08, 2020 at 09:21:47PM -0700, Adrian Klaver wrote:
> !
> ! On 6/8/20 7:33 PM, Peter wrote:
> ! >
> ! > Actually, the affair had some good side: as usual I was checking
> ! > my own designs first and looking for flaws, and indeed I found one:
> ! > If you do copy out the archive logs not directly to tape, but to
> ! > some disk area for further processing, then there is an issue with
> ! > possible loss. If you do it like the docs say, with a command like
> ! > this:
> ! > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
> ! > That "cp" is usually not synchronous. So there is the possibility
> ! > that this command terminates successfully, and reports exitcode zero
> ! > back to the Postgres, and then the Postgres will consider that log
> ! > being safely away.
> !
> ! Which is why just following the above command in the docs is:
> !
> ! "(This is an example, not a recommendation, and might not work on all
> ! platforms.) "
> So, what You are basically saying is: my worries are justified and
> correctly founded, and this is indeed a matter that needs to be taken
> care of.
> Thank You.
> ! Generally for peace of mind folks use third party tools like:
> !
> ! pg_backrest(https://pgbackrest.org/),
> ! pg_probackup(https://postgrespro.com/products/extensions/pg_probackup) or
> ! Barman(https://www.pgbarman.org/).
> Hmja. We may on occasion have a look into these...
> ! I use pg_backrest, but it does not look promising for running on BSD:
> ! https://fluca1978.github.io/2019/03/04/pgbackrest_FreeBSD.html
> That looks mostly like the usual things which can be fixed.
> Now, for the facts: I am already using a professional backup
> solution. (It is actually a "dual-use/commercial" solution, of the
> kind which you can either fetch from github and use without support,
> or buy with a contract or whatever and get support.)
> With this professional backup solution I have already identified 28
> (spell: twenty-eight) bugs, and fixed/worked around these, until I got it
> properly working.
> This professional backup solution also offers support for postgres.
> Sadly, it only covers postgres up to Rel.9, and that piece of software
> wasn't touched in the last 6 or 7 years.
> But the bigger issue is that this backup solution needs its
> own postgres database as its backend - and it cannot back up the
> database it is using. Looks quite pointless to me, then.
> So I just did it all with shell (and it wasn't many lines).
The backup solution is?
> So now, as I've been thru identifying and handling all the issues in
> that one backup solution, and since it is supposed to handle *all*
> backup demands (and not only postgres), I will certainly not start
> and go thru the same process again with one of these supposed
> solutions, where 90% of the code tries to solve the same things
> redundantly again, but then only for PG.
They are not supposed. They are in use by many people/organizations
across a wide variety of installations.
> Actually, I am getting very tired of reading that something which can
> easily be done within 20 lines of shell scripting, would need special
> "solutions", solutions that need to be compiled, solutions that would
> bring along their own fashion of interpreter, solutions that have a
> lot of their own dependencies and introduce mainly one thing: new bugs.
They were developed as they could not be done in 20 lines of shell
scripting and work at a reliable level.
Fine rant below. Go forth and work your wonders.
> Does nobody know anymore how to do proper systems management
> scripting? Using just the basic system tools which have proven to
> work for more than 50 years now!?
> ! Not sure about pg_probackup.
> Okay, I had a -very short- look into these. Just scanning the
> introductory pages.
> The only really interesting thing there is the pg_probackup. These
> folks seem to have found a way to do row-level incremental backups.
> And pgbarman seems to have an impressive understanding of ITIL (in
> case anybody bothers about that).
> All these tools do only cover PG, but do that in any possible regards.
> This is fine as long as you do not run any computers, and the only
> application you are using is Postgres.
> But, if you have other applications as well, or have computers, then
> you will need a different backup solution, something that will cover
> your site-wide backup demands, in a consistent fashion (think
> something in the style of ADSM, or nowadays called Spectrum Protect).
> And then 90% of the things offered here become superfluous, because
> they are already handled site-wide. And then you will have to
> consider integration of both pieces - and that will most likely be
> more work and more error-prone than just writing a few adapters in
> shell.
> cheerio,
> PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
Greetings,
* Peter (pmc@citylink.dinoex.sub.org) wrote:
> This professional backup solution also offers support for postgres.
> Sadly, it only covers postgres up to Rel.9, and that piece of software
> wasn't touched in the last 6 or 7 years.
Then it certainly doesn't work with the changes in v12, and probably has
other issues, as you allude to.
> Actually, I am getting very tired of reading that something which can
> easily be done within 20 lines of shell scripting, would need special
This is just simply false- you can't do it properly in 20 lines of shell
scripting. Sure, you can write something that has probably next to no
error checking, uses the deprecated API that'll cause your systems to
fail to start if you ever happen to have a reboot during a backup, and
has no way to provide verification that the backup was at all successful
after the fact, but that's not what I'd consider a proper solution-
instead it's one that'll end up causing you a lot of pain down the road.
Even the shell-script based solution (which I've never used and
personally wouldn't really recommend, but to each their own) called
'pitrery' (available here: https://github.com/dalibo/pitrery) is
thousands of lines of code.
> Does nobody know anymore how to do proper systems management
> scripting? Using just the basic system tools which have proven to
> work for more than 50 years now!?
I've not met anything I'd call 'proper systems management scripting'
that's 20 lines of code, shell script or not.
> ! Not sure about pg_probackup.
> Okay, I had a -very short- look into these. Just scanning the
> introductory pages.
> The only really interesting thing there is the pg_probackup. These
> folks seem to have found a way to do row-level incremental backups.
pg_probackup doesn't do row-level incremental backups, unless I've
missed some pretty serious change in its development, but it does
provide page-level, with, as I recall, an extension that didn't get
good reception when it was posted and discussed on these mailing lists
by other PG hackers. I don't know if those concerns about it have been
addressed or not, you might ask the pg_probackup folks if you're
considering it as a solution.
> This is fine as long as you do not run any computers, and the only
> application you are using is Postgres.
> But, if you have other applications as well, or have computers, then
> you will need a different backup solution, something that will cover
> your site-wide backup demands, in a consistent fashion (think
> something in the style of ADSM, or nowadays called Spectrum Protect).
> And then 90% of the things offered here become superfluous, because
> they are already handled site-wide. And then you will have to
> consider integration of both pieces - and that will most likely be
> more work and more error-prone than just writing a few adapters in
> shell.
pgbackrest's repo can be safely backed up using the simple file-based
backup utilities that you're referring to here. I suspect some of the
other solution's backups also could be, but you'd probably want to make
sure.
PG generally isn't something that can be backed up using the simple file
based backup solutions, as you might appreciate from just considering
the number of tools written to specifically deal with the complexity of
backing up an online PG cluster.
Thanks,
Stephen
On 6/9/20 12:02 PM, Peter wrote:
> On Tue, Jun 09, 2020 at 01:27:20AM -0400, Tom Lane wrote:
> ! Adrian Klaver <adrian.klaver@aklaver.com> writes:
> ! > On 6/8/20 7:33 PM, Peter wrote:
> ! >> That "cp" is usually not synchronous. So there is the possibility
> ! >> that this command terminates successfully, and reports exitcode zero
> ! >> back to the Postgres, and then the Postgres will consider that log
> ! >> being safely away.
> !
> ! > Which is why just following the above command in the docs is:
> ! > "(This is an example, not a recommendation, and might not work on all
> ! > platforms.) "
> !
> ! Yeah. There have been discussions about changing that disclaimer to be
> ! more forceful, because in point of fact a plain "cp" is generally not safe
> ! enough. You need to fsync the written file, and on many filesystems you
> ! also have to fsync the directory it's in.
> It certainly does not need to be "more forceful" - because this is not
> about behavioural education, like training dogs, horses, or monkeys,
> and neither do we entertain a BDSM studio.
> What it needs instead is mention of the magic word "fsync". Because,
> we already know that - we just need a reminder at the proper place.
> Systems integrators are professional people. They are not in need of
> more beating (spell: forceful education), only of substantial
> technical hints and information.
> ! > Generally for peace of mind folks use third party tools like:
> !
> ! +1. Rolling your own archive script is seldom advisable.
> Well then, using commercial solutions brings its own problems. E.g.,
FYI, the projects Stephen and I mentioned are Open Source. I'm sure you
can get paid support for them, but you exist on a higher plane than that,
so you can use them for free.
> the people I happened to work for often had problems with torsion,
> which happens when the solution gets longer than, say, twenty meters,
> and these are walked at high speeds.
> They didn't have a problem with scripting - rather the opposite, they
> were happy with it and paid good money for.
> cheerio,
> PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
! The backup solution is?
! Fine rant below. Go forth and work your wonders.
I don't need to, anymore. I did that, for about 20 years - people
I used to work for as a consultant (major banks and insurance shops)
would usually run Informix or Oracle. Postgres is just my own private
fancy.
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (pmc@citylink.dinoex.sub.org) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of software
! > wasn't touched in the last 6 or 7 years.
!
! Then it certainly doesn't work with the changes in v12, and probably has
! other issues, as you allude to.
Just having a look at their webpage, something seems to have been updated
recently, they now state that they have a new postgres adapter:
https://www.bareos.com/en/company_news/postgres-plugin-en1.html
Enjoy reading, and tell us what You think.
! > Actually, I am getting very tired of reading that something which can
! > easily be done within 20 lines of shell scripting, would need special
!
! This is just simply false- you can't do it properly in 20 lines of shell
! scripting.
Well, Your own docs show how to do it with a one-liner. So please
don't blame me for improving that to 20 lines.
! Sure, you can write something that has probably next to no
! error checking,
Before judging that, one should first specify what precisely is the
demand.
In a basic approach, the demand may be to get the logs out on tape in
a failsafe automated fashion without any miss, and get the data tree
out periodically, and have guaranteed that these files are untampered
as on disk.
And that can very well be done properly with an incremental filesystem
backup software plus some 20 lines of shellscript.
Now talking about doing an automated restore, or, having some menu-
driven solution, or -the worst of all- having a solution that can be
operated by morons - that's an entirely different matter.
In my understanding, backup is done via pg_dump. The archive logs are
for emergencies (data corruption, disaster), only. And emergencies
would usually be handled by some professional people who know what
they have to do.
You may consider different demands, and that is also fine, but doesn't
need to concern me.
! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backup
It is highly unlikely that this never happened to me during 15
years. So what does that mean? If I throw in a pg_start_backup('bogus'),
and then restart the cluster, it will not work anymore?
Lets see...
Clean stop/start - no issue whatsoever. (LOG: online backup mode
canceled)
kill -9 the whole flock - no issue whatsoever (Log: database system
was interrupted)
I won't pull the plug now, but that has certainly happened lots of
times in the past, and also yielded no issue whatsoever - simply
because there *never* was *any* issue whatsoever with Postgres (until
I got the idea to install the relatively fresh R.12 - but that's
understandable).
So maybe this problem exists only on Windows?
And yes, I read that whole horrible discussion, and I could tear my
hair out, really, concerning the "deprecated API". I suppose You mean
the mentioning in the docs that the "exclusive low-level backup" is
somehow deprecated.
This is a very big bad. Because: normally you can run the base backup
as a strictly ordinary file-level backup in "full" mode, just as any
backup software can do it. You will simply execute the
pg_start_backup() and pg_stop_backup() commands in the before- and
after- hooks - and any backup software will offer these hooks.
But now, with the now recommended "non-exclusive low-level backup",
the task is different: now your before-hook needs to do two things
at the same time:
1. keep a socket open in order to hold the connection to postgres
(because postgres will terminate the backup when the socket is
closed), and
2. invoke exit(0) (because the actual backup will not start until
the before- hook has properly delivered a successful exit code).
And, that is not only difficult, it is impossible.
So, what has to be done instead: you need to write a separate network
daemon, with the only purpose of holding that connection to postgres
open. And that network daemon needs to handle the communication to
the backup software on one side, and to postgres on the other side.
And that network daemon then needs the full-blown feature requirements
as a fail-safe network daemon should have (and that is a LOT!), plus
it needs to handle all kinds of possible failures (like network
interruptions) in that triangle, during the backup, and properly
notify both sides of whatever might be ongoing (and that is NOT
trivial).
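As an aside, the connection-holding requirement can at least be prototyped without a full daemon by driving the file copy from inside the one psql session that holds the backup open. A rough sketch of my own (label, data directory, and tar target are illustrative; error handling deliberately omitted):

```shell
# Build the input for a single psql session that keeps a non-exclusive
# backup open: the \! line runs the file-level copy while this very
# session (and thus the backup) is still alive, and pg_stop_backup(false)
# then ends it.  The returned labelfile must be saved into the backup.
nonexclusive_backup_script() {
    label="$1"; datadir="$2"; tarfile="$3"
    cat <<EOF
SELECT pg_start_backup('$label', false, false);
\\! tar -cf $tarfile $datadir
SELECT * FROM pg_stop_backup(false);
EOF
}
# Usage sketch:
#   nonexclusive_backup_script base /var/db/postgres/data /backup/base.tar \
#       | psql -d postgres
```

This does not remove the robustness concerns above (a dropped session still aborts the backup), but it shows a single script, not a network daemon, can satisfy the "connection stays open" rule.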
So yes, this is really a LOT of work. But the point is: this all is
not really necessary, because currently the stuff works fine in the
old way.
So, well, do away with the old method - but you cannot do it away
inside of rel.12 - and then I will stay with 12 for as long as
possible (and I don't think I will be the only one).
! has no way to provide verification that the backup was at all successful
It doesn't need to. That's the main point of using file level standard
backup - if that is tested and works, then it works for the data tree
and the logs just the same. And any monitoring is also just the same.
I see no point in creating artificial complications, which then create
a necessity for individual tools to handle them, which then create a
new requirement for testing and validating all these individual tools -
as this is strictly against the original idea as Brian Kernighan
explained it: use simple and versatile tools, and combine these to
achieve the individual task.
! > The only really interesting thing there is the pg_probackup. These
! > folks seem to have found a way to do row-level incremental backups.
!
! pg_probackup doesn't do row-level incremental backups, unless I've
! missed some pretty serious change in its development, but it does
! provide page-level,
Ah, well, anyway that seems to be something significantly smaller
than the usual 1 gig table file at once.
! with, as I recall, an extension that didn't get
! good reception when it was posted and discussed on these mailing lists
! by other PG hackers. I don't know if those concerns about it have been
! addressed or not, you might ask the pg_probackup folks if you're
! considering it as a solution.
Okay, thanks. That's interesting. I was just thinking if one could
cannibalize that respective code and make it into a filter for my own
purposes. And yes, the license would allow that.
And I was thinking that it will be quite an effort to get some kind
of logical verification that this scheme does actually work properly.
I don't consider it as a solution; I consider it as a piece of
functionality that, if working properly, does actually increase the
possibilities.
! PG generally isn't something that can be backed up using the simple file
! based backup solutions, as you might appreciate from just considering
! the number of tools written to specifically deal with the complexity of
! backing up an online PG cluster.
Yes, one could assume that. But then, I would prefer well-founded
technical reasons for what exactly would not work that way, and why it
would not work that way. And there seems to be not much about that.
And in such a case I tend to trust my own understanding, similar to the
full_page_writes matter. (In 2008 I heard about ZFS, and I concluded
that if ZFS is indeed copy-on-write, and if the description of the
full_page_writes option is correct, then one could safely switch it
off and free a lot of backup space - factor 10 at that time, with some
Rel.8. And so I started to use ZFS. Nobody would confirm that at that
time, but nowadays everybody does it.)
This was actually my job as a consultant: to de-mystify technology,
and make it understandable as an arrangement of well explainable
pieces of functionality with well-deducible consequences.
But this is no longer respected today; now people are expected to
*NOT* understand the technology they handle, and instead believe
in marketing and that it all is very complicated and unintelligible.
cheerio,
PMc
Greetings,
* Peter (pmc@citylink.dinoex.sub.org) wrote:
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! * Peter (pmc@citylink.dinoex.sub.org) wrote:
! > This professional backup solution also offers support for postgres.
! > Sadly, it only covers postgres up to Rel.9, and that piece of software
! > wasn't touched in the last 6 or 7 years.
!
! Then it certainly doesn't work with the changes in v12, and probably has
! other issues, as you allude to.

Just having a look at their webpage, something seems to have been updated
recently; they now state that they have a new postgres adapter:

https://www.bareos.com/en/company_news/postgres-plugin-en1.html
Enjoy reading, and tell us what You think.
I'm afraid I'm not particularly interested in performing a pro bono
evaluation of a commercial product, though considering they've put out a
press release with obviously broken links, I already have suspicions of
what I'd find ... (try clicking on their 'experimental/nightly' link).
A quick look at the docs also shows that it's referring to
recovery.conf, which no longer exists since v12 was released back in
September, so, yeah, isn't exactly current.
! > Actually, I am getting very tired of reading that something which can
! > easily be done within 20 lines of shell scripting, would need special
!
! This is just simply false- you can't do it properly in 20 lines of shell
! scripting.

Well, Your own docs show how to do it with a one-liner. So please
don't blame me for improving that to 20 lines.
No, the documentation provides an example for the purpose of
understanding how the replacement in the command is done and explicitly
says that you probably shouldn't use that command.
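[For reference, the documented example is essentially a test-then-copy
one-liner. A minimal sketch of its semantics, run against scratch
directories so it needs no server - the paths and the WAL file name
are stand-ins:]

```shell
#!/bin/sh
# Sketch of the docs' example archive_command
# ('test ! -f dest && cp src dest'), exercised against scratch
# directories; the WAL file name and both paths are stand-ins.
set -e
src=$(mktemp -d)   # stands in for the cluster's pg_wal directory
dst=$(mktemp -d)   # stands in for /mnt/server/archivedir

printf 'fake WAL' > "$src/0000000100000017000000FC"

# %p and %f in the real archive_command become these two arguments:
archive() {
  test ! -f "$dst/$2" && cp "$1" "$dst/$2"
}

archive "$src/0000000100000017000000FC" "0000000100000017000000FC"
echo "archived once"

# A second attempt is refused rather than silently overwriting:
if archive "$src/0000000100000017000000FC" "0000000100000017000000FC"
then echo "overwrote (unexpected)"
else echo "refused to overwrite"
fi
```

[The real setting in postgresql.conf is
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
and the docs indeed present it only to illustrate the %p/%f substitution.]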
! Sure, you can write something that has probably next to no
! error checking,

Before judging that, one should first specify what precisely the
demand is.
I really don't need to in order to be able to judge the notion of a 20
line shell script being able to manage to perform a backup correctly.
In my understanding, backup is done via pg_dump. The archive logs are
for emergencies (data corruption, disaster) only. And emergencies
would usually be handled by some professional people who know what
they have to do.
No, that's not the case. pg_dump isn't at all involved in the backups
that we're talking about here which are physical, file-level, backups.
! uses the deprecated API that'll cause your systems to
! fail to start if you ever happen to have a reboot during a backupIt is highly unlikely that I did never have that happen during 15
years. So what does that mean? If I throw in a pg_start_backup('bogus'),
and then restart the cluster, it will not work anymore?
If you perform a pg_start_backup(), have a checkpoint happen such that
older WAL is removed, and then reboot the box or kill -9 postgres, no,
it's not going to start anymore because there's going to be a
backup_label file that is telling the cluster that it needs to start
replaying WAL from an older point in time than what you've got WAL for.
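[To make that concrete: the exclusive API drops a backup_label file into
the data directory, and on a crash-start the server insists on replaying
from the position recorded there. Its layout is roughly the following;
the values here are invented for illustration:]

```
START WAL LOCATION: 17/FC000028 (file 0000000100000017000000FC)
CHECKPOINT LOCATION: 17/FC000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2020-06-08 09:38:00 CEST
LABEL: bogus
```

[If that WAL segment has meanwhile been recycled by a checkpoint, the
recorded start position is unreachable and startup aborts.]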
Let's see...
Clean stop/start - no issue whatsoever. (LOG: online backup mode
canceled)
kill -9 the whole flock - no issue whatsoever (Log: database system
was interrupted)
I won't pull the plug now, but that has certainly happened lots of
times in the past, and also yielded no issue whatsoever - simply
because there *never* was *any* issue whatsoever with Postgres (until
I got the idea to install the relatively fresh R.12 - but that's
understandable).
Being lucky really isn't what you want to bet on.
So maybe this problem exists only on Windows?
No, it's not Windows specific.
And yes, I read that whole horrible discussion, and I could tear my
hair out, really, concerning the "deprecated API". I suppose You mean
the mentioning in the docs that the "exclusive low-level backup" is
somehow deprecated.
Yes, it's deprecated specifically because of the issues outlined above.
They aren't hypothetical, they do happen, and people do get bit by them.
This is a very big bad. Because: normally you can run the base backup
as a strictly ordinary file-level backup in "full" mode, just as any
backup software can do it. You will simply execute the
pg_start_backup() and pg_stop_backup() commands in the before- and
after- hooks - and any backup software will offer these hooks.

But now, with the now recommended "non-exclusive low-level backup",
the task is different: now your before-hook needs to do two things
at the same time:
1. keep a socket open in order to hold the connection to postgres
(because postgres will terminate the backup when the socket is
closed), and
2. invoke exit(0) (because the actual backup will not start until
the before- hook has properly delivered a successful exit code).
And that is not only difficult, it is impossible.
One would imagine that if the commercial vendors wished to actually
support PG properly, they'd manage to figure out a way to do so that
doesn't involve the kind of hook scripts and poor assumptions made about
them that you're discussing here.
Considering that every single backup solution written specifically for
PG, including the shell-based ones, have managed to figure out how to
work with the new API, it hardly seems impossible for them to do so.
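[One way the shell-based tools square the "hold the connection open"
requirement with an ordinary script is to drive the whole backup from
inside a single psql session, using \! for the copy step. A minimal
sketch; the label 'nightly', COPY_CMD and the PSQL override are
placeholders, and PSQL defaults to a dry run that just prints the
commands - set PSQL=psql to try it against a live v12 cluster:]

```shell
#!/bin/sh
# Sketch of the non-exclusive API driven from one script: start, copy
# and stop all run inside a SINGLE psql session, so the connection is
# held open for the whole copy step.
PSQL=${PSQL:-cat}      # dry run by default; set PSQL=psql for real use
COPY_CMD=${COPY_CMD:-"echo file-level copy of \$PGDATA goes here"}

run_backup() {
  $PSQL <<EOF
SELECT pg_start_backup('nightly', false, false);
\! $COPY_CMD
SELECT * FROM pg_stop_backup(false);
EOF
}

run_backup
```

[Against a real cluster, the third argument false selects the
non-exclusive mode, and pg_stop_backup(false) returns the label file
contents, which must then be stored as backup_label inside the backup
instead of in the data directory.]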
So yes, this is really a LOT of work. But the point is: this all is
not really necessary, because currently the stuff works fine in the
old way.
Unfortunately, no, it doesn't work fine in the general case- you might
be lucky enough to get it to work sometimes without failure, but that's
not how one designs systems, to work in the 'lucky' case and fail badly
in other cases.
So, well, do away with the old method - but you cannot take it away
within rel.12 - and then I will stay with 12 for as long as
possible (and I don't think I will be the only one).
You're welcome to stay with it as long as you'd like. I do hope we
finally rip it out, as was discussed before, in v13. Of course, we'll
stop supporting v12 about 5 years after we release it.
I see no point in creating artificial complications, which then create
a necessity for individual tools to handle them, which then create a
new requirement for testing and validating all these individual tools -
as this is strictly against the original idea as Brian Kernighan
explained it: use simple and versatile tools, and combine these to
achieve the individual task.
These aren't artificial complications.
! PG generally isn't something that can be backed up using the simple file
! based backup solutions, as you might appreciate from just considering
! the number of tools written to specifically deal with the complexity of
! backing up an online PG cluster.

Yes, one could assume that. But then, I would prefer well-founded
technical reasons for what exactly would not work that way, and why it
would not work that way. And there seems to be not much about that.
I've explained them above, and they were explained on the thread you
evidently glanced at regarding deprecating the old API.
And in such a case I tend to trust my own understanding, similar to the
full_page_writes matter. (In 2008 I heard about ZFS, and I concluded
that if ZFS is indeed copy-on-write, and if the description of the
full_page_writes option is correct, then one could safely switch it
off and free a lot of backup space - factor 10 at that time, with some
Rel.8. And so I started to use ZFS. Nobody would confirm that at that
time, but nowadays everybody does it.)
I don't agree that 'everybody does it', nor that it's a particularly
good idea to turn off full_page_writes and depend on ZFS to magic it away.
In fact, I'd suggest you go watch this PGCon talk, once it's available
later this month (from a competitor of mine, but a terribly smart
individual, so you don't need to listen to me about it)-
This was actually my job as a consultant: to de-mystify technology,
and make it understandable as an arrangement of well explainable
pieces of functionality with well-deducible consequences.
But this is no longer respected today; now people are expected to
*NOT* understand the technology they handle, and instead believe
in marketing and that it all is very complicated and unintelligible.
Perhaps I'm wrong, but I tend to feel like I've got a pretty decent
handle on both PostgreSQL and on how file-level backups of it work.
Thanks,
Stephen
On 6/9/20 4:35 PM, Peter wrote:
On Tue, Jun 09, 2020 at 12:34:38PM -0700, Adrian Klaver wrote:
And that can very well be done properly with an incremental filesystem
backup software plus some 20 lines of shellscript.
Read the caveats here:
https://www.postgresql.org/docs/12/backup-file.html
Now talking about doing an automated restore, or having some menu-
driven solution, or - the worst of all - having a solution that can be
operated by morons - that's an entirely different matter.

In my understanding, backup is done via pg_dump. The archive logs are
for emergencies (data corruption, disaster) only. And emergencies
would usually be handled by some professional people who know what
they have to do.
Read the entire section below, for why WAL's are for backup also. FYI,
if you don't properly set it up then you may not be protected for data
corruption. See PITR.
https://www.postgresql.org/docs/12/continuous-archiving.html
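[As a side note for v12, where recovery.conf no longer exists: the
recovery parameters moved into postgresql.conf, and a restore is
requested via a signal file. A minimal PITR sketch; the archive path
and target time are placeholders:]

```
# postgresql.conf (v12+): the former recovery.conf settings live here
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_time = '2020-06-08 17:00:00'

# recovery is then triggered by an empty signal file in the data dir:
#   touch $PGDATA/recovery.signal
```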
Postgres is used by a wide gamut of people of differing abilities, many
of whom appreciate the availability of tested solutions to protect their
data as they are not morons and understand there are people who can make
their life easier.
This was actually my job as a consultant: to de-mystify technology,
and make it understandable as an arrangement of well explainable
pieces of functionality with well-deducible consequences.
Not seeing it.
But this is no longer respected today; now people are expected to
*NOT* understand the technology they handle, and instead believe
in marketing and that it all is very complicated and unintelligible.

cheerio,
PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Jun 09, 2020 at 03:42:48PM -0400, Stephen Frost wrote:
! > And then 90% of the things offered here become superfluous, because
! > they are already handled site-wide. And then you will have to
! > consider integration of both pieces - and that will most likely be
! > more work and more error-prone than just writing a few adapters in
! > shell.
!
! pgbackrest's repo can be safely backed up using the simple file-based
! backup utilities that you're referring to here. I suspect some of the
! other solutions' backups also could be, but you'd probably want to make
! sure.
What repo?? I seem to have missed that at first glance.
Are You indeed suggesting that one should have their data within
the database, where it is worked with, and then use Your tool
to copy it to some "repo" disk playground whatever area, and then
use their regular backup system to COPY IT AGAIN into their
backup/archiving system? Are You kiddin'?
Are You indeed suggesting that people should buy the amount
of disks that they use for their database AGAIN in order for Your
software to copy the stuff around?
Is this becoming a madhouse, or are You going to refund them that?
Let me tell You something: the people I used to work for sometimes
had a problem. They had some amount of data that was created during
the day, and they had the night to write that data away to backup.
That would usually mean four or eight of the big tapes, streaming in
parallel, fibers saturated, all night thru. And the problem usually was
that they would need a longer night. At least the math had to be done
properly.
Maybe You never encountered these, but there are surroundings where
there is no spare room for nonsense. Maybe that's why these people
preferred to use Oracle.
cheerio,
PMc