PATCH: Exclude unlogged tables from base backups
Including unlogged relations in base backups takes up space and is
wasteful since they are truncated during backup recovery.
The attached patches exclude unlogged relations from base backups except
for the init fork, which is required to recreate the main fork during
recovery.
* exclude-unlogged-v1-01.patch
Some refactoring of reinit.c was required to reduce code duplication but
the coverage report showed that most of the interesting parts of
reinit.c were not being tested. This patch adds coverage for reinit.c.
* exclude-unlogged-v1-02.patch
Refactor reinit.c to allow other modules to identify and work with
unlogged relation forks.
* exclude-unlogged-v1-03.patch
Exclude unlogged relation forks (except init) from pg_basebackup to save
space (and time).
I decided not to try and document unlogged exclusions in the continuous
backup documentation yet (they are noted in the protocol docs). I would
like to get some input on whether the community thinks this is a good
idea. It's a non-trivial procedure that would be easy to misunderstand
and does not affect the quality of the backup other than using less
space. Thoughts?
I'll add these patches to the next CF.
--
-David
david@pgmasters.net
Attachments:
exclude-unlogged-v1-01.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v1-01.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/test/recovery/t/014_unlogged_reinit.pl b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 0000000000..35feba69a0
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,117 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+SKIP:
+{
+ skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+ # Create unlogged tables in a tablespace
+ $tablespaceDir = TestLib::tempdir . "/ts1";
+
+ mkdir($tablespaceDir)
+ or die "unable to mkdir \"$tablespaceDir\"";
+
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+ $ts1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('ts1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+}
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+ 'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+ 'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+ or die "unable to remove \"${baseUnloggedPath}\"";
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+ # Write forks to test that they are removed by recovery
+ $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+ 'touch vm fork in tablespace');
+ $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+ 'touch fsm fork in tablespace');
+
+ # Remove main fork to test that it is recopied from init
+ unlink("$pgdata/${ts1UnloggedPath}")
+ or die "unable to remove \"${ts1UnloggedPath}\"";
+}
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 4 if ($windows_os);
+
+ # Check unlogged table in tablespace
+ ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+ ok(!-f "$pgdata/${ts1UnloggedPath}_vm", 'vm fork not in tablespace');
+ ok(!-f "$pgdata/${ts1UnloggedPath}_fsm", 'fsm fork not in tablespace');
+
+ # Drop unlogged table
+ $node->safe_psql('postgres', 'DROP TABLE ts1_unlogged');
+
+ # Drop tablespace
+ $node->safe_psql('postgres', 'DROP TABLESPACE ts1');
+ rmdir($tablespaceDir)
+ or die "unable to rmdir \"$tablespaceDir\"";
+}
exclude-unlogged-v1-02.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v1-02.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 00678cb182..0c2568e3f5 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -21,7 +21,6 @@
#include "storage/copydir.h"
#include "storage/fd.h"
#include "storage/reinit.h"
-#include "utils/hsearch.h"
#include "utils/memutils.h"
static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
@@ -146,6 +145,99 @@ ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
}
/*
+ * Find all unlogged relations in the specified directory and return their OIDs.
+ *
+ * It's possible that someone could create a ton of unlogged relations in the
+ * same database & tablespace, so we'd better use a hash table rather than an
+ * array or linked list to keep track of which files need to be reset.
+ * Otherwise, search operations would be O(n^2).
+ */
+HTAB *
+ResetUnloggedRelationsHash(const char *dbspacedirname)
+{
+ DIR *dbspace_dir;
+ struct dirent *de;
+ HTAB *hash = NULL;
+
+ /* Scan the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /* Create the hash table if it has not been created already. */
+ if (!hash)
+ {
+ HASHCTL ctl;
+
+ memset(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(unlogged_relation_entry);
+ ctl.entrysize = sizeof(unlogged_relation_entry);
+ hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+ }
+
+ /*
+ * Put the OID portion of the name into the hash table, if it
+ * isn't already.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_ENTER, NULL);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+
+ return hash;
+}
+
+/*
+ * Determine whether the specified file is an unlogged relation fork.
+ *
+ * If not an unlogged relation then return notUnlogged, otherwise return
+ * unloggedInit if an unlogged init fork and unloggedOther if any other unlogged
+ * fork.
+ */
+UnloggedRelationFork
+ResetUnloggedRelationsMatch(HTAB *hash, const char *file)
+{
+ ForkNumber forkNum;
+ int oidchars;
+ bool found;
+ unlogged_relation_entry ent;
+
+ /* If it's not a relation then it's not unlogged. */
+ if (!parse_filename_for_nontemp_relation(file, &oidchars, &forkNum))
+ return notUnlogged;
+
+ /* An unlogged init fork. */
+ if (forkNum == INIT_FORKNUM)
+ return unloggedInit;
+
+ /* See whether the OID portion of the name shows up in the hash table. */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, file, oidchars);
+ hash_search(hash, &ent, HASH_FIND, &found);
+
+ /* If found this is another fork of an unlogged table (but not init). */
+ if (found)
+ return unloggedOther;
+
+ return notUnlogged;
+}
+
+/*
* Process one per-dbspace directory for ResetUnloggedRelations
*/
static void
@@ -166,58 +258,13 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
{
HTAB *hash;
- HASHCTL ctl;
-
- /*
- * It's possible that someone could create a ton of unlogged relations
- * in the same database & tablespace, so we'd better use a hash table
- * rather than an array or linked list to keep track of which files
- * need to be reset. Otherwise, this cleanup operation would be
- * O(n^2).
- */
- memset(&ctl, 0, sizeof(ctl));
- ctl.keysize = sizeof(unlogged_relation_entry);
- ctl.entrysize = sizeof(unlogged_relation_entry);
- hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
-
- /* Scan the directory. */
- dbspace_dir = AllocateDir(dbspacedirname);
- while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
- {
- ForkNumber forkNum;
- int oidchars;
- unlogged_relation_entry ent;
-
- /* Skip anything that doesn't look like a relation data file. */
- if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
- &forkNum))
- continue;
-
- /* Also skip it unless this is the init fork. */
- if (forkNum != INIT_FORKNUM)
- continue;
- /*
- * Put the OID portion of the name into the hash table, if it
- * isn't already.
- */
- memset(ent.oid, 0, sizeof(ent.oid));
- memcpy(ent.oid, de->d_name, oidchars);
- hash_search(hash, &ent, HASH_ENTER, NULL);
- }
-
- /* Done with the first pass. */
- FreeDir(dbspace_dir);
+ /* Build a hash table of all unlogged relations. */
+ hash = ResetUnloggedRelationsHash(dbspacedirname);
- /*
- * If we didn't find any init forks, there's no point in continuing;
- * we can bail out now.
- */
- if (hash_get_num_entries(hash) == 0)
- {
- hash_destroy(hash);
+ /* No need to continue if there are no unlogged tables. */
+ if (!hash)
return;
- }
/*
* Now, make a second pass and remove anything that matches.
@@ -225,30 +272,9 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
dbspace_dir = AllocateDir(dbspacedirname);
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
{
- ForkNumber forkNum;
- int oidchars;
- bool found;
- unlogged_relation_entry ent;
-
- /* Skip anything that doesn't look like a relation data file. */
- if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
- &forkNum))
- continue;
-
- /* We never remove the init fork. */
- if (forkNum == INIT_FORKNUM)
- continue;
-
- /*
- * See whether the OID portion of the name shows up in the hash
- * table.
- */
- memset(ent.oid, 0, sizeof(ent.oid));
- memcpy(ent.oid, de->d_name, oidchars);
- hash_search(hash, &ent, HASH_FIND, &found);
-
- /* If so, nuke it! */
- if (found)
+ /* If this is an unlogged relation fork other than init, nuke it! */
+ if (ResetUnloggedRelationsMatch(
+ hash, de->d_name) == unloggedOther)
{
snprintf(rm_path, sizeof(rm_path), "%s/%s",
dbspacedirname, de->d_name);
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
index 90e494e933..f5781b5bf2 100644
--- a/src/include/storage/reinit.h
+++ b/src/include/storage/reinit.h
@@ -15,9 +15,24 @@
#ifndef REINIT_H
#define REINIT_H
+#include "utils/hsearch.h"
+
extern void ResetUnloggedRelations(int op);
#define UNLOGGED_RELATION_CLEANUP 0x0001
#define UNLOGGED_RELATION_INIT 0x0002
+/* Return values for ResetUnloggedRelationsMatch(). */
+typedef enum
+{
+ notUnlogged, /* Not a relation or not an unlogged relation. */
+ unloggedInit, /* An unlogged relation init fork. */
+ unloggedOther /* An unlogged relation fork other than init. */
+} UnloggedRelationFork;
+
+/* Utility functions for identifying unlogged table forks. */
+extern HTAB *ResetUnloggedRelationsHash(const char *dbspacedirname);
+extern UnloggedRelationFork ResetUnloggedRelationsMatch(
+ HTAB *unloggedHash, const char *fileName);
+
#endif /* REINIT_H */
exclude-unlogged-v1-03.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v1-03.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 8174e3defa..acac286bf4 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2549,6 +2549,12 @@ The commands accepted in walsender mode are:
</listitem>
<listitem>
<para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
<filename>pg_wal</filename>, including subdirectories. If the backup is run
with WAL files included, a synthesized version of <filename>pg_wal</filename> will be
included, but it will only contain the files necessary for the
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index cd7d391b2f..59ab7e19c9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -33,6 +33,7 @@
#include "storage/dsm_impl.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/reinit.h"
#include "utils/builtins.h"
#include "utils/elog.h"
#include "utils/ps_status.h"
@@ -958,6 +959,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64 size = 0;
+ HTAB *unloggedHash = NULL; /* Unlogged tables in this path. */
+ const char *unloggedDelim; /* Split this path from parent path. */
+
+ /*
+ * Find any unlogged relations in this path and store them in a hash. All
+ * unlogged relation forks except init will be excluded from the backup.
+ *
+ * Start by finding the location of the delimiter between the parent
+ * path and the current path.
+ */
+ unloggedDelim = strrchr(path, '/');
+
+ /* Does this path look like a database path (i.e. all digits)? */
+ if (unloggedDelim != NULL &&
+ strspn(unloggedDelim + 1, "0123456789") == strlen(unloggedDelim + 1))
+ {
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = unloggedDelim - path;
+
+ /*
+ * Build the unlogged relation hash if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(unloggedDelim - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+ sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ unloggedHash = ResetUnloggedRelationsHash(path);
+ }
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
@@ -1007,6 +1038,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (excludeFound)
continue;
+ /* Exclude all forks for unlogged tables except the init fork. */
+ if (unloggedHash && ResetUnloggedRelationsMatch(
+ unloggedHash, de->d_name) == unloggedOther)
+ {
+ elog(DEBUG2, "unlogged relation file \"%s\" excluded from backup",
+ de->d_name);
+ continue;
+ }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
/* Skip pg_control here to back up it last */
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..4a22c4a657 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
use Config;
use PostgresNode;
use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 87;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -66,6 +66,16 @@ foreach my $filename (
# positive.
$node->safe_psql('postgres', 'SELECT 1;');
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -96,6 +106,12 @@ foreach my $filename (
ok(!-f "$tempdir/backup/$filename", "$filename not copied");
}
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok(!-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
# Make sure existing backup_label was ignored.
isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
@@ -147,7 +163,7 @@ unlink "$pgdata/$superlongname";
# skip on Windows.
SKIP:
{
- skip "symlinks not supported on Windows", 11 if ($windows_os);
+ skip "symlinks not supported on Windows", 15 if ($windows_os);
# Move pg_replslot out of $pgdata and create a symlink to it.
$node->stop;
@@ -177,6 +193,19 @@ SKIP:
my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;');
+
+ my $tblspc1UnloggedPath = $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath",
+ 'unlogged main fork in tablespace');
+
$node->command_fails(
[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
'plain format with tablespaces fails without tablespace mapping');
@@ -195,11 +224,20 @@ SKIP:
"tablespace symlink was updated");
closedir $dh;
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) = $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged imain fork not in tablespace backup');
+
ok( -d "$tempdir/backup1/pg_replslot",
'pg_replslot symlink copied as directory');
mkdir "$tempdir/tbl=spc2";
$node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
$node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
$node->safe_psql('postgres',
"CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
Hi,
On 2017-12-12 17:49:54 -0500, David Steele wrote:
Including unlogged relations in base backups takes up space and is wasteful
since they are truncated during backup recovery.The attached patches exclude unlogged relations from base backups except for
the init fork, which is required to recreate the main fork during recovery.
How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?
I decided not to try and document unlogged exclusions in the continuous
backup documentation yet (they are noted in the protocol docs). I would
like to get some input on whether the community thinks this is a good idea.
It's a non-trivial procedure that would be easy to misunderstand and does
not affect the quality of the backup other than using less space. Thoughts?
Think it's a good idea, I've serious concerns about practicability of a
correct implementation though.
- Andres
Hi Andres,
On 12/12/17 5:52 PM, Andres Freund wrote:
On 2017-12-12 17:49:54 -0500, David Steele wrote:
Including unlogged relations in base backups takes up space and is wasteful
since they are truncated during backup recovery.The attached patches exclude unlogged relations from base backups except for
the init fork, which is required to recreate the main fork during recovery.How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?
I don't think this is an issue. If the init fork exists it should be OK
if it is torn since it will be recreated from WAL.
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be
backed up that don't need to be. The main fork is unlikely to be very
large at that point so it doesn't seem like a big deal.
I don't see this as any different than what happens during recovery.
The unlogged forks are cleaned / re-inited before replay starts which is
the same thing we are doing here.
I decided not to try and document unlogged exclusions in the continuous
backup documentation yet (they are noted in the protocol docs). I would
like to get some input on whether the community thinks this is a good idea.
It's a non-trivial procedure that would be easy to misunderstand and does
not affect the quality of the backup other than using less space. Thoughts?Think it's a good idea, I've serious concerns about practicability of a
correct implementation though.
Well, I would be happy if you had a look!
Thanks.
--
-David
david@pgmasters.net
Hi,
On 2017-12-12 18:04:44 -0500, David Steele wrote:
On 12/12/17 5:52 PM, Andres Freund wrote:
On 2017-12-12 17:49:54 -0500, David Steele wrote:
Including unlogged relations in base backups takes up space and is wasteful
since they are truncated during backup recovery.The attached patches exclude unlogged relations from base backups except for
the init fork, which is required to recreate the main fork during recovery.How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?I don't think this is an issue. If the init fork exists it should be OK if
it is torn since it will be recreated from WAL.
I'm not worried about torn pages.
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be backed up
that don't need to be. The main fork is unlikely to be very large at that
point so it doesn't seem like a big deal.I don't see this as any different than what happens during recovery. The
unlogged forks are cleaned / re-inited before replay starts which is the
same thing we are doing here.
It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?
Greetings,
Andres Freund
On Wed, Dec 13, 2017 at 8:04 AM, David Steele <david@pgmasters.net> wrote:
On 12/12/17 5:52 PM, Andres Freund wrote:
On 2017-12-12 17:49:54 -0500, David Steele wrote:
Including unlogged relations in base backups takes up space and is
wasteful
since they are truncated during backup recovery.The attached patches exclude unlogged relations from base backups except
for
the init fork, which is required to recreate the main fork during
recovery.How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?I don't think this is an issue. If the init fork exists it should be OK if
it is torn since it will be recreated from WAL.
Yeah, I was just typing that until I saw your message.
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be backed up
that don't need to be. The main fork is unlikely to be very large at that
point so it doesn't seem like a big deal.
As far as I recall the init forks are logged before the main forks. I
don't think that we should rely on that assumption though to be always
satisfied.
I decided not to try and document unlogged exclusions in the continuous
backup documentation yet (they are noted in the protocol docs). I would
like to get some input on whether the community thinks this is a good
idea.
It's a non-trivial procedure that would be easy to misunderstand and does
not affect the quality of the backup other than using less space.
Thoughts?Think it's a good idea, I've serious concerns about practicability of a
correct implementation though.Well, I would be happy if you had a look!
You can count me in. I think that this patch has value for some
dedicated workloads. It is a waste to backup stuff that will be
removed at recovery anyway.
--
Michael
On 12/12/17 6:07 PM, Andres Freund wrote:
I don't see this as any different than what happens during recovery. The
unlogged forks are cleaned / re-inited before replay starts which is the
same thing we are doing here.It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?
Well, that's a good point!
How about rechecking the presence of the init fork after a main/other
fork has been found? Is it possible for an init fork to still be lying
around after an oid has been recycled? Seems like it could be...
--
-David
david@pgmasters.net
On 2017-12-12 18:18:09 -0500, David Steele wrote:
On 12/12/17 6:07 PM, Andres Freund wrote:
I don't see this as any different than what happens during recovery. The
unlogged forks are cleaned / re-inited before replay starts which is the
same thing we are doing here.It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?Well, that's a good point!
How about rechecking the presence of the init fork after a main/other fork
has been found? Is it possible for an init fork to still be lying around
after an oid has been recycled? Seems like it could be...
I don't see how that'd help. You could just have gone through this cycle
multiple times by the time you get to rechecking. All not very likely,
but I don't want us to rely on luck here...
If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.
I guess we could have the basebackup create placeholder files that
prevent relfilenode reuse, but that seems darned ugly.
Greetings,
Andres Freund
Hi Michael,
On 12/12/17 6:08 PM, Michael Paquier wrote:
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be backed up
that don't need to be. The main fork is unlikely to be very large at that
point so it doesn't seem like a big deal.As far as I recall the init forks are logged before the main forks. I
don't think that we should rely on that assumption though to be always
satisfied.
Indeed, nothing is sure until a checkpoint. Until then we must assume
writes are random.
Well, I would be happy if you had a look!
You can count me in. I think that this patch has value for some
dedicated workloads.
Thanks!
It is a waste to backup stuff that will be
removed at recovery anyway.
It also causes confusion when the recovered database is smaller than the
backup. I can't tell you how many times I have answered this question...
--
-David
david@pgmasters.net
On 12/12/17 6:21 PM, Andres Freund wrote:
On 2017-12-12 18:18:09 -0500, David Steele wrote:
On 12/12/17 6:07 PM, Andres Freund wrote:
It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?Well, that's a good point!
How about rechecking the presence of the init fork after a main/other fork
has been found? Is it possible for an init fork to still be lying around
after an oid has been recycled? Seems like it could be...I don't see how that'd help. You could just have gone through this cycle
multiple times by the time you get to rechecking. All not very likely,
but I don't want us to rely on luck here...
Definitely not.
If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.
Or error the backup if there is wraparound?
We already have an error if a standby is promoted during backup -- so
there is some precedent.
I guess we could have the basebackup create placeholder files that
prevent relfilenode reuse, but that seems darned ugly.
Yes, very ugly.
--
-David
david@pgmasters.net
Hi,
On 2017-12-12 18:30:47 -0500, David Steele wrote:
If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.Or error the backup if there is wraparound?
That seems entirely unacceptable to me. On a machine with lots of
toasting etc going on an oid wraparound doesn't take a long time. We've
only one oid counter for all tables, and relfilenodes are inferred from
that ....
Greetings,
Andres Freund
On 12/12/17 6:33 PM, Andres Freund wrote:
On 2017-12-12 18:30:47 -0500, David Steele wrote:
If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED still'd complicate.Or error the backup if there is wraparound?
That seems entirely unacceptable to me. On a machine with lots of
toasting etc going on an oid wraparound doesn't take a long time. We've
only one oid counter for all tables, and relfilenodes are inferred from
that ....
Fair enough. I'll think on it.
--
-David
david@pgmasters.net
Andres,
* Andres Freund (andres@anarazel.de) wrote:
On 2017-12-12 18:04:44 -0500, David Steele wrote:
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be backed up
that don't need to be. The main fork is unlikely to be very large at that
point so it doesn't seem like a big deal.I don't see this as any different than what happens during recovery. The
unlogged forks are cleaned / re-inited before replay starts which is the
same thing we are doing here.It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?
We *are* actually talking about the recovery case here because this is a
backup that's happening and WAL replay will be happening after the
pg_basebackup is done and then the backup restored somewhere and PG
started up again.
If the persistence is changed then the table will be written into the
WAL, no? All of the WAL generated during a backup (which is what we're
talking about here) has to be replayed after the restore is done and is
before the database is considered consistent, so none of this matters,
as far as I can see, because the drop table or alter table logged or
anything else will be in the WAL that ends up getting replayed.
If that's not correct, then isn't there a live issue here with how
backups are happening today with unlogged tables and online backups?
I don't think there is, because, as David points out, the unlogged
tables are cleaned up first and then WAL replay happens during recovery,
so the init fork will cause the relation to be overwritten, but then
later the logged 'drop table' and subsequent re-use of the relfilenode
to create a new table (or persistence change) will all be in the WAL and
will be replayed over top and will take care of this.
Thanks!
Stephen
On 12/12/17 8:48 PM, Stephen Frost wrote:
Andres,
* Andres Freund (andres@anarazel.de) wrote:
On 2017-12-12 18:04:44 -0500, David Steele wrote:
If the forks are written out of order (i.e. main before init), which is
definitely possible, then I think worst case is some files will be backed up
that don't need to be. The main fork is unlikely to be very large at that
point so it doesn't seem like a big deal.I don't see this as any different than what happens during recovery. The
unlogged forks are cleaned / re-inited before replay starts which is the
same thing we are doing here.It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled. What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?We *are* actually talking about the recovery case here because this is a
backup that's happening and WAL replay will be happening after the
pg_basebackup is done and then the backup restored somewhere and PG
started up again.If the persistence is changed then the table will be written into the
WAL, no? All of the WAL generated during a backup (which is what we're
talking about here) has to be replayed after the restore is done and is
before the database is considered consistent, so none of this matters,
as far as I can see, because the drop table or alter table logged or
anything else will be in the WAL that ends up getting replayed.
Yes - that's the way I see it. At least when I'm not tired from a day
of coding like I was last night...
I don't think there is, because, as David points out, the unlogged
tables are cleaned up first and then WAL replay happens during recovery,
so the init fork will cause the relation to be overwritten, but then
later the logged 'drop table' and subsequent re-use of the relfilenode
to create a new table (or persistence change) will all be in the WAL and
will be replayed over top and will take care of this.
Files can be copied in any order, so if an OID is recycled the backup
could copy its first, second, or nth incarnation. It doesn't really
matter since all of it will be clobbered by WAL replay.
The new base backup code just does the non-init fork removal in advance,
following the same rules that would apply on recovery given the same
file set.
--
-David
david@pgmasters.net
David,
* David Steele (david@pgmasters.net) wrote:
On 12/12/17 8:48 PM, Stephen Frost wrote:
I don't think there is, because, as David points out, the unlogged
tables are cleaned up first and then WAL replay happens during recovery,
so the init fork will cause the relation to be overwritten, but then
later the logged 'drop table' and subsequent re-use of the relfilenode
to create a new table (or persistence change) will all be in the WAL and
will be replayed over top and will take care of this.Files can be copied in any order, so if an OID is recycled the backup
could copy its first, second, or nth incarnation. It doesn't really
matter since all of it will be clobbered by WAL replay.The new base backup code just does the non-init fork removal in advance,
following the same rules that would apply on recovery given the same
file set.
Just to be clear- the new base backup code doesn't actually *do* the
non-init fork removal, it simply doesn't include the non-init fork in
the backup when there is an init fork, right?
We certainly wouldn't want a basebackup actually running around removing
the main fork for unlogged tables on a running and otherwise healthy
system. ;)
Thanks!
Stephen
On 12/13/17 10:04 AM, Stephen Frost wrote:
Just to be clear- the new base backup code doesn't actually *do* the
non-init fork removal, it simply doesn't include the non-init fork in
the backup when there is an init fork, right?
It does *not* do the unlogged non-init fork removal. The code I
refactored in reinit.c is about identifying the forks, not removing
them. That code is reused to determine what to exclude from the backup.
I added the regression tests to ensure that the behavior of reinit.c is
unchanged after the refactor.
We certainly wouldn't want a basebackup actually running around removing
the main fork for unlogged tables on a running and otherwise healthy
system. ;)
That would not be good.
--
-David
david@pgmasters.net
On Tue, Dec 12, 2017 at 8:48 PM, Stephen Frost <sfrost@snowman.net> wrote:
If the persistence is changed then the table will be written into the
WAL, no? All of the WAL generated during a backup (which is what we're
talking about here) has to be replayed after the restore is done and is
before the database is considered consistent, so none of this matters,
as far as I can see, because the drop table or alter table logged or
anything else will be in the WAL that ends up getting replayed.
I can't see a hole in this argument. If we copy the init fork and
skip copying the main fork, then either we skipped copying the right
file, or the file we skipped copying will be recreated with the
correct contents during WAL replay anyway.
We could have a problem if wal_level=minimal, because then the new
file might not have been WAL-logged; but taking an online backup with
wal_level=minimal isn't supported precisely because we won't have WAL
replay to fix things up.
We would also have a problem if the missing file caused something in
recovery to croak on the grounds that the file was expected to be
there, but I don't think anything works that way; I think we just
assume missing files are an expected failure mode and silently do
nothing if asked to remove them.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
All,
I have reviewed and tested these patches.
The patches applied cleanly in order against master at (90947674fc).
I ran the provided regression tests and a 'check-world'. All tests succeeded.
Marking ready for committer.
-Adam
On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 12, 2017 at 8:48 PM, Stephen Frost <sfrost@snowman.net> wrote:
If the persistence is changed then the table will be written into the
WAL, no? All of the WAL generated during a backup (which is what we're
talking about here) has to be replayed after the restore is done and is
before the database is considered consistent, so none of this matters,
as far as I can see, because the drop table or alter table logged or
anything else will be in the WAL that ends up getting replayed.I can't see a hole in this argument. If we copy the init fork and
skip copying the main fork, then either we skipped copying the right
file, or the file we skipped copying will be recreated with the
correct contents during WAL replay anyway.We could have a problem if wal_level=minimal, because then the new
file might not have been WAL-logged; but taking an online backup with
wal_level=minimal isn't supported precisely because we won't have WAL
replay to fix things up.We would also have a problem if the missing file caused something in
recovery to croak on the grounds that the file was expected to be
there, but I don't think anything works that way; I think we just
assume missing files are an expected failure mode and silently do
nothing if asked to remove them.
I also couldn't see a problem in this approach.
Here is the first review comments.
+ unloggedDelim = strrchr(path, '/');
I think it doesn't work fine on windows. How about using
last_dir_separator() instead?
----
+ * Find all unlogged relations in the specified directory and return
their OIDs.
What the ResetUnloggedrelationsHash() actually returns is a hash
table. The comment of this function seems not appropriate.
----
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = unloggedDelim - path;
+
+ /*
+ * Build the unlogged relation hash if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >=
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(unloggedDelim -
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+
sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ unloggedHash = ResetUnloggedRelationsHash(path);
+ }
How about using get_parent_directory() to get parent directory name?
Also, I think it's better to destroy the unloggedHash after use.
----
+ /* Exclude all forks for unlogged tables except the
init fork. */
+ if (unloggedHash && ResetUnloggedRelationsMatch(
+ unloggedHash, de->d_name) == unloggedOther)
+ {
+ elog(DEBUG2, "unlogged relation file \"%s\"
excluded from backup",
+ de->d_name);
+ continue;
+ }
I think it's better to log this debug message at DEBUG2 level for
consistency with other messages.
----
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged imain fork not in tablespace backup');
s/imain/main/
----
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Masahiko,
Thanks for the review!
On 1/22/18 3:14 AM, Masahiko Sawada wrote:
On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
We would also have a problem if the missing file caused something in
recovery to croak on the grounds that the file was expected to be
there, but I don't think anything works that way; I think we just
assume missing files are an expected failure mode and silently do
nothing if asked to remove them.I also couldn't see a problem in this approach.
Here is the first review comments.
+ unloggedDelim = strrchr(path, '/');
I think it doesn't work fine on windows. How about using
last_dir_separator() instead?
I think this function is OK on Windows -- we use it quite a bit.
However, last_dir_separator() is clearer so I have changed it.
---- + * Find all unlogged relations in the specified directory and return their OIDs.What the ResetUnloggedrelationsHash() actually returns is a hash
table. The comment of this function seems not appropriate.
Fixed.
+ /* Part of path that contains the parent directory. */ + int parentPathLen = unloggedDelim - path; + + /* + * Build the unlogged relation hash if the parent path is either + * $PGDATA/base or a tablespace version path. + */ + if (strncmp(path, "./base", parentPathLen) == 0 || + (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) && + strncmp(unloggedDelim - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1), + TABLESPACE_VERSION_DIRECTORY, + sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0)) + unloggedHash = ResetUnloggedRelationsHash(path); + }How about using get_parent_directory() to get parent directory name?
get_parent_directory() munges the string that is passed to it which I
was trying to avoid (we'd need a copy) - and I don't think it makes the
rest of the logic any simpler without constructing yet another string to
hold the tablespace path.
I know performance isn't the most important thing here, so if the
argument is for clarity perhaps it makes sense. Otherwise I don't know
if it's worth it.
Also, I think it's better to destroy the unloggedHash after use.
Whoops! Fixed.
+ /* Exclude all forks for unlogged tables except the init fork. */ + if (unloggedHash && ResetUnloggedRelationsMatch( + unloggedHash, de->d_name) == unloggedOther) + { + elog(DEBUG2, "unlogged relation file \"%s\" excluded from backup", + de->d_name); + continue; + }I think it's better to log this debug message at DEBUG2 level for
consistency with other messages.
I think you mean DEBUG1? It's already at DEBUG2.
I considered using DEBUG1 but decided against it. The other exclusions
will produce a limited amount of output because there are only a few of
them. In the case of unlogged tables there could be any number of
exclusions and I thought that was too noisy for DEBUG1.
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath", + 'unlogged imain fork not in tablespace backup');s/imain/main/
Fixed.
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.
This is a good point. It's per database directory which makes it a
little better, but maybe not by much.
Three options here:
1) Leave it as is knowing that unlogged relations created during the
backup may be copied and document it that way.
2) Construct a list for SendDir() to work against so the gap between
creating that and creating the unlogged hash is as small as possible.
The downside here is that the list may be very large and take up a lot
of memory.
3) Check each file that looks like a relation in the loop to see if it
has an init fork. This might affect performance since an
opendir/readdir loop would be required for every relation.
Personally, I'm in favor of #1, at least for the time being. I've
updated the docs as indicated in case you and Adam agree.
New patches attached.
Thanks!
--
-David
david@pgmasters.net
Attachments:
exclude-unlogged-v2-01.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v2-01.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/test/recovery/t/014_unlogged_reinit.pl b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 0000000000..ac2e251158
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,117 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+SKIP:
+{
+ skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+ # Create unlogged tables in a tablespace
+ $tablespaceDir = TestLib::tempdir . "/ts1";
+
+ mkdir($tablespaceDir)
+ or die "unable to mkdir \"$tablespaceDir\"";
+
+ $node->safe_psql('postgres',
+ "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+ $ts1UnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('ts1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+}
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+ 'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+ 'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+ or die "unable to remove \"${baseUnloggedPath}\"";
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+ # Write forks to test that they are removed by recovery
+ $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+ 'touch vm fork in tablespace');
+ $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+ 'touch fsm fork in tablespace');
+
+ # Remove main fork to test that it is recopied from init
+ unlink("$pgdata/${ts1UnloggedPath}")
+ or die "unable to remove \"${ts1UnloggedPath}\"";
+}
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+ skip "symlinks not supported on Windows", 4 if ($windows_os);
+
+ # Check unlogged table in tablespace
+ ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+ ok(!-f "$pgdata/${ts1UnloggedPath}_vm", 'vm fork not in tablespace');
+ ok(!-f "$pgdata/${ts1UnloggedPath}_fsm", 'fsm fork not in tablespace');
+
+ # Drop unlogged table
+ $node->safe_psql('postgres', 'DROP TABLE ts1_unlogged');
+
+ # Drop tablespace
+ $node->safe_psql('postgres', 'DROP TABLESPACE ts1');
+ rmdir($tablespaceDir)
+ or die "unable to rmdir \"$tablespaceDir\"";
+}
exclude-unlogged-v2-02.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v2-02.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 92363ae6ad..73a25fa3e7 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -21,7 +21,6 @@
#include "storage/copydir.h"
#include "storage/fd.h"
#include "storage/reinit.h"
-#include "utils/hsearch.h"
#include "utils/memutils.h"
static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
@@ -145,6 +144,100 @@ ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
FreeDir(ts_dir);
}
+/*
+ * Find all unlogged relations in the specified directory and return their OIDs
+ * as a hash table.
+ *
+ * It's possible that someone could create a ton of unlogged relations in the
+ * same database & tablespace, so we'd better use a hash table rather than an
+ * array or linked list to keep track of which files need to be reset.
+ * Otherwise, search operations would be O(n^2).
+ */
+HTAB *
+ResetUnloggedRelationsHash(const char *dbspacedirname)
+{
+ DIR *dbspace_dir;
+ struct dirent *de;
+ HTAB *hash = NULL;
+
+ /* Scan the directory. */
+ dbspace_dir = AllocateDir(dbspacedirname);
+ while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+ {
+ ForkNumber forkNum;
+ int oidchars;
+ unlogged_relation_entry ent;
+
+ /* Skip anything that doesn't look like a relation data file. */
+ if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+ &forkNum))
+ continue;
+
+ /* Also skip it unless this is the init fork. */
+ if (forkNum != INIT_FORKNUM)
+ continue;
+
+ /* Create the hash table if it has not been created already. */
+ if (!hash)
+ {
+ HASHCTL ctl;
+
+ memset(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(unlogged_relation_entry);
+ ctl.entrysize = sizeof(unlogged_relation_entry);
+ hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
+ }
+
+ /*
+ * Put the OID portion of the name into the hash table, if it
+ * isn't already.
+ */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, de->d_name, oidchars);
+ hash_search(hash, &ent, HASH_ENTER, NULL);
+ }
+
+ /* Done with the first pass. */
+ FreeDir(dbspace_dir);
+
+ return hash;
+}
+
+/*
+ * Determine whether the specified file is an unlogged relation fork.
+ *
+ * If not an unlogged relation then return notUnlogged, otherwise return
+ * unloggedInit if an unlogged init fork and unloggedOther if any other unlogged
+ * fork.
+ */
+UnloggedRelationFork
+ResetUnloggedRelationsMatch(HTAB *hash, const char *file)
+{
+ ForkNumber forkNum;
+ int oidchars;
+ bool found;
+ unlogged_relation_entry ent;
+
+ /* If it's not a relation then it's not unlogged. */
+ if (!parse_filename_for_nontemp_relation(file, &oidchars, &forkNum))
+ return notUnlogged;
+
+ /* An unlogged init fork. */
+ if (forkNum == INIT_FORKNUM)
+ return unloggedInit;
+
+ /* See whether the OID portion of the name shows up in the hash table. */
+ memset(ent.oid, 0, sizeof(ent.oid));
+ memcpy(ent.oid, file, oidchars);
+ hash_search(hash, &ent, HASH_FIND, &found);
+
+ /* If found this is another fork of an unlogged table (but not init). */
+ if (found)
+ return unloggedOther;
+
+ return notUnlogged;
+}
+
/*
* Process one per-dbspace directory for ResetUnloggedRelations
*/
@@ -166,58 +259,13 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
{
HTAB *hash;
- HASHCTL ctl;
-
- /*
- * It's possible that someone could create a ton of unlogged relations
- * in the same database & tablespace, so we'd better use a hash table
- * rather than an array or linked list to keep track of which files
- * need to be reset. Otherwise, this cleanup operation would be
- * O(n^2).
- */
- memset(&ctl, 0, sizeof(ctl));
- ctl.keysize = sizeof(unlogged_relation_entry);
- ctl.entrysize = sizeof(unlogged_relation_entry);
- hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
-
- /* Scan the directory. */
- dbspace_dir = AllocateDir(dbspacedirname);
- while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
- {
- ForkNumber forkNum;
- int oidchars;
- unlogged_relation_entry ent;
-
- /* Skip anything that doesn't look like a relation data file. */
- if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
- &forkNum))
- continue;
-
- /* Also skip it unless this is the init fork. */
- if (forkNum != INIT_FORKNUM)
- continue;
- /*
- * Put the OID portion of the name into the hash table, if it
- * isn't already.
- */
- memset(ent.oid, 0, sizeof(ent.oid));
- memcpy(ent.oid, de->d_name, oidchars);
- hash_search(hash, &ent, HASH_ENTER, NULL);
- }
-
- /* Done with the first pass. */
- FreeDir(dbspace_dir);
+ /* Build a hash table of all unlogged relations. */
+ hash = ResetUnloggedRelationsHash(dbspacedirname);
- /*
- * If we didn't find any init forks, there's no point in continuing;
- * we can bail out now.
- */
- if (hash_get_num_entries(hash) == 0)
- {
- hash_destroy(hash);
+ /* No need to continue if there are no unlogged tables. */
+ if (!hash)
return;
- }
/*
* Now, make a second pass and remove anything that matches.
@@ -225,30 +273,9 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
dbspace_dir = AllocateDir(dbspacedirname);
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
{
- ForkNumber forkNum;
- int oidchars;
- bool found;
- unlogged_relation_entry ent;
-
- /* Skip anything that doesn't look like a relation data file. */
- if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
- &forkNum))
- continue;
-
- /* We never remove the init fork. */
- if (forkNum == INIT_FORKNUM)
- continue;
-
- /*
- * See whether the OID portion of the name shows up in the hash
- * table.
- */
- memset(ent.oid, 0, sizeof(ent.oid));
- memcpy(ent.oid, de->d_name, oidchars);
- hash_search(hash, &ent, HASH_FIND, &found);
-
- /* If so, nuke it! */
- if (found)
+ /* If this is an unlogged relation fork other than init, nuke it! */
+ if (ResetUnloggedRelationsMatch(
+ hash, de->d_name) == unloggedOther)
{
snprintf(rm_path, sizeof(rm_path), "%s/%s",
dbspacedirname, de->d_name);
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
index addda2c0ea..cd422e519a 100644
--- a/src/include/storage/reinit.h
+++ b/src/include/storage/reinit.h
@@ -15,9 +15,24 @@
#ifndef REINIT_H
#define REINIT_H
+#include "utils/hsearch.h"
+
extern void ResetUnloggedRelations(int op);
#define UNLOGGED_RELATION_CLEANUP 0x0001
#define UNLOGGED_RELATION_INIT 0x0002
+/* Return values for ResetUnloggedRelationsMatch(). */
+typedef enum
+{
+ notUnlogged, /* Not a relation or not an unlogged relation. */
+ unloggedInit, /* An unlogged relation init fork. */
+ unloggedOther /* An unlogged relation fork other than init. */
+} UnloggedRelationFork;
+
+/* Utility functions for identifying unlogged table forks. */
+extern HTAB *ResetUnloggedRelationsHash(const char *dbspacedirname);
+extern UnloggedRelationFork ResetUnloggedRelationsMatch(
+ HTAB *unloggedHash, const char *fileName);
+
#endif /* REINIT_H */
exclude-unlogged-v2-03.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v2-03.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4c5ed1e6d6..3fec491bbe 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2552,6 +2552,14 @@ The commands accepted in walsender mode are:
with <filename>pgsql_tmp</filename>.
</para>
</listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery. Unlogged tables
+ created during a backup may be copied, but will be excluded from the
+ next backup.
+ </para>
+ </listitem>
<listitem>
<para>
<filename>pg_wal</filename>, including subdirectories. If the backup is run
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dd7ad64862..1a5c91de17 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
#include "nodes/pg_list.h"
#include "pgtar.h"
#include "pgstat.h"
+#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
#include "replication/walsender.h"
@@ -33,6 +34,7 @@
#include "storage/dsm_impl.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/reinit.h"
#include "utils/builtins.h"
#include "utils/elog.h"
#include "utils/ps_status.h"
@@ -959,6 +961,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64 size = 0;
+ HTAB *unloggedHash = NULL; /* Unlogged tables in this path. */
+ const char *unloggedDelim; /* Split this path from parent path. */
+
+ /*
+ * Find any unlogged relations in this path and store them in a hash. All
+ * unlogged relation forks except init will be excluded from the backup.
+ *
+ * Start by finding the location of the delimiter between the parent
+ * path and the current path.
+ */
+ unloggedDelim = last_dir_separator(path);
+
+ /* Does this path look like a database path (i.e. all digits)? */
+ if (unloggedDelim != NULL &&
+ strspn(unloggedDelim + 1, "0123456789") == strlen(unloggedDelim + 1))
+ {
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = unloggedDelim - path;
+
+ /*
+ * Build the unlogged relation hash if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(unloggedDelim - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+ sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ unloggedHash = ResetUnloggedRelationsHash(path);
+ }
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
@@ -1008,6 +1040,15 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (excludeFound)
continue;
+ /* Exclude all forks for unlogged tables except the init fork. */
+ if (unloggedHash && ResetUnloggedRelationsMatch(
+ unloggedHash, de->d_name) == unloggedOther)
+ {
+ elog(DEBUG2, "unlogged relation file \"%s\" excluded from backup",
+ de->d_name);
+ continue;
+ }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
/* Skip pg_control here to back up it last */
@@ -1177,6 +1218,11 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
(errmsg("skipping special file \"%s\"", pathbuf)));
}
FreeDir(dir);
+
+ /* Free the hash containing unlogged tables, if it was created */
+ if (unloggedHash != NULL)
+ hash_destroy(unloggedHash);
+
return size;
}
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..455c7fca0d 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
use Config;
use PostgresNode;
use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 87;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -66,6 +66,16 @@ foreach my $filename (
# positive.
$node->safe_psql('postgres', 'SELECT 1;');
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -96,6 +106,12 @@ foreach my $filename (
ok(!-f "$tempdir/backup/$filename", "$filename not copied");
}
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok(!-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
# Make sure existing backup_label was ignored.
isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
@@ -147,7 +163,7 @@ unlink "$pgdata/$superlongname";
# skip on Windows.
SKIP:
{
- skip "symlinks not supported on Windows", 11 if ($windows_os);
+ skip "symlinks not supported on Windows", 15 if ($windows_os);
# Move pg_replslot out of $pgdata and create a symlink to it.
$node->stop;
@@ -177,6 +193,19 @@ SKIP:
my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;');
+
+ my $tblspc1UnloggedPath = $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath",
+ 'unlogged main fork in tablespace');
+
$node->command_fails(
[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
'plain format with tablespaces fails without tablespace mapping');
@@ -195,11 +224,20 @@ SKIP:
"tablespace symlink was updated");
closedir $dh;
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) = $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
ok( -d "$tempdir/backup1/pg_replslot",
'pg_replslot symlink copied as directory');
mkdir "$tempdir/tbl=spc2";
$node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
$node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
$node->safe_psql('postgres',
"CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.This is a good point. It's per database directory which makes it a
little better, but maybe not by much.Three options here:
1) Leave it as is knowing that unlogged relations created during the
backup may be copied and document it that way.2) Construct a list for SendDir() to work against so the gap between
creating that and creating the unlogged hash is as small as possible.
The downside here is that the list may be very large and take up a lot
of memory.3) Check each file that looks like a relation in the loop to see if it
has an init fork. This might affect performance since an
opendir/readdir loop would be required for every relation.Personally, I'm in favor of #1, at least for the time being. I've
updated the docs as indicated in case you and Adam agree.
I agree with #1 and feel the updated docs are reasonable and
sufficient to address this case for now.
I have retested these patches against master at d6ab720360.
All test succeed.
Marking "Ready for Committer".
-Adam
I agree with #1 and feel the updated docs are reasonable and
sufficient to address this case for now.I have retested these patches against master at d6ab720360.
All test succeed.
Marking "Ready for Committer".
Actually, marked it "Ready for Review" to wait for Masahiko to comment/agree.
Masahiko,
If you agree with the above, would you mind updating the status accordingly?
-Adam
On 1/24/18 4:02 PM, Adam Brightwell wrote:
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.This is a good point. It's per database directory which makes it a
little better, but maybe not by much.Three options here:
1) Leave it as is knowing that unlogged relations created during the
backup may be copied and document it that way.2) Construct a list for SendDir() to work against so the gap between
creating that and creating the unlogged hash is as small as possible.
The downside here is that the list may be very large and take up a lot
of memory.3) Check each file that looks like a relation in the loop to see if it
has an init fork. This might affect performance since an
opendir/readdir loop would be required for every relation.Personally, I'm in favor of #1, at least for the time being. I've
updated the docs as indicated in case you and Adam agree.I agree with #1 and feel the updated docs are reasonable and
sufficient to address this case for now.I have retested these patches against master at d6ab720360.
All test succeed.
Marking "Ready for Committer".
Thanks, Adam!
Actually, I was talking to Stephen about this it seems like #3 would be
more practical if we just stat'd the init fork for each relation file
found. I doubt the stat would add a lot of overhead and we can track
each unlogged relation in a hash table to reduce overhead even more.
I'll look at that tomorrow and see if I can work out something practical.
--
-David
david@pgmasters.net
On Thu, Jan 25, 2018 at 3:25 AM, David Steele <david@pgmasters.net> wrote:
Hi Masahiko,
Thanks for the review!
On 1/22/18 3:14 AM, Masahiko Sawada wrote:
On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
We would also have a problem if the missing file caused something in
recovery to croak on the grounds that the file was expected to be
there, but I don't think anything works that way; I think we just
assume missing files are an expected failure mode and silently do
nothing if asked to remove them.I also couldn't see a problem in this approach.
Here is the first review comments.
+ unloggedDelim = strrchr(path, '/');
I think it doesn't work fine on windows. How about using
last_dir_separator() instead?I think this function is OK on Windows -- we use it quite a bit.
However, last_dir_separator() is clearer so I have changed it.
Thank you for updating this. I was concerned about a separator
character '/' might not work fine on windows.
---- + * Find all unlogged relations in the specified directory and return their OIDs.What the ResetUnloggedrelationsHash() actually returns is a hash
table. The comment of this function seems not appropriate.Fixed.
+ /* Part of path that contains the parent directory. */ + int parentPathLen = unloggedDelim - path; + + /* + * Build the unlogged relation hash if the parent path is either + * $PGDATA/base or a tablespace version path. + */ + if (strncmp(path, "./base", parentPathLen) == 0 || + (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) && + strncmp(unloggedDelim - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1), + TABLESPACE_VERSION_DIRECTORY, + sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0)) + unloggedHash = ResetUnloggedRelationsHash(path); + }How about using get_parent_directory() to get parent directory name?
get_parent_directory() munges the string that is passed to it which I
was trying to avoid (we'd need a copy) - and I don't think it makes the
rest of the logic any simpler without constructing yet another string to
hold the tablespace path.
Agreed.
I know performance isn't the most important thing here, so if the
argument is for clarity perhaps it makes sense. Otherwise I don't know
if it's worth it.Also, I think it's better to destroy the unloggedHash after use.
Whoops! Fixed.
+ /* Exclude all forks for unlogged tables except the init fork. */ + if (unloggedHash && ResetUnloggedRelationsMatch( + unloggedHash, de->d_name) == unloggedOther) + { + elog(DEBUG2, "unlogged relation file \"%s\" excluded from backup", + de->d_name); + continue; + }I think it's better to log this debug message at DEBUG2 level for
consistency with other messages.I think you mean DEBUG1? It's already at DEBUG2.
Oops, yes I meant DEBUG1.
I considered using DEBUG1 but decided against it. The other exclusions
will produce a limited amount of output because there are only a few of
them. In the case of unlogged tables there could be any number of
exclusions and I thought that was too noisy for DEBUG1.
IMO it's okay to output many unlogged tables for a debug purpose but I
see your point.
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath", + 'unlogged imain fork not in tablespace backup');s/imain/main/
Fixed.
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.This is a good point. It's per database directory which makes it a
little better, but maybe not by much.Three options here:
1) Leave it as is knowing that unlogged relations created during the
backup may be copied and document it that way.2) Construct a list for SendDir() to work against so the gap between
creating that and creating the unlogged hash is as small as possible.
The downside here is that the list may be very large and take up a lot
of memory.3) Check each file that looks like a relation in the loop to see if it
has an init fork. This might affect performance since an
opendir/readdir loop would be required for every relation.Personally, I'm in favor of #1, at least for the time being. I've
updated the docs as indicated in case you and Adam agree.
See below comment.
On Thu, Jan 25, 2018 at 6:23 AM, David Steele <david@pgmasters.net> wrote:
On 1/24/18 4:02 PM, Adam Brightwell wrote:
Actually, I was talking to Stephen about this it seems like #3 would be
more practical if we just stat'd the init fork for each relation file
found. I doubt the stat would add a lot of overhead and we can track
each unlogged relation in a hash table to reduce overhead even more.
Can the readdir handle files that are added during the loop? I think
that we still cannot exclude a new unlogged relation if the relation
is added after we execute readdir first time. To completely eliminate
it we need a sort of lock that prevents to create new unlogged
relation from current backends. Or we need to do readdir loop multiple
times to see if no new relations were added during sending files.
If you're updating the patch to implement #3, this patch should be
marked as "Waiting on Author". After updated I'll review it again.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 1/25/18 12:31 AM, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 3:25 AM, David Steele <david@pgmasters.net> wrote:
Here is the first review comments.
+ unloggedDelim = strrchr(path, '/');
I think it doesn't work fine on windows. How about using
last_dir_separator() instead?I think this function is OK on Windows -- we use it quite a bit.
However, last_dir_separator() is clearer so I have changed it.Thank you for updating this. I was concerned about a separator
character '/' might not work fine on windows.
Ah yes, I see what you mean now.
On Thu, Jan 25, 2018 at 6:23 AM, David Steele <david@pgmasters.net> wrote:
On 1/24/18 4:02 PM, Adam Brightwell wrote:
Actually, I was talking to Stephen about this it seems like #3 would be
more practical if we just stat'd the init fork for each relation file
found. I doubt the stat would add a lot of overhead and we can track
each unlogged relation in a hash table to reduce overhead even more.Can the readdir handle files that are added during the loop? I think
that we still cannot exclude a new unlogged relation if the relation
is added after we execute readdir first time. To completely eliminate
it we need a sort of lock that prevents to create new unlogged
relation from current backends. Or we need to do readdir loop multiple
times to see if no new relations were added during sending files.
As far as I know readdir() is platform-dependent in terms of how it
scans the dir and if files created after the opendir() will appear.
It shouldn't matter, though, since WAL replay will recreate those files.
If you're updating the patch to implement #3, this patch should be
marked as "Waiting on Author". After updated I'll review it again.
Attached is a new patch that uses stat() to determine if the init fork
for a relation file exists. I decided not to build a hash table as it
could use considerable memory and I didn't think it would be much faster
than a simple stat() call.
The reinit.c refactor has been removed since it was no longer needed.
I'll submit the tests I wrote for reinit.c as a separate patch for the
next CF.
Thanks,
--
-David
david@pgmasters.net
Attachments:
exclude-unlogged-v3.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v3.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4c5ed1e6d6..5854ec1533 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2552,6 +2552,12 @@ The commands accepted in walsender mode are:
with <filename>pgsql_tmp</filename>.
</para>
</listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
<listitem>
<para>
<filename>pg_wal</filename>, including subdirectories. If the backup is run
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dd7ad64862..688790ad0d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
#include "nodes/pg_list.h"
#include "pgtar.h"
#include "pgstat.h"
+#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
#include "replication/walsender.h"
@@ -33,6 +34,7 @@
#include "storage/dsm_impl.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/reinit.h"
#include "utils/builtins.h"
#include "utils/elog.h"
#include "utils/ps_status.h"
@@ -959,12 +961,44 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64 size = 0;
+ const char *lastDir; /* Split last dir from parent path. */
+ bool isDbDir = false; /* Does this directory contain relations? */
+
+ /*
+ * Determine if the current path is a database directory that can
+ * contain relations.
+ *
+ * Start by finding the location of the delimiter between the parent
+ * path and the current path.
+ */
+ lastDir = last_dir_separator(path);
+
+ /* Does this path look like a database path (i.e. all digits)? */
+ if (lastDir != NULL &&
+ strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+ {
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = lastDir - path;
+
+ /*
+ * Mark path as a database directory if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+ sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ isDbDir = true;
+ }
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
{
int excludeIdx;
bool excludeFound;
+ ForkNumber relForkNum; /* Type of fork if file is a relation */
+ int relOidChars; /* Chars in filename that are the rel oid */
/* Skip special stuff */
if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
@@ -1008,6 +1042,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (excludeFound)
continue;
+ /* Exclude all forks for unlogged tables except the init fork */
+ if (isDbDir &&
+ parse_filename_for_nontemp_relation(de->d_name, &relOidChars,
+ &relForkNum))
+ {
+ /* Never exclude init forks */
+ if (relForkNum != INIT_FORKNUM)
+ {
+ char initForkFile[MAXPGPATH];
+ char relOid[OIDCHARS + 1];
+
+ /*
+ * If any other type of fork, check if there is an init fork
+ * with the same OID. If so, the file can be excluded.
+ */
+ strncpy(relOid, de->d_name, relOidChars);
+ snprintf(initForkFile, sizeof(initForkFile), "%s/%s_init",
+ path, relOid);
+
+ if (lstat(initForkFile, &statbuf) == 0)
+ {
+ elog(DEBUG2,
+ "unlogged relation file \"%s\" excluded from backup",
+ de->d_name);
+
+ continue;
+ }
+ }
+ }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
/* Skip pg_control here to back up it last */
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 92363ae6ad..7b8a1253c4 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -28,8 +28,6 @@ static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
int op);
static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
int op);
-static bool parse_filename_for_nontemp_relation(const char *name,
- int *oidchars, ForkNumber *fork);
typedef struct
{
@@ -373,7 +371,7 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
* portion of the filename. This is critical to protect against a possible
* buffer overrun.
*/
-static bool
+bool
parse_filename_for_nontemp_relation(const char *name, int *oidchars,
ForkNumber *fork)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..455c7fca0d 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
use Config;
use PostgresNode;
use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 87;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -66,6 +66,16 @@ foreach my $filename (
# positive.
$node->safe_psql('postgres', 'SELECT 1;');
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -96,6 +106,12 @@ foreach my $filename (
ok(!-f "$tempdir/backup/$filename", "$filename not copied");
}
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok(!-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
# Make sure existing backup_label was ignored.
isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
@@ -147,7 +163,7 @@ unlink "$pgdata/$superlongname";
# skip on Windows.
SKIP:
{
- skip "symlinks not supported on Windows", 11 if ($windows_os);
+ skip "symlinks not supported on Windows", 15 if ($windows_os);
# Move pg_replslot out of $pgdata and create a symlink to it.
$node->stop;
@@ -177,6 +193,19 @@ SKIP:
my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;');
+
+ my $tblspc1UnloggedPath = $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath",
+ 'unlogged main fork in tablespace');
+
$node->command_fails(
[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
'plain format with tablespaces fails without tablespace mapping');
@@ -195,11 +224,20 @@ SKIP:
"tablespace symlink was updated");
closedir $dh;
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) = $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
ok( -d "$tempdir/backup1/pg_replslot",
'pg_replslot symlink copied as directory');
mkdir "$tempdir/tbl=spc2";
$node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
$node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
$node->safe_psql('postgres',
"CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
index addda2c0ea..5c34f88b95 100644
--- a/src/include/storage/reinit.h
+++ b/src/include/storage/reinit.h
@@ -16,6 +16,8 @@
#define REINIT_H
extern void ResetUnloggedRelations(int op);
+extern bool parse_filename_for_nontemp_relation(
+ const char *name, int *oidchars, ForkNumber *fork);
#define UNLOGGED_RELATION_CLEANUP 0x0001
#define UNLOGGED_RELATION_INIT 0x0002
On Wed, Jan 24, 2018 at 1:25 PM, David Steele <david@pgmasters.net> wrote:
I think you mean DEBUG1? It's already at DEBUG2.
I considered using DEBUG1 but decided against it. The other exclusions
will produce a limited amount of output because there are only a few of
them. In the case of unlogged tables there could be any number of
exclusions and I thought that was too noisy for DEBUG1.
+1. Even DEBUG2 seems pretty chatty for a message that just tells you
that something is working in an entirely expected fashion; consider
DEBUG3. Fortunately, base backups are not so common that this should
cause enormous log spam either way, but keeping the amount of debug
output down to a reasonable level is an important goal. Before
a43f1939d5dcd02f4df1604a68392332168e4be0, it wasn't really practical
to run a production server with log_min_messages lower than DEBUG2,
because you'd get so much log spam it would cause performance problems
(and maybe fill up the disk).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jan 26, 2018 at 4:58 AM, David Steele <david@pgmasters.net> wrote:
On 1/25/18 12:31 AM, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 3:25 AM, David Steele <david@pgmasters.net> wrote:
Here is the first review comments.
+ unloggedDelim = strrchr(path, '/');
I think it doesn't work fine on windows. How about using
last_dir_separator() instead?I think this function is OK on Windows -- we use it quite a bit.
However, last_dir_separator() is clearer so I have changed it.Thank you for updating this. I was concerned about a separator
character '/' might not work fine on windows.Ah yes, I see what you mean now.
On Thu, Jan 25, 2018 at 6:23 AM, David Steele <david@pgmasters.net> wrote:
On 1/24/18 4:02 PM, Adam Brightwell wrote:
Actually, I was talking to Stephen about this it seems like #3 would be
more practical if we just stat'd the init fork for each relation file
found. I doubt the stat would add a lot of overhead and we can track
each unlogged relation in a hash table to reduce overhead even more.Can the readdir handle files that are added during the loop? I think
that we still cannot exclude a new unlogged relation if the relation
is added after we execute readdir first time. To completely eliminate
it we need a sort of lock that prevents to create new unlogged
relation from current backends. Or we need to do readdir loop multiple
times to see if no new relations were added during sending files.As far as I know readdir() is platform-dependent in terms of how it
scans the dir and if files created after the opendir() will appear.It shouldn't matter, though, since WAL replay will recreate those files.
Yea, agreed.
If you're updating the patch to implement #3, this patch should be
marked as "Waiting on Author". After updated I'll review it again.Attached is a new patch that uses stat() to determine if the init fork
for a relation file exists. I decided not to build a hash table as it
could use considerable memory and I didn't think it would be much faster
than a simple stat() call.The reinit.c refactor has been removed since it was no longer needed.
I'll submit the tests I wrote for reinit.c as a separate patch for the
next CF.
Thank you for updating the patch! The patch looks good to me. But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backups.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 29, 2018 at 07:28:22PM +0900, Masahiko Sawada wrote:
Thank you for updating the patch! The patch looks good to me. But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backups.
They are not needed in base backups. Note that RemovePgTempFiles() does
not remove temporary relfilenodes after a crash per the comments on its
top. I have not looked at the patch in details, but if you finish by
not including those files in what's proposed there is much refactoring
possible.
--
Michael
On 1/29/18 5:28 AM, Masahiko Sawada wrote:
On Fri, Jan 26, 2018 at 4:58 AM, David Steele <david@pgmasters.net> wrote:
Attached is a new patch that uses stat() to determine if the init fork
for a relation file exists. I decided not to build a hash table as it
could use considerable memory and I didn't think it would be much faster
than a simple stat() call.The reinit.c refactor has been removed since it was no longer needed.
I'll submit the tests I wrote for reinit.c as a separate patch for the
next CF.Thank you for updating the patch! The patch looks good to me. But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backups
Thank you for having another look at the patch.
Temp tables should be excluded by this code which is already in
basebackup.c:
/* Skip temporary files */
if (strncmp(de->d_name,
PG_TEMP_FILE_PREFIX,
strlen(PG_TEMP_FILE_PREFIX)) == 0)
continue;
This looks right to me.
Thanks,
--
-David
david@pgmasters.net
On 1/29/18 9:13 AM, David Steele wrote:
On 1/29/18 5:28 AM, Masahiko Sawada wrote:
But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backupsThank you for having another look at the patch.
Temp tables should be excluded by this code which is already in
basebackup.c:/* Skip temporary files */
if (strncmp(de->d_name,
PG_TEMP_FILE_PREFIX,
strlen(PG_TEMP_FILE_PREFIX)) == 0)
continue;This looks right to me.
Whoops, my bad. Temp relations are stored in the db directories with a
"t" prefix. Looks like we can take care of those easily enough but I
think it should be a separate patch.
I'll plan to submit that for CF 2018-03.
Thanks!
--
-David
david@pgmasters.net
On Mon, Jan 29, 2018 at 1:17 PM, David Steele <david@pgmasters.net> wrote:
On 1/29/18 9:13 AM, David Steele wrote:
On 1/29/18 5:28 AM, Masahiko Sawada wrote:
But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backupsThank you for having another look at the patch.
Temp tables should be excluded by this code which is already in
basebackup.c:/* Skip temporary files */
if (strncmp(de->d_name,
PG_TEMP_FILE_PREFIX,
strlen(PG_TEMP_FILE_PREFIX)) == 0)
continue;This looks right to me.
Whoops, my bad. Temp relations are stored in the db directories with a
"t" prefix. Looks like we can take care of those easily enough but I
think it should be a separate patch.I'll plan to submit that for CF 2018-03.
I agree, I believe this should be a separate patch.
As for the latest patch above, I have reviewed, applied and tested it.
It looks good to me. As well, it applies cleanly against master at
(97d4445a03). All tests passed when running 'check-world'.
If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.
-Adam
On Tue, Jan 30, 2018 at 5:45 AM, Adam Brightwell
<adam.brightwell@crunchydata.com> wrote:
On Mon, Jan 29, 2018 at 1:17 PM, David Steele <david@pgmasters.net> wrote:
On 1/29/18 9:13 AM, David Steele wrote:
On 1/29/18 5:28 AM, Masahiko Sawada wrote:
But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backupsThank you for having another look at the patch.
Temp tables should be excluded by this code which is already in
basebackup.c:/* Skip temporary files */
if (strncmp(de->d_name,
PG_TEMP_FILE_PREFIX,
strlen(PG_TEMP_FILE_PREFIX)) == 0)
continue;This looks right to me.
Whoops, my bad. Temp relations are stored in the db directories with a
"t" prefix. Looks like we can take care of those easily enough but I
think it should be a separate patch.I'll plan to submit that for CF 2018-03.
+1
I agree, I believe this should be a separate patch.
As for the latest patch above, I have reviewed, applied and tested it.
It looks good to me. As well, it applies cleanly against master at
(97d4445a03). All tests passed when running 'check-world'.If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.
Agreed, please mark this patch as "Ready for Committer".
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 1/29/18 8:10 PM, Masahiko Sawada wrote:
On Tue, Jan 30, 2018 at 5:45 AM, Adam Brightwell
<adam.brightwell@crunchydata.com> wrote:On Mon, Jan 29, 2018 at 1:17 PM, David Steele <david@pgmasters.net> wrote:
Whoops, my bad. Temp relations are stored in the db directories with a
"t" prefix. Looks like we can take care of those easily enough but I
think it should be a separate patch.I'll plan to submit that for CF 2018-03.
+1
I agree, I believe this should be a separate patch.
As for the latest patch above, I have reviewed, applied and tested it.
It looks good to me. As well, it applies cleanly against master at
(97d4445a03). All tests passed when running 'check-world'.If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.Agreed, please mark this patch as "Ready for Committer".
I marked it just in case some enterprising committer from another time
zone swoops in and picks it up. Fingers crossed!
--
-David
david@pgmasters.net
On 1/29/18 8:10 PM, Masahiko Sawada wrote:
On Tue, Jan 30, 2018 at 5:45 AM, Adam Brightwell
If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.Agreed, please mark this patch as "Ready for Committer".
Attached is a rebased patch that applies cleanly.
Thanks,
--
-David
david@pgmasters.net
Attachments:
exclude-unlogged-v4.patchtext/plain; charset=UTF-8; name=exclude-unlogged-v4.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3cec9e0b0c..a46c857e48 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2552,6 +2552,12 @@ The commands accepted in walsender mode are:
with <filename>pgsql_tmp</filename>.
</para>
</listitem>
+ <listitem>
+ <para>
+ Unlogged relations, except for the init fork which is required to
+ recreate the (empty) unlogged relation on recovery.
+ </para>
+ </listitem>
<listitem>
<para>
<filename>pg_wal</filename>, including subdirectories. If the backup is run
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..eb6eb7206d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
#include "nodes/pg_list.h"
#include "pgtar.h"
#include "pgstat.h"
+#include "port.h"
#include "postmaster/syslogger.h"
#include "replication/basebackup.h"
#include "replication/walsender.h"
@@ -33,6 +34,7 @@
#include "storage/dsm_impl.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/reinit.h"
#include "utils/builtins.h"
#include "utils/ps_status.h"
#include "utils/relcache.h"
@@ -958,12 +960,44 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
char pathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64 size = 0;
+ const char *lastDir; /* Split last dir from parent path. */
+ bool isDbDir = false; /* Does this directory contain relations? */
+
+ /*
+ * Determine if the current path is a database directory that can
+ * contain relations.
+ *
+ * Start by finding the location of the delimiter between the parent
+ * path and the current path.
+ */
+ lastDir = last_dir_separator(path);
+
+ /* Does this path look like a database path (i.e. all digits)? */
+ if (lastDir != NULL &&
+ strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+ {
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = lastDir - path;
+
+ /*
+ * Mark path as a database directory if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+ sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ isDbDir = true;
+ }
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
{
int excludeIdx;
bool excludeFound;
+ ForkNumber relForkNum; /* Type of fork if file is a relation */
+ int relOidChars; /* Chars in filename that are the rel oid */
/* Skip special stuff */
if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
@@ -1007,6 +1041,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (excludeFound)
continue;
+ /* Exclude all forks for unlogged tables except the init fork */
+ if (isDbDir &&
+ parse_filename_for_nontemp_relation(de->d_name, &relOidChars,
+ &relForkNum))
+ {
+ /* Never exclude init forks */
+ if (relForkNum != INIT_FORKNUM)
+ {
+ char initForkFile[MAXPGPATH];
+ char relOid[OIDCHARS + 1];
+
+ /*
+ * If any other type of fork, check if there is an init fork
+ * with the same OID. If so, the file can be excluded.
+ */
+ strncpy(relOid, de->d_name, relOidChars);
+ snprintf(initForkFile, sizeof(initForkFile), "%s/%s_init",
+ path, relOid);
+
+ if (lstat(initForkFile, &statbuf) == 0)
+ {
+ elog(DEBUG2,
+ "unlogged relation file \"%s\" excluded from backup",
+ de->d_name);
+
+ continue;
+ }
+ }
+ }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
/* Skip pg_control here to back up it last */
diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 92363ae6ad..7b8a1253c4 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -28,8 +28,6 @@ static void ResetUnloggedRelationsInTablespaceDir(const char *tsdirname,
int op);
static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
int op);
-static bool parse_filename_for_nontemp_relation(const char *name,
- int *oidchars, ForkNumber *fork);
typedef struct
{
@@ -373,7 +371,7 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
* portion of the filename. This is critical to protect against a possible
* buffer overrun.
*/
-static bool
+bool
parse_filename_for_nontemp_relation(const char *name, int *oidchars,
ForkNumber *fork)
{
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..455c7fca0d 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
use Config;
use PostgresNode;
use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 87;
program_help_ok('pg_basebackup');
program_version_ok('pg_basebackup');
@@ -66,6 +66,16 @@ foreach my $filename (
# positive.
$node->safe_psql('postgres', 'SELECT 1;');
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+ q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'unlogged main fork in base');
+
$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -96,6 +106,12 @@ foreach my $filename (
ok(!-f "$tempdir/backup/$filename", "$filename not copied");
}
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+ 'unlogged init fork in backup');
+ok(!-f "$tempdir/backup/$baseUnloggedPath",
+ 'unlogged main fork not in backup');
+
# Make sure existing backup_label was ignored.
isnt(slurp_file("$tempdir/backup/backup_label"),
'DONOTCOPY', 'existing backup_label not copied');
@@ -147,7 +163,7 @@ unlink "$pgdata/$superlongname";
# skip on Windows.
SKIP:
{
- skip "symlinks not supported on Windows", 11 if ($windows_os);
+ skip "symlinks not supported on Windows", 15 if ($windows_os);
# Move pg_replslot out of $pgdata and create a symlink to it.
$node->stop;
@@ -177,6 +193,19 @@ SKIP:
my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+ # Create an unlogged table to test that forks other than init are not copied.
+ $node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;');
+
+ my $tblspc1UnloggedPath = $node->safe_psql(
+ 'postgres', q{select pg_relation_filepath('tblspc1_unlogged')});
+
+ # Make sure main and init forks exist
+ ok(-f "$pgdata/${tblspc1UnloggedPath}_init",
+ 'unlogged init fork in tablespace');
+ ok(-f "$pgdata/$tblspc1UnloggedPath",
+ 'unlogged main fork in tablespace');
+
$node->command_fails(
[ 'pg_basebackup', '-D', "$tempdir/backup1", '-Fp' ],
'plain format with tablespaces fails without tablespace mapping');
@@ -195,11 +224,20 @@ SKIP:
"tablespace symlink was updated");
closedir $dh;
+ # Unlogged relation forks other than init should not be copied
+ my ($tblspc1UnloggedBackupPath) = $tblspc1UnloggedPath =~ /[^\/]*\/[^\/]*\/[^\/]*$/g;
+
+ ok(-f "$tempdir/tbackup/tblspc1/${tblspc1UnloggedBackupPath}_init",
+ 'unlogged init fork in tablespace backup');
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged main fork not in tablespace backup');
+
ok( -d "$tempdir/backup1/pg_replslot",
'pg_replslot symlink copied as directory');
mkdir "$tempdir/tbl=spc2";
$node->safe_psql('postgres', "DROP TABLE test1;");
+ $node->safe_psql('postgres', "DROP TABLE tblspc1_unlogged;");
$node->safe_psql('postgres', "DROP TABLESPACE tblspc1;");
$node->safe_psql('postgres',
"CREATE TABLESPACE tblspc2 LOCATION '$shorter_tempdir/tbl=spc2';");
diff --git a/src/include/storage/reinit.h b/src/include/storage/reinit.h
index addda2c0ea..5c34f88b95 100644
--- a/src/include/storage/reinit.h
+++ b/src/include/storage/reinit.h
@@ -16,6 +16,8 @@
#define REINIT_H
extern void ResetUnloggedRelations(int op);
+extern bool parse_filename_for_nontemp_relation(
+ const char *name, int *oidchars, ForkNumber *fork);
#define UNLOGGED_RELATION_CLEANUP 0x0001
#define UNLOGGED_RELATION_INIT 0x0002
Thank you, pushed
David Steele wrote:
On 1/29/18 8:10 PM, Masahiko Sawada wrote:
On Tue, Jan 30, 2018 at 5:45 AM, Adam Brightwell
If it is agreed that the temp file exclusion should be submitted as a
separate patch, then I will mark 'ready for committer'.Agreed, please mark this patch as "Ready for Committer".
Attached is a rebased patch that applies cleanly.
Thanks,
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On 3/23/18 12:14 PM, Teodor Sigaev wrote:
Thank you, pushed
Thank you, Teodor! I'll rebase the temp table exclusion patch and
provide an updated patch soon.
--
-David
david@pgmasters.net
On Fri, Mar 23, 2018 at 9:51 PM, David Steele <david@pgmasters.net> wrote:
On 3/23/18 12:14 PM, Teodor Sigaev wrote:
Thank you, pushed
Is it just me or the newly added test in 010_pg_basebackup.pl failing for
others too?
# Failed test 'unlogged main fork not in backup'
# at t/010_pg_basebackup.pl line 112.
t/010_pg_basebackup.pl ... 86/87 # Looks like you failed 1 test of 87.
I manually ran pg_basebackup and it correctly excludes the main fork on an
unlogged table from the backup, but it consistently copies the main fork
while running "make check" and thus fails the test.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 26, 2018 at 1:03 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:
On Fri, Mar 23, 2018 at 9:51 PM, David Steele <david@pgmasters.net> wrote:
On 3/23/18 12:14 PM, Teodor Sigaev wrote:
Thank you, pushed
Is it just me or the newly added test in 010_pg_basebackup.pl failing for
others too?# Failed test 'unlogged main fork not in backup'
# at t/010_pg_basebackup.pl line 112.
t/010_pg_basebackup.pl ... 86/87 # Looks like you failed 1 test of 87.
This one-liner patch fixes it for me.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001_pg_basebackup_fix.patchapplication/octet-stream; name=0001_pg_basebackup_fix.patchDownload
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index eb6eb7206d..b81ee1be0c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1056,7 +1056,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
* If any other type of fork, check if there is an init fork
* with the same OID. If so, the file can be excluded.
*/
- strncpy(relOid, de->d_name, relOidChars);
+ strncpy(relOid, de->d_name, relOidChars + 1);
snprintf(initForkFile, sizeof(initForkFile), "%s/%s_init",
path, relOid);
On Mon, Mar 26, 2018 at 4:52 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
On Mon, Mar 26, 2018 at 1:03 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:On Fri, Mar 23, 2018 at 9:51 PM, David Steele <david@pgmasters.net> wrote:
On 3/23/18 12:14 PM, Teodor Sigaev wrote:
Thank you, pushed
Is it just me or the newly added test in 010_pg_basebackup.pl failing for
others too?# Failed test 'unlogged main fork not in backup'
# at t/010_pg_basebackup.pl line 112.
t/010_pg_basebackup.pl ... 86/87 # Looks like you failed 1 test of 87.This one-liner patch fixes it for me.
Isn't this issue already fixed by commit
d0c0c894533f906b13b79813f02b2982ac675074?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Mar 26, 2018 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
On Mon, Mar 26, 2018 at 4:52 PM, Pavan Deolasee
This one-liner patch fixes it for me.
Isn't this issue already fixed by commit
d0c0c894533f906b13b79813f02b2982ac675074?
Ah, right. Thanks for pointing out and sorry for the noise.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services