[PATCH]make pg_rewind to not copy useless WAL files
Hi all,
Currently, pg_rewind copies all WAL files from the source server, whether or not they are needed.
In some circumstances, will bring a lot of unnecessary network and disk IO consumption, and also increase the execution time of pg_rewind.
Such as when wal_keep_segments or max_wal_size is large.
According to pg_rewind's processing logic, only need to copy the WAL after the divergence from the source server.
The WAL before the divergence must already exists on the target server.
Also, there is no need to copy WALs that have been recovered.
This patch optimizes the above mentioned issues, as follows:
1. In the target data directory, do not delete the WAL files before the divergence.
2. When copying files from the source server, do not copy the WAL files before the divergence and the WAL files after the current WAL insert localtion.
Note:
The "current WAL insert localtion" above is obtained before copying data files. If a runing PostgreSQL server is used as the source server, the newly generated WAL files during pg_rewind running will not be copied to
the target data directory.
However, in this case the target server is typically used as a standby of the source server after pg_rewind is executed, so these WAL files will be copied via streaming replication later.
--
Best regards
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 169,191 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, "pg_wal", 6) == 0 && IsXLogFileName(path + 7) &&
+ (strcmp(path + 7 + 8, divergence_wal_filename + 8) < 0 ||
+ strcmp(path + 7 + 8, last_source_wal_filename + 8) > 0))
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 315,326 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists &&
! !(strncmp(path, "pg_wal", 6) == 0 && IsXLogFileName(path + 7) &&
! strcmp(path + 7 + 8, divergence_wal_filename + 8) < 0))
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 58,63 **** bool dry_run = false;
--- 59,68 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ char divergence_wal_filename[MAXFNAMELEN];
+ char last_source_wal_filename[MAXFNAMELEN];
+
static void
usage(const char *progname)
{
***************
*** 103,108 **** main(int argc, char **argv)
--- 108,115 ----
XLogRecPtr endrec;
TimeLineID endtli;
ControlFileData ControlFile_new;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_rewind"));
progname = get_progname(argv[0]);
***************
*** 280,285 **** main(int argc, char **argv)
--- 287,317 ----
chkpttli);
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+
+ XLByteToSeg(divergerec, startsegno);
+ XLogFileName(divergence_wal_filename, targetHistory[lastcommontliIndex].tli, startsegno);
+ XLByteToPrevSeg(endrec, endsegno);
+ XLogFileName(last_source_wal_filename, endtli, endsegno);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 29,34 **** extern bool dry_run;
--- 29,38 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern char divergence_wal_filename[];
+ extern char last_source_wal_filename[];
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
Hi!
On Sat, Sep 16, 2017 at 5:56 PM, chenhj <chjischj@163.com> wrote:
This patch optimizes the above mentioned issues, as follows:
1. In the target data directory, do not delete the WAL files before the
divergence.
2. When copying files from the source server, do not copy the WAL files
before the divergence and the WAL files after the current WAL insert
localtion.
Looks like cool optimization for me. Please, add this patch to the next
commitfest.
Do you think this patch should modify pg_rewind tap tests too? It would be
nice to make WAL files fetching more covered by tap tests. In particular,
new tests may generate more WAL files and make sure that pg_rewind fetches
only required files among them.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, Sep 17, 2017 at 3:19 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
Hi!
On Sat, Sep 16, 2017 at 5:56 PM, chenhj <chjischj@163.com> wrote:
This patch optimizes the above mentioned issues, as follows:
1. In the target data directory, do not delete the WAL files before the
divergence.
2. When copying files from the source server, do not copy the WAL files
before the divergence and the WAL files after the current WAL insert
localtion.Looks like cool optimization for me. Please, add this patch to the next
commitfest.
Agreed.
Do you think this patch should modify pg_rewind tap tests too? It would be
nice to make WAL files fetching more covered by tap tests. In particular,
new tests may generate more WAL files and make sure that pg_rewind fetches
only required files among them.
This looks mandatory to me. Using pg_switch_wal() and a minimum amount
of WAL generated you could just make the set of WAL segments skipped
minimal data.
I have not checked in details, but I think that the positions where
you are applying the filters are using the right approach.
! !(strncmp(path, "pg_wal", 6) == 0 && IsXLogFileName(path + 7) &&
Please use XLOGDIR here.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At 2017-09-17 08:33:33, "Michael Paquier" <michael.paquier@gmail.com> wrote:
On Sun, Sep 17, 2017 at 3:19 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Hi!
On Sat, Sep 16, 2017 at 5:56 PM, chenhj <chjischj@163.com> wrote:
This patch optimizes the above mentioned issues, as follows:
1. In the target data directory, do not delete the WAL files before the
divergence.
2. When copying files from the source server, do not copy the WAL files
before the divergence and the WAL files after the current WAL insert
localtion.Looks like cool optimization for me. Please, add this patch to the next
commitfest.Agreed.
Do you think this patch should modify pg_rewind tap tests too? It would be
nice to make WAL files fetching more covered by tap tests. In particular,
new tests may generate more WAL files and make sure that pg_rewind fetches
only required files among them.This looks mandatory to me. Using pg_switch_wal() and a minimum amount
of WAL generated you could just make the set of WAL segments skipped
minimal data.I have not checked in details, but I think that the positions where
you are applying the filters are using the right approach.! !(strncmp(path, "pg_wal", 6) == 0 && IsXLogFileName(path + 7) &&
Please use XLOGDIR here.
--
Michael
Thanks, I will use XLOGDIR and add TAP tests later.
--
Chen Huajun
Hi
This is the new pacth with TAP test and use Macro XLOGDIR.
And i had add this patch to the commitfest,
https://commitfest.postgresql.org/15/1302/
--
Best Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v2.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v2.patchDownload
*** a/src/bin/pg_rewind/RewindTest.pm
--- b/src/bin/pg_rewind/RewindTest.pm
***************
*** 52,57 **** our @EXPORT = qw(
--- 52,60 ----
standby_psql
check_query
+ master_query
+ standby_query
+
setup_cluster
start_master
create_standby
***************
*** 112,117 **** sub check_query
--- 115,163 ----
}
}
+ sub master_query
+ {
+ my $query = shift;
+
+ return run_query($node_master->connstr('postgres'), $query);
+ }
+
+ sub standby_query
+ {
+ my $query = shift;
+
+ return run_query($node_standby->connstr('postgres'), $query);
+ }
+
+ # Run a query, and return the output
+ sub run_query
+ {
+ my ($connstr, $query) = @_;
+ my ($stdout, $stderr);
+
+ # we want just the output, no formatting
+ my $result = run [
+ 'psql', '-q', '-A', '-t', '--no-psqlrc', '-d',
+ $connstr,
+ '-c', $query ],
+ '>', \$stdout, '2>', \$stderr;
+
+ if (!$result)
+ {
+ BAIL_OUT("psql failed: exit code = $result");
+ }
+ elsif ($stderr ne '')
+ {
+ BAIL_OUT("psql failed: $stderr");
+ }
+ else
+ {
+ $stdout =~ s/\n//g ;
+ $stdout =~ s/\r//g if $Config{osname} eq 'msys';
+ }
+ return $stdout;
+ }
+
sub setup_cluster
{
my $extra_name = shift;
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 169,192 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR"/")) &&
+ (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
+ strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 316,328 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists &&
! !(strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR"/")) &&
! strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0))
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 58,63 **** bool dry_run = false;
--- 59,68 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ char divergence_wal_filename[MAXFNAMELEN];
+ char last_source_wal_filename[MAXFNAMELEN];
+
static void
usage(const char *progname)
{
***************
*** 103,108 **** main(int argc, char **argv)
--- 108,115 ----
XLogRecPtr endrec;
TimeLineID endtli;
ControlFileData ControlFile_new;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_rewind"));
progname = get_progname(argv[0]);
***************
*** 280,285 **** main(int argc, char **argv)
--- 287,317 ----
chkpttli);
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+
+ XLByteToSeg(divergerec, startsegno);
+ XLogFileName(divergence_wal_filename, targetHistory[lastcommontliIndex].tli, startsegno);
+ XLByteToPrevSeg(endrec, endsegno);
+ XLogFileName(last_source_wal_filename, endtli, endsegno);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 29,34 **** extern bool dry_run;
--- 29,38 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern char divergence_wal_filename[];
+ extern char last_source_wal_filename[];
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,124 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameter for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = master_query("SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = master_query("SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = standby_query("SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = standby_query("SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
+ sleep(1);
+ my $pg_rewind_time = master_query("SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification < '$pg_rewind_time' ",
+ qq($master_wal_count_before_divergence
+ ),
+ 'reserve master wals before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy reclaimed wals from standby(new master)');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Fri, Sep 22, 2017 at 7:16 PM, chenhj <chjischj@163.com> wrote:
This is the new pacth with TAP test and use Macro XLOGDIR.
Good. I took a quick look over the patch.
Why do you need master_query(), standby_query() and run_query() in
RewindTest.pm?
You can do just $node_master->safe_psql() and $node_slave->safe_psql()
instead.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-23 01:59:0, "Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Fri, Sep 22, 2017 at 7:16 PM, chenhj <chjischj@163.com> wrote:
This is the new pacth with TAP test and use Macro XLOGDIR.
Good. I took a quick look over the patch.
Why do you need master_query(), standby_query() and run_query() in RewindTest.pm?
You can do just $node_master->safe_psql() and $node_slave->safe_psql() instead.
Ooh, i did not notice that function.Thank you for your advice!
---
Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v3.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v3.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 169,192 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR"/")) &&
+ (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
+ strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 316,328 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists &&
! !(strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR"/")) &&
! strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0))
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 58,63 **** bool dry_run = false;
--- 59,68 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ char divergence_wal_filename[MAXFNAMELEN];
+ char last_source_wal_filename[MAXFNAMELEN];
+
static void
usage(const char *progname)
{
***************
*** 103,108 **** main(int argc, char **argv)
--- 108,115 ----
XLogRecPtr endrec;
TimeLineID endtli;
ControlFileData ControlFile_new;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_rewind"));
progname = get_progname(argv[0]);
***************
*** 280,285 **** main(int argc, char **argv)
--- 287,317 ----
chkpttli);
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+
+ XLByteToSeg(divergerec, startsegno);
+ XLogFileName(divergence_wal_filename, targetHistory[lastcommontliIndex].tli, startsegno);
+ XLByteToPrevSeg(endrec, endsegno);
+ XLogFileName(last_source_wal_filename, endtli, endsegno);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 29,34 **** extern bool dry_run;
--- 29,38 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern char divergence_wal_filename[];
+ extern char last_source_wal_filename[];
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,124 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameter for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification < '$pg_rewind_time' ",
+ qq($master_wal_count_before_divergence
+ ),
+ 'reserve master wals before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy reclaimed wals from standby(new master)');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Mon, Sep 25, 2017 at 6:26 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-23 01:59:0, "Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Fri, Sep 22, 2017 at 7:16 PM, chenhj <chjischj@163.com> wrote:
This is the new pacth with TAP test and use Macro XLOGDIR.
Good. I took a quick look over the patch.
Why do you need master_query(), standby_query() and run_query() in
RewindTest.pm?
You can do just $node_master->safe_psql() and $node_slave->safe_psql()
instead.Ooh, i did not notice that function.Thank you for your advice!
Great. I tried this patch. It applies cleanly, but doesn't compile.
pg_rewind.c:310:36: error: too few arguments provided to function-like
macro invocation
XLByteToSeg(divergerec, startsegno);
^
../../../src/include/access/xlog_internal.h:118:9: note: macro
'XLByteToSeg' defined here
#define XLByteToSeg(xlrp, logSegNo, wal_segsz_bytes) \
^
pg_rewind.c:310:2: error: use of undeclared identifier 'XLByteToSeg'
XLByteToSeg(divergerec, startsegno);
^
pg_rewind.c:311:89: error: too few arguments provided to function-like
macro invocation
XLogFileName(divergence_wal_filename,
targetHistory[lastcommontliIndex].tli, startsegno);^
../../../src/include/access/xlog_internal.h:155:9: note: macro
'XLogFileName' defined here
#define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
^
pg_rewind.c:311:2: error: use of undeclared identifier 'XLogFileName'
XLogFileName(divergence_wal_filename,
targetHistory[lastcommontliIndex].tli, startsegno);
^
pg_rewind.c:312:34: error: too few arguments provided to function-like
macro invocation
XLByteToPrevSeg(endrec, endsegno);
^
../../../src/include/access/xlog_internal.h:121:9: note: macro
'XLByteToPrevSeg' defined here
#define XLByteToPrevSeg(xlrp, logSegNo, wal_segsz_bytes) \
^
pg_rewind.c:312:2: error: use of undeclared identifier 'XLByteToPrevSeg'
XLByteToPrevSeg(endrec, endsegno);
^
pg_rewind.c:313:57: error: too few arguments provided to function-like
macro invocation
XLogFileName(last_source_wal_filename, endtli, endsegno);
^
../../../src/include/access/xlog_internal.h:155:9: note: macro
'XLogFileName' defined here
#define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
^
pg_rewind.c:313:2: error: use of undeclared identifier 'XLogFileName'
XLogFileName(last_source_wal_filename, endtli, endsegno);
^
8 errors generated.
It appears that your patch conflicts with fc49e24f. Please, rebase it.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
It appears that your patch conflicts with fc49e24f. Please, rebase it.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Yes, i had rebased it, Please check the new patch.
------
Best Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v4.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v4.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 169,192 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR"/")) &&
+ (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
+ strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 316,328 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists &&
! !(strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR"/")) &&
! strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0))
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ char divergence_wal_filename[MAXFNAMELEN];
+ char last_source_wal_filename[MAXFNAMELEN];
+
static void
usage(const char *progname)
{
***************
*** 104,109 **** main(int argc, char **argv)
--- 109,116 ----
XLogRecPtr endrec;
TimeLineID endtli;
ControlFileData ControlFile_new;
+ XLogSegNo startsegno;
+ XLogSegNo endsegno;
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_rewind"));
progname = get_progname(argv[0]);
***************
*** 281,286 **** main(int argc, char **argv)
--- 288,318 ----
chkpttli);
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
+ }
+
+ XLByteToSeg(divergerec, startsegno, WalSegSz);
+ XLogFileName(divergence_wal_filename, targetHistory[lastcommontliIndex].tli, startsegno, WalSegSz);
+ XLByteToPrevSeg(endrec, endsegno, WalSegSz);
+ XLogFileName(last_source_wal_filename, endtli, endsegno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern char divergence_wal_filename[];
+ extern char last_source_wal_filename[];
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,125 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameter for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy wals before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy reclaimed wals from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:It appears that your patch conflicts with fc49e24f. Please, rebase it.
Yes, i had rebased it, Please check the new patch.
Good, now it applies cleanly.
else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
IsXLogFileName(path + strlen(XLOGDIR"/")) &&
(strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
According to our conding style, you should leave a space betwen XLOGDIF and
"/".
Also, you do a trick by comparison xlog segment numbers using strcmp().
It's nice, but I would prefer seeing XLogFromFileName() here. It would
improve code readability and be less error prone during further
modifications.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
It appears that your patch conflicts with fc49e24f. Please, rebase it.
Yes, i had rebased it, Please check the new patch.
Good, now it applies cleanly.
else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
IsXLogFileName(path + strlen(XLOGDIR"/")) &&
(strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
According to our conding style, you should leave a space betwen XLOGDIF and "/".
Also, you do a trick by comparison xlog segment numbers using strcmp(). It's nice, but I would prefer seeing XLogFromFileName() here. It would improve code readability and be less error prone during further modifications.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Thanks for advice!
I had modified it.
-----
Best Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v5.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v5.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 67,72 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 68,75 ----
file_action_t action = FILE_ACTION_NONE;
size_t oldsize = 0;
file_entry_t *entry;
+ uint32 tli;
+ XLogSegNo segno;
Assert(map->array == NULL);
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 171,201 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR "/")))
+ {
+ XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
+ if (segno >= divergence_segno && segno <= last_source_segno)
+ {
+ action = FILE_ACTION_COPY;
+ oldsize = 0;
+ }
+ else
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 258,263 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
--- 280,288 ----
file_entry_t *key_ptr;
filemap_t *map = filemap;
file_entry_t *entry;
+ bool reserve = false;
+ uint32 tli;
+ XLogSegNo segno;
snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
if (lstat(localpath, &statbuf) < 0)
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 328,345 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! if(strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR "/")))
! {
! XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
! if(segno < divergence_segno)
! reserve = true;
! }
!
! /* Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists && !reserve)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ XLogSegNo divergence_segno;
+ XLogSegNo last_source_segno;
+
static void
usage(const char *progname)
{
***************
*** 281,286 **** main(int argc, char **argv)
--- 286,312 ----
chkpttli);
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ }
+
+ XLByteToSeg(divergerec, divergence_segno, WalSegSz);
+ XLByteToPrevSeg(endrec, last_source_segno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern XLogSegNo divergence_segno;
+ extern XLogSegNo last_source_segno;
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,125 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameter for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres", "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres", "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy wals before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time' ",
+ qq(0
+ ),
+ 'do not copy reclaimed wals from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Thu, Sep 28, 2017 at 10:52 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:It appears that your patch conflicts with fc49e24f. Please, rebase it.
Yes, i had rebased it, Please check the new patch.
Good, now it applies cleanly.
else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
IsXLogFileName(path + strlen(XLOGDIR"/")) &&
(strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0
||
strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) >
0))According to our conding style, you should leave a space betwen XLOGDIF
and "/".
Also, you do a trick by comparison xlog segment numbers using strcmp().
It's nice, but I would prefer seeing XLogFromFileName() here. It would
improve code readability and be less error prone during further
modifications.Thanks for advice!
I had modified it.
OK. Patch becomes better.
I also have more general question. Why do we need upper bound for segment
number (last_source_segno)? I understand the purpose of lower bound
(divergence_segno) which save us from copying extra WAL files, but what is
upper bound for? As I understood, we anyway need to replay most recent WAL
records to reach consistent state after pg_rewind. I propose to
remove last_source_segno unless I'm missing something.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-29 05:31:51, "Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 28, 2017 at 10:52 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
It appears that your patch conflicts with fc49e24f. Please, rebase it.
Yes, i had rebased it, Please check the new patch.
Good, now it applies cleanly.
else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
IsXLogFileName(path + strlen(XLOGDIR"/")) &&
(strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))
According to our conding style, you should leave a space betwen XLOGDIF and "/".
Also, you do a trick by comparison xlog segment numbers using strcmp(). It's nice, but I would prefer seeing XLogFromFileName() here. It would improve code readability and be less error prone during further modifications.
Thanks for advice!
I had modified it.
OK. Patch becomes better.
I also have more general question. Why do we need upper bound for segment number (last_source_segno)? I understand the purpose of lower bound (divergence_segno) which save us from copying extra WAL files, but what is upper bound for? As I understood, we anyway need to replay most recent WAL records to reach consistent state after pg_rewind. I propose to remove last_source_segno unless I'm missing something.
Thanks for relay!
When checkpoint occurs, some old WAL files will be renamed as future WAL files for later use.
The upper bound for segment number (last_source_segno) is used to avoid copying these extra WAL files.
When the parameter max_wal_size or max_min_size is large,these may be many renamed old WAL files for reused.
For example, I have just looked at one of our production systems (max_wal_size = 64GB, min_wal_size = 2GB),
the total size of WALs is about 30GB, and contains about 4GB renamed old WAL files.
[postgres@hostxxx pg_xlog]$ ll
...
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF00000078
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF00000079
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF0000007A
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF0000007B
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF0000007C
-rw------- 1 postgres postgres 16777216 Sep 29 14:05 0000000100000BCF0000007D
-rw------- 1 postgres postgres 16777216 Sep 29 11:22 0000000100000BCF0000007E //after this, there are about 4GB WALs for reuse
-rw------- 1 postgres postgres 16777216 Sep 29 11:08 0000000100000BCF0000007F
-rw------- 1 postgres postgres 16777216 Sep 29 11:06 0000000100000BCF00000080
-rw------- 1 postgres postgres 16777216 Sep 29 12:05 0000000100000BCF00000081
-rw------- 1 postgres postgres 16777216 Sep 29 11:28 0000000100000BCF00000082
-rw------- 1 postgres postgres 16777216 Sep 29 11:06 0000000100000BCF00000083
...
-----
Best Regards,
Chen Huajun
On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjischj@163.com> wrote:
On 2017-09-29 05:31:51, "Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Thu, Sep 28, 2017 at 10:52 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:It appears that your patch conflicts with fc49e24f. Please, rebase it.
Yes, i had rebased it, Please check the new patch.
Good, now it applies cleanly.
else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
IsXLogFileName(path + strlen(XLOGDIR"/")) &&
(strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0
||
strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) >
0))According to our conding style, you should leave a space betwen XLOGDIF
and "/".
Also, you do a trick by comparison xlog segment numbers using strcmp().
It's nice, but I would prefer seeing XLogFromFileName() here. It would
improve code readability and be less error prone during further
modifications.Thanks for advice!
I had modified it.OK. Patch becomes better.
I also have more general question. Why do we need upper bound for segment
number (last_source_segno)? I understand the purpose of lower bound
(divergence_segno) which save us from copying extra WAL files, but what is
upper bound for? As I understood, we anyway need to replay most recent WAL
records to reach consistent state after pg_rewind. I propose to
remove last_source_segno unless I'm missing something.Thanks for relay!
When checkpoint occurs, some old WAL files will be renamed as future WAL
files for later use.
The upper bound for segment number (last_source_segno) is used to avoid
copying these extra WAL files.When the parameter max_wal_size or max_min_size is large,these may be many
renamed old WAL files for reused.For example, I have just looked at one of our production systems
(max_wal_size = 64GB, min_wal_size = 2GB),
the total size of WALs is about 30GB, and contains about 4GB renamed old
WAL files.[postgres@hostxxx pg_xlog]$ ll
...
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF00000078
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF00000079
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF0000007A
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF0000007B
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF0000007C
-rw------- 1 postgres postgres 16777216 Sep 29 14:05
0000000100000BCF0000007D
-rw------- 1 postgres postgres 16777216 Sep 29 11:22
0000000100000BCF0000007E //after this, there are about 4GB WALs for reuse
-rw------- 1 postgres postgres 16777216 Sep 29 11:08
0000000100000BCF0000007F
-rw------- 1 postgres postgres 16777216 Sep 29 11:06
0000000100000BCF00000080
-rw------- 1 postgres postgres 16777216 Sep 29 12:05
0000000100000BCF00000081
-rw------- 1 postgres postgres 16777216 Sep 29 11:28
0000000100000BCF00000082
-rw------- 1 postgres postgres 16777216 Sep 29 11:06
0000000100000BCF00000083
...
OK. That makes sense. Thank you for the explanation.
I still have some minor comments.
/* + * Save the WAL filenames of the divergence and the current WAL insert + * location of the source server. Later only the WAL files between those + * would be copied to the target data directory.
Comment is outdated. We don't save filenames anymore, now we save segment
numbers.
+ * Note:The later generated WAL files in the source server before the end + * of the copy of the data files must be made available when the target + * server is started. This can be done by configuring the target server as + * a standby of the source server. + */
You miss space after "Note:". Also, it seems reasonable for me to leave
empty line before "Note:".
# Setup parameter for WAL reclaim
Parameter*s*, because you're setting up multiple of them.
# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
one seconds
One second without "s".
Also, please check empty lines in 006_wal_copy.pl to be just empty lines
without tabs.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-29 19:29:40,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjischj@163.com> wrote:
OK. That makes sense. Thank you for the explanation.
I still have some minor comments.
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
Comment is outdated. We don't save filenames anymore, now we save segment numbers.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
You miss space after "Note:". Also, it seems reasonable for me to leave empty line before "Note:".
# Setup parameter for WAL reclaim
Parameter*s*, because you're setting up multiple of them.
# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
One second without "s".
Also, please check empty lines in 006_wal_copy.pl to be just empty lines without tabs.
Thanks for your comments, i had fix above problems.
And also add several line breaks at long line in 006_wal_copy.pl
Please check this patch again.
------
Best Regards
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v6.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v6.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 67,72 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 68,75 ----
file_action_t action = FILE_ACTION_NONE;
size_t oldsize = 0;
file_entry_t *entry;
+ uint32 tli;
+ XLogSegNo segno;
Assert(map->array == NULL);
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 171,201 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR "/")))
+ {
+ XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
+ if (segno >= divergence_segno && segno <= last_source_segno)
+ {
+ action = FILE_ACTION_COPY;
+ oldsize = 0;
+ }
+ else
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 258,263 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
--- 280,288 ----
file_entry_t *key_ptr;
filemap_t *map = filemap;
file_entry_t *entry;
+ bool reserve = false;
+ uint32 tli;
+ XLogSegNo segno;
snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
if (lstat(localpath, &statbuf) < 0)
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 328,346 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! if(strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR "/")))
! {
! XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
! if(segno < divergence_segno)
! reserve = true;
! }
!
! /*
! * Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists && !reserve)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ XLogSegNo divergence_segno;
+ XLogSegNo last_source_segno;
+
static void
usage(const char *progname)
{
***************
*** 281,286 **** main(int argc, char **argv)
--- 286,313 ----
chkpttli);
/*
+ * Save the WAL segment numbers of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ *
+ * Note: The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ }
+
+ XLByteToSeg(divergerec, divergence_segno, WalSegSz);
+ XLByteToPrevSeg(endrec, last_source_segno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern XLogSegNo divergence_segno;
+ extern XLogSegNo last_source_segno;
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,129 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameters for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before running pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one second
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy WALs before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy reclaimed WALs from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On 2017-09-30 00:53:31,"chenhj" <chjischj@163.com> wrote:
On 2017-09-29 19:29:40,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjischj@163.com> wrote:
OK. That makes sense. Thank you for the explanation.
I still have some minor comments.
/*
+ * Save the WAL filenames of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
Comment is outdated. We don't save filenames anymore, now we save segment numbers.
+ * Note:The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
You miss space after "Note:". Also, it seems reasonable for me to leave empty line before "Note:".
# Setup parameter for WAL reclaim
Parameter*s*, because you're setting up multiple of them.
# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one seconds
One second without "s".
Also, please check empty lines in 006_wal_copy.pl to be just empty lines without tabs.
Thanks for your comments, i had fix above problems.
And also add several line breaks at long line in 006_wal_copy.pl
Please check this patch again.
Sorry, patch v6 did not remove tabs in two empty lines, please use the new one.
Best Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v7.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v7.patchDownload
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 67,72 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 68,75 ----
file_action_t action = FILE_ACTION_NONE;
size_t oldsize = 0;
file_entry_t *entry;
+ uint32 tli;
+ XLogSegNo segno;
Assert(map->array == NULL);
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 171,201 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR "/")))
+ {
+ XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
+ if (segno >= divergence_segno && segno <= last_source_segno)
+ {
+ action = FILE_ACTION_COPY;
+ oldsize = 0;
+ }
+ else
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 258,263 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
--- 280,288 ----
file_entry_t *key_ptr;
filemap_t *map = filemap;
file_entry_t *entry;
+ bool reserve = false;
+ uint32 tli;
+ XLogSegNo segno;
snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
if (lstat(localpath, &statbuf) < 0)
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 328,346 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! if(strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR "/")))
! {
! XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
! if(segno < divergence_segno)
! reserve = true;
! }
!
! /*
! * Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists && !reserve)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ XLogSegNo divergence_segno;
+ XLogSegNo last_source_segno;
+
static void
usage(const char *progname)
{
***************
*** 281,286 **** main(int argc, char **argv)
--- 286,313 ----
chkpttli);
/*
+ * Save the WAL segment numbers of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ *
+ * Note: The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ }
+
+ XLByteToSeg(divergerec, divergence_segno, WalSegSz);
+ XLByteToPrevSeg(endrec, last_source_segno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern XLogSegNo divergence_segno;
+ extern XLogSegNo last_source_segno;
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,129 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameters for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before running pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one second
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy WALs before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy reclaimed WALs from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Fri, Sep 29, 2017 at 8:10 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-30 00:53:31,"chenhj" <chjischj@163.com> wrote:
On 2017-09-29 19:29:40,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjischj@163.com> wrote:
OK. That makes sense. Thank you for the explanation.
I still have some minor comments.
/* + * Save the WAL filenames of the divergence and the current WAL insert + * location of the source server. Later only the WAL files between those + * would be copied to the target data directory.Comment is outdated. We don't save filenames anymore, now we save segment
numbers.+ * Note:The later generated WAL files in the source server before the end + * of the copy of the data files must be made available when the target + * server is started. This can be done by configuring the target server as + * a standby of the source server. + */You miss space after "Note:". Also, it seems reasonable for me to leave
empty line before "Note:".# Setup parameter for WAL reclaim
Parameter*s*, because you're setting up multiple of them.
# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
one seconds
One second without "s".
Also, please check empty lines in 006_wal_copy.pl to be just empty lines
without tabs.Thanks for your comments, i had fix above problems.
And also add several line breaks at long line in 006_wal_copy.pl
Please check this patch again.Sorry, patch v6 did not remove tabs in two empty lines, please use the new
one.
Great. Now code of this patch looks good for me.
However, we forgot about documentation.
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>
At least, this paragraph need to be adjusted, because it states whose files
are copied. And probably latter paragraphs whose state about WAL files.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
Great. Now code of this patch looks good for me.
However, we forgot about documentation.
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>
At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.
Your are rigth.
I wrote a draft as following, but i'm afraid whether the english statement is accurate.
----------------------
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index d5430d4..bcd094b 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -50,9 +50,12 @@ PostgreSQL documentation
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
- all other files are copied in full, including configuration files. The
- advantage of <application>pg_rewind</> over taking a new base backup, or
- tools like <application>rsync</>, is that <application>pg_rewind</> does
+ all other files except WAL are copied in full, including configuration
+ files. Only the WAL files between the point of divergence and the current
+ WAL insert location of the source server are copied, for other WAL files
+ are useless for the target server. The advantage of
+ <application>pg_rewind</> over taking a new base backup, or tools
+ like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
@@ -231,7 +234,7 @@ PostgreSQL documentation
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
- (everything except the relation files).
+ (everything except the relation files and some WAL files).
</para>
</step>
<step>
------
Best Regars,
Chen Huajun
Import Notes
Resolved by subject fallback
On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:Great. Now code of this patch looks good for me.
However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with
the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup,
or
tools like <application>rsync</>, is that <application>pg_rewind</>
does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose
files are copied. And probably latter paragraphs whose state about WAL
files.Your are rigth.
I wrote a draft as following, but i'm afraid whether the english statement
is accurate.
I'm not native english speaker too :(
Only the WAL files between the point of divergence and the current WAL
insert location of the source server are copied, *for* other WAL files are
useless for the target server.
I'm not sure about this usage of word *for*. For me, it probably should be
just removed. Rest of changes looks good for me. Please, integrate them
into the patch.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-10-01 04:09:19,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
Great. Now code of this patch looks good for me.
However, we forgot about documentation.
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup, or
tools like <application>rsync</>, is that <application>pg_rewind</> does
not require reading through unchanged blocks in the cluster. This makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>
At least, this paragraph need to be adjusted, because it states whose files are copied. And probably latter paragraphs whose state about WAL files.
Your are rigth.
I wrote a draft as following, but i'm afraid whether the english statement is accurate.
I'm not native english speaker too :(
Only the WAL files between the point of divergence and the current WAL insert location of the source server are copied, *for* other WAL files are useless for the target server.
I'm not sure about this usage of word *for*. For me, it probably should be just removed. Rest of changes looks good for me. Please, integrate them into the patch.
I had removed the *for* , Pleae check the new patch again.
---
Best Regards,
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v8.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v8.patchDownload
*** a/doc/src/sgml/ref/pg_rewind.sgml
--- b/doc/src/sgml/ref/pg_rewind.sgml
***************
*** 50,61 **** PostgreSQL documentation
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
! all other files are copied in full, including configuration files. The
! advantage of <application>pg_rewind</> over taking a new base backup, or
! tools like <application>rsync</>, is that <application>pg_rewind</> does
! not require reading through unchanged blocks in the cluster. This makes
! it a lot faster when the database is large and only a small
! fraction of blocks differ between the clusters.
</para>
<para>
--- 50,63 ----
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
! all other files except WAL are copied in full, including configuration
! files. Only the WAL files between the point of divergence and the current
! WAL insert location of the source server are copied, other WAL files are
! useless for the target server. The advantage of <application>pg_rewind</>
! over taking a new base backup, or tools like <application>rsync</>,
! is that <application>pg_rewind</> does not require reading through unchanged
! blocks in the cluster. This makes it a lot faster when the database is
! large and only a small fraction of blocks differ between the clusters.
</para>
<para>
***************
*** 231,237 **** PostgreSQL documentation
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
! (everything except the relation files).
</para>
</step>
<step>
--- 233,239 ----
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
! (everything except the relation files and some WAL files).
</para>
</step>
<step>
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 67,72 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 68,75 ----
file_action_t action = FILE_ACTION_NONE;
size_t oldsize = 0;
file_entry_t *entry;
+ uint32 tli;
+ XLogSegNo segno;
Assert(map->array == NULL);
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 171,201 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR "/")))
+ {
+ XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
+ if (segno >= divergence_segno && segno <= last_source_segno)
+ {
+ action = FILE_ACTION_COPY;
+ oldsize = 0;
+ }
+ else
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 258,263 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
--- 280,288 ----
file_entry_t *key_ptr;
filemap_t *map = filemap;
file_entry_t *entry;
+ bool reserve = false;
+ uint32 tli;
+ XLogSegNo segno;
snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
if (lstat(localpath, &statbuf) < 0)
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 328,346 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! if(strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR "/")))
! {
! XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
! if(segno < divergence_segno)
! reserve = true;
! }
!
! /*
! * Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists && !reserve)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ XLogSegNo divergence_segno;
+ XLogSegNo last_source_segno;
+
static void
usage(const char *progname)
{
***************
*** 281,286 **** main(int argc, char **argv)
--- 286,313 ----
chkpttli);
/*
+ * Save the WAL segment numbers of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ *
+ * Note: The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ }
+
+ XLByteToSeg(divergerec, divergence_segno, WalSegSz);
+ XLByteToPrevSeg(endrec, last_source_segno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern XLogSegNo divergence_segno;
+ extern XLogSegNo last_source_segno;
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,129 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameters for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before running pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one second
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy WALs before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy reclaimed WALs from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
On Sun, Oct 1, 2017 at 8:27 PM, chenhj <chjischj@163.com> wrote:
On 2017-10-01 04:09:19,"Alexander Korotkov" <a.korotkov@postgrespro.ru>
wrote:On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjischj@163.com> wrote:
On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korotkov@postgrespro.ru
wrote:
Great. Now code of this patch looks good for me.
However, we forgot about documentation.<para>
The result is equivalent to replacing the target data directory with
the
source one. Only changed blocks from relation files are copied;
all other files are copied in full, including configuration files. The
advantage of <application>pg_rewind</> over taking a new base backup,
or
tools like <application>rsync</>, is that <application>pg_rewind</>
does
not require reading through unchanged blocks in the cluster. This
makes
it a lot faster when the database is large and only a small
fraction of blocks differ between the clusters.
</para>At least, this paragraph need to be adjusted, because it states whose
files are copied. And probably latter paragraphs whose state about WAL
files.Your are rigth.
I wrote a draft as following, but i'm afraid whether the english
statement is accurate.I'm not native english speaker too :(
Only the WAL files between the point of divergence and the current WAL
insert location of the source server are copied, *for* other WAL files are
useless for the target server.I'm not sure about this usage of word *for*. For me, it probably should
be just removed. Rest of changes looks good for me. Please, integrate
them into the patch.I had removed the *for* , Pleae check the new patch again.
Now, this patch looks good for me. It applies cleanly, builds cleanly,
passes regression tests, new functionality is covered by regression tests.
Code is OK for me and docs too.
I'm marking this patch as "Ready for committer". BTW, authors field in the
commitfest app is empty (https://commitfest.postgresql.org/15/1302/).
Please, put your name there.
I hope this patch will be committed during 2017-11 commitfest. Be ready to
rebase this patch if needed. Thank you for your work.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2017-10-02 23:24:30,"Alexander Korotkov" <a.korotkov@postgrespro.ru> wrote:
On Sun, Oct 1, 2017 at 8:27 PM, chenhj <chjischj@163.com> wrote:
Now, this patch looks good for me. It applies cleanly, builds cleanly, passes regression tests, new functionality is covered by regression tests. Code is OK for me and docs too.
I'm marking this patch as "Ready for committer". BTW, authors field in the commitfest app is empty (https://commitfest.postgresql.org/15/1302/). Please, put your name there.
I hope this patch will be committed during 2017-11 commitfest. Be ready to rebase this patch if needed. Thank you for your work.
I had filled the authors field of this patch in commitfest, and will rebase this patch if needed. Thank you for your help!
------
Best Regards,
Chen Huajun
On Tue, Oct 3, 2017 at 1:20 AM, chenhj <chjischj@163.com> wrote:
I had filled the authors field of this patch in commitfest, and will rebase
this patch if needed. Thank you for your help!
The documentation of the patch needs a rebase, so I am moving it to
next CF with "waiting on author" as status.
$ git diff master --check
src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace.
+ *
There are whitespace complains.
--
Michael
At 2017-12-01 12:27:09, "Michael Paquier" <michael.paquier@gmail.com> wrote:
On Tue, Oct 3, 2017 at 1:20 AM, chenhj <chjischj@163.com> wrote:
I had filled the authors field of this patch in commitfest, and will rebase
this patch if needed. Thank you for your help!The documentation of the patch needs a rebase, so I am moving it to
next CF with "waiting on author" as status.$ git diff master --check
src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace.
+ *
There are whitespace complains.
--
Michael
Rebased and removed the whitespace.
----
regards
Chen Huajun
Attachments:
pg_rewind_wal_copy_reduce_v9.patchapplication/octet-stream; name=pg_rewind_wal_copy_reduce_v9.patchDownload
*** a/doc/src/sgml/ref/pg_rewind.sgml
--- b/doc/src/sgml/ref/pg_rewind.sgml
***************
*** 50,61 **** PostgreSQL documentation
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
! all other files are copied in full, including configuration files. The
! advantage of <application>pg_rewind</application> over taking a new base backup, or
! tools like <application>rsync</application>, is that <application>pg_rewind</application> does
! not require reading through unchanged blocks in the cluster. This makes
! it a lot faster when the database is large and only a small
! fraction of blocks differ between the clusters.
</para>
<para>
--- 50,63 ----
<para>
The result is equivalent to replacing the target data directory with the
source one. Only changed blocks from relation files are copied;
! all other files except WAL are copied in full, including configuration
! files. Only the WAL files between the point of divergence and the current
! WAL insert location of the source server are copied, other WAL files are
! useless for the target server. The advantage of <application>pg_rewind</application>
! over taking a new base backup, or tools like <application>rsync</application>,
! is that <application>pg_rewind</application> does not require reading through unchanged
! blocks in the cluster. This makes it a lot faster when the database is
! large and only a small fraction of blocks differ between the clusters.
</para>
<para>
***************
*** 231,237 **** PostgreSQL documentation
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
! (everything except the relation files).
</para>
</step>
<step>
--- 233,239 ----
<para>
Copy all other files such as <filename>pg_xact</filename> and
configuration files from the source cluster to the target cluster
! (everything except the relation files and some WAL files).
</para>
</step>
<step>
*** a/src/bin/pg_rewind/filemap.c
--- b/src/bin/pg_rewind/filemap.c
***************
*** 21,26 ****
--- 21,27 ----
#include "common/string.h"
#include "catalog/pg_tablespace.h"
#include "storage/fd.h"
+ #include "access/xlog_internal.h"
filemap_t *filemap = NULL;
***************
*** 67,72 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 68,75 ----
file_action_t action = FILE_ACTION_NONE;
size_t oldsize = 0;
file_entry_t *entry;
+ uint32 tli;
+ XLogSegNo segno;
Assert(map->array == NULL);
***************
*** 168,179 **** process_source_file(const char *path, file_type_t type, size_t newsize,
--- 171,201 ----
*
* An exception: PG_VERSIONs should be identical, but avoid
* overwriting it for paranoia.
+ *
+ * Another exception: Do not copy WAL files before the
+ * divergence and the WAL files after the current WAL insert
+ * location of the source server for performance reasons.
*/
if (pg_str_endswith(path, "PG_VERSION"))
{
action = FILE_ACTION_NONE;
oldsize = statbuf.st_size;
}
+ else if (strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
+ IsXLogFileName(path + strlen(XLOGDIR "/")))
+ {
+ XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
+ if (segno >= divergence_segno && segno <= last_source_segno)
+ {
+ action = FILE_ACTION_COPY;
+ oldsize = 0;
+ }
+ else
+ {
+ action = FILE_ACTION_NONE;
+ oldsize = exists ? statbuf.st_size : 0;
+ }
+ }
else
{
action = FILE_ACTION_COPY;
***************
*** 258,263 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
--- 280,288 ----
file_entry_t *key_ptr;
filemap_t *map = filemap;
file_entry_t *entry;
+ bool reserve = false;
+ uint32 tli;
+ XLogSegNo segno;
snprintf(localpath, sizeof(localpath), "%s/%s", datadir_target, path);
if (lstat(localpath, &statbuf) < 0)
***************
*** 303,310 **** process_target_file(const char *path, file_type_t type, size_t oldsize,
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! /* Remove any file or folder that doesn't exist in the source system. */
! if (!exists)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
--- 328,346 ----
exists = (bsearch(&key_ptr, map->array, map->narray, sizeof(file_entry_t *),
path_cmp) != NULL);
! if(strncmp(path, XLOGDIR "/", strlen(XLOGDIR "/")) == 0 &&
! IsXLogFileName(path + strlen(XLOGDIR "/")))
! {
! XLogFromFileName(path + strlen(XLOGDIR "/"), &tli, &segno, WalSegSz);
! if(segno < divergence_segno)
! reserve = true;
! }
!
! /*
! * Remove any file or folder that doesn't exist in the source system
! * except the WAL files before the divergence.
! */
! if (!exists && !reserve)
{
entry = pg_malloc(sizeof(file_entry_t));
entry->path = pg_strdup(path);
*** a/src/bin/pg_rewind/pg_rewind.c
--- b/src/bin/pg_rewind/pg_rewind.c
***************
*** 27,32 ****
--- 27,33 ----
#include "common/restricted_token.h"
#include "getopt_long.h"
#include "storage/bufpage.h"
+ #include "access/xlog_internal.h"
static void usage(const char *progname);
***************
*** 59,64 **** bool dry_run = false;
--- 60,69 ----
TimeLineHistoryEntry *targetHistory;
int targetNentries;
+ /* WAL location */
+ XLogSegNo divergence_segno;
+ XLogSegNo last_source_segno;
+
static void
usage(const char *progname)
{
***************
*** 281,286 **** main(int argc, char **argv)
--- 286,313 ----
chkpttli);
/*
+ * Save the WAL segment numbers of the divergence and the current WAL insert
+ * location of the source server. Later only the WAL files between those
+ * would be copied to the target data directory.
+ *
+ * Note: The later generated WAL files in the source server before the end
+ * of the copy of the data files must be made available when the target
+ * server is started. This can be done by configuring the target server as
+ * a standby of the source server.
+ */
+ if (connstr_source)
+ {
+ endrec = libpqGetCurrentXlogInsertLocation();
+ }
+ else
+ {
+ endrec = ControlFile_source.checkPoint;
+ }
+
+ XLByteToSeg(divergerec, divergence_segno, WalSegSz);
+ XLByteToPrevSeg(endrec, last_source_segno, WalSegSz);
+
+ /*
* Build the filemap, by comparing the source and target data directories.
*/
filemap_create();
*** a/src/bin/pg_rewind/pg_rewind.h
--- b/src/bin/pg_rewind/pg_rewind.h
***************
*** 30,35 **** extern int WalSegSz;
--- 30,39 ----
extern TimeLineHistoryEntry *targetHistory;
extern int targetNentries;
+ /* WAL location */
+ extern XLogSegNo divergence_segno;
+ extern XLogSegNo last_source_segno;
+
/* in parsexlog.c */
extern void extractPageMap(const char *datadir, XLogRecPtr startpoint,
int tliIndex, XLogRecPtr endpoint);
*** /dev/null
--- b/src/bin/pg_rewind/t/006_wal_copy.pl
***************
*** 0 ****
--- 1,129 ----
+ #
+ # Test pg_rewind only copy needed WALs from the source.
+ #
+ use strict;
+ use warnings;
+ use TestLib;
+ use Test::More tests => 12;
+
+ use RewindTest;
+
+ sub run_test
+ {
+ my $test_mode = shift;
+
+ RewindTest::setup_cluster($test_mode);
+ RewindTest::start_master();
+
+ # Setup parameters for WAL reclaim
+ master_psql("ALTER SYSTEM SET checkpoint_timeout = '1d'");
+ master_psql("ALTER SYSTEM SET min_wal_size = '80MB'");
+ master_psql("ALTER SYSTEM SET wal_keep_segments = 4");
+ master_psql("ALTER SYSTEM SET log_checkpoints = on");
+ master_psql("SELECT pg_reload_conf()");
+
+ RewindTest::create_standby($test_mode);
+
+ # Create a test table and insert rows in master.
+ master_psql("CREATE TABLE tbl1 (d text)");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 1')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 2')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 3')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 4')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 5, checkpoint')");
+ master_psql("CHECKPOINT");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, before promotion')");
+
+ # Promote standby
+ my $master_divergence_wal = $node_master->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ RewindTest::promote_standby();
+
+ # Insert rows in master after promotion
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 6, after promotion')");
+
+ master_psql("SELECT pg_switch_wal()");
+ master_psql("INSERT INTO tbl1 VALUES ('in master, wal 7, after promotion')");
+
+ master_psql("CHECKPOINT");
+
+ # Insert rows in standby after promotion
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 6, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 7, after promotion')");
+
+ standby_psql("SELECT pg_switch_wal()");
+ standby_psql("INSERT INTO tbl1 VALUES ('in standby, wal 8, after promotion')");
+
+ standby_psql("CHECKPOINT");
+
+ # Check WALs before running pg_rewind
+ master_psql("SELECT * from pg_ls_waldir()");
+ print("master_divergence_wal: $master_divergence_wal\n");
+ my $master_wal_count_before_divergence = $node_master->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal'");
+ ok( $master_wal_count_before_divergence > 0, 'master_wal_count_before_divergence > 0');
+
+ standby_psql("SELECT * from pg_ls_waldir()");
+ my $standby_current_wal = $node_standby->safe_psql("postgres",
+ "SELECT pg_walfile_name(pg_current_wal_insert_lsn())");
+ print("standby_current_wal: $standby_current_wal\n");
+ my $standby_reclaimed_wal_count = $node_standby->safe_psql("postgres",
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal'");
+ ok( $standby_reclaimed_wal_count > 0, 'standby_reclaimed_wal_count > 0');
+
+ # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep one second
+ sleep(1);
+ my $pg_rewind_time = $node_master->safe_psql("postgres", "SELECT now()");
+ print("pg_rewind_time: $pg_rewind_time\n");
+
+ # Run pg_rewind and check
+ RewindTest::run_pg_rewind($test_mode);
+
+ master_psql("SELECT * from pg_ls_waldir()");
+
+ check_query(
+ 'SELECT * FROM tbl1',
+ qq(in master, wal 1
+ in master, wal 2
+ in master, wal 3
+ in master, wal 4
+ in master, wal 5, checkpoint
+ in master, wal 6, before promotion
+ in standby, wal 6, after promotion
+ in standby, wal 7, after promotion
+ in standby, wal 8, after promotion
+ ),
+ 'table content');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name < '$master_divergence_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy WALs before divergence');
+
+ check_query(
+ "SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}\$' AND name > '$standby_current_wal' AND modification >= '$pg_rewind_time'",
+ qq(0
+ ),
+ 'do not copy reclaimed WALs from the source server');
+
+ RewindTest::clean_rewind_test();
+ }
+
+ # Run the test in both modes
+ run_test('local');
+ run_test('remote');
+
+ exit(0);
Greetings!
* chenhj (chjischj@163.com) wrote:
Rebased and removed the whitespace.
Thanks for working on this, I agree that it seems like a pretty cool
optimization for pg_rewind.
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?
That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).
Setting this back to Waiting for Author but hopefully you can reply soon
and clarify that, possibly adjusting the patch accordingly.
Thanks!
Stephen
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?
After run pg_rewind, the first startup of postgres will do crash recovery.
And crash recovery will begin from the previous redo point preceding the divergence.
So, the WAL after the redo point and before the divergence is needed.
Of course, the WAL before the redo point is not needed, but in my point of view,
recycling unwanted WAL does not have to be done by pg_rewind.
reference code:
pg_rewind.c:
main(int argc, char **argv)
{
...
findLastCheckpoint(datadir_target, divergerec,
lastcommontliIndex,
&chkptrec, &chkpttli, &chkptredo);
...
createBackupLabel(chkptredo, chkpttli, chkptrec);
...
}
...
createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpointloc)
{
...
len = snprintf(buf, sizeof(buf),
"START WAL LOCATION: %X/%X (file %s)\n"
"CHECKPOINT LOCATION: %X/%X\n"
"BACKUP METHOD: pg_rewind\n"
"BACKUP FROM: standby\n"
"START TIME: %s\n",
/* omit LABEL: line */
(uint32) (startpoint >> 32), (uint32) startpoint, xlogfilename,
(uint32) (checkpointloc >> 32), (uint32) checkpointloc,
strfbuf);
...
}
That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).
Thanks for your comments!
I also considered copy WAL just like how pg_basebackup does,but a implement similar to pg_basebackup's manner may be not so simple.
Twice transport of files from source to target may be needed,first for data files, and another for WAL.
And the WAL which contains the previous redo point preceding the divergence may be only exists in target server and had been recycled in source. That's different between pg_rewind and pg_basebackup.
Regards,
Chen Huajun
Greetings,
* chenhj (chjischj@163.com) wrote:
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?After run pg_rewind, the first startup of postgres will do crash recovery.
And crash recovery will begin from the previous redo point preceding the divergence.
So, the WAL after the redo point and before the divergence is needed.
Right.
Of course, the WAL before the redo point is not needed, but in my point of view,
recycling unwanted WAL does not have to be done by pg_rewind.
That's what pg_rewind has been doing though, isn't it? And it's not
like that WAL is useful for anything, is it? That's also how
pg_basebackup works.
That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).Thanks for your comments!
I also considered copy WAL just like how pg_basebackup does,but a implement similar to pg_basebackup's manner may be not so simple.
Using the replication protocol to fetch WAL would be a good thing to do
(actually, making pg_rewind entirely work through a connection to the
current primary would be great) but that's independent of what I'm
asking for here. Here I'm just suggesting that we not change what
pg_rewind is doing today when it comes to the existing WAL on the
old-primary.
And the WAL which contains the previous redo point preceding the divergence may be only exists in target server and had been recycled in source. That's different between pg_rewind and pg_basebackup.
Hm, pg_rewind was removing that and expecting it to be on the new
primary? If that's the case then I could see an argument for keeping
WAL that's from the divergence point onward, but I still don't think we
should have pg_rewind just leave all of the prior WAL in place.
Thanks!
Stephen
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
* chenhj (chjischj@163.com) wrote:
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?After run pg_rewind, the first startup of postgres will do crash recovery.
And crash recovery will begin from the previous redo point preceding the divergence.
So, the WAL after the redo point and before the divergence is needed.Right.
Most of the time, and particularly since v11 has removed the need to
retain more past segments than one completed checkpoint, those segments
have less chances to be on the source server, limiting more the impact
of the patch discussed on this thread.
Of course, the WAL before the redo point is not needed, but in my point of view,
recycling unwanted WAL does not have to be done by pg_rewind.That's what pg_rewind has been doing though, isn't it? And it's not
like that WAL is useful for anything, is it? That's also how
pg_basebackup works.
As of HEAD, pg_rewind handles data in pg_wal similarly to other
paths which are not relation files: the files from the source are just
blindly copied to the target. After the rewind and once recovery begins,
we just let the startup process do the cleanup instead of
pg_rewind. Regarding that pg_basebackup is different, you get the choice
to do what you want using --wal-method, and you actually get the
segments that you only need to get a self-contained base backup.
That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).Thanks for your comments!
I also considered copy WAL just like how pg_basebackup does,but a
implement similar to pg_basebackup's manner may be not so simple.Using the replication protocol to fetch WAL would be a good thing to do
(actually, making pg_rewind entirely work through a connection to the
current primary would be great) but that's independent of what I'm
asking for here. Here I'm just suggesting that we not change what
pg_rewind is doing today when it comes to the existing WAL on the
old-primary.
Yes, superuser is necessary now, if we could get to a point where only a
replication permission is needed that would be nice. Now we could do
things differently. We could have a system role dedicated to pg_rewind
which works only on the functions from genfile.c that pg_rewind needs,
in order to leverage the need of a superuser.
And the WAL which contains the previous redo point preceding the
divergence may be only exists in target server and had been recycled
in source. That's different between pg_rewind and pg_basebackup.Hm, pg_rewind was removing that and expecting it to be on the new
primary? If that's the case then I could see an argument for keeping
WAL that's from the divergence point onward, but I still don't think
we should have pg_rewind just leave all of the prior WAL in place.
Another thing that we could as well do is simply not fetching any WAL
files at all during a rewind, then let the startup process of the
rewound server decide by itself what it needs. This would leverage the
data transfered in all cases. It is easy to define the start point of
WAL segments needed for a rewound server because the last checkpoint
record before WAL forked is calculated before transferring any data. The
finish point cannot be exact though because you don't know up to which
point you should transfer it. In some ways, this is close to a base
backup. We could as well define an end point to minimize the amount of
WAL as the last completed segment before data transfer begins, but then
you need to worry about WAL segment holes and such. At the end of the
day, just not transferring any data from pg_wal looks more solid to me
as a first step if we need to worry about data that is transferred but
finishes by being useless.
--
Michael
Michael, all,
* Michael Paquier (michael.paquier@gmail.com) wrote:
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
* chenhj (chjischj@163.com) wrote:
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?After run pg_rewind, the first startup of postgres will do crash recovery.
And crash recovery will begin from the previous redo point preceding the divergence.
So, the WAL after the redo point and before the divergence is needed.Right.
Most of the time, and particularly since v11 has removed the need to
retain more past segments than one completed checkpoint, those segments
have less chances to be on the source server, limiting more the impact
of the patch discussed on this thread.
Good point.
That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).Thanks for your comments!
I also considered copy WAL just like how pg_basebackup does,but a
implement similar to pg_basebackup's manner may be not so simple.Using the replication protocol to fetch WAL would be a good thing to do
(actually, making pg_rewind entirely work through a connection to the
current primary would be great) but that's independent of what I'm
asking for here. Here I'm just suggesting that we not change what
pg_rewind is doing today when it comes to the existing WAL on the
old-primary.Yes, superuser is necessary now, if we could get to a point where only a
replication permission is needed that would be nice. Now we could do
things differently. We could have a system role dedicated to pg_rewind
which works only on the functions from genfile.c that pg_rewind needs,
in order to leverage the need of a superuser.
Actually, the other work I'm doing nearby wrt removing the explicit
superuser() checks in those functions would allow a non-superuser role
to be created which could be used by pg_rewind. We could even add an
explicit 'pg_rewind' default role if we wanted to, but is that the route
we want to go with this or should we be thinking about changing
pg_rewind to use the replication protocol and a replication user
instead..? The only reason that it doesn't today is that there isn't an
easy way for it to do so with the existing replication protocol, as I
understand it.
Then again, there's this whole question of if we should even keep the
replication protocol or if we should be getting rid of it in favor of
have regular connections that can support replication. There's been
discussion of that recently, as I recall, though I can't remember where.
And the WAL which contains the previous redo point preceding the
divergence may be only exists in target server and had been recycled
in source. That's different between pg_rewind and pg_basebackup.Hm, pg_rewind was removing that and expecting it to be on the new
primary? If that's the case then I could see an argument for keeping
WAL that's from the divergence point onward, but I still don't think
we should have pg_rewind just leave all of the prior WAL in place.Another thing that we could as well do is simply not fetching any WAL
files at all during a rewind, then let the startup process of the
rewound server decide by itself what it needs. This would leverage the
data transfered in all cases. It is easy to define the start point of
WAL segments needed for a rewound server because the last checkpoint
record before WAL forked is calculated before transferring any data. The
finish point cannot be exact though because you don't know up to which
point you should transfer it. In some ways, this is close to a base
backup. We could as well define an end point to minimize the amount of
WAL as the last completed segment before data transfer begins, but then
you need to worry about WAL segment holes and such. At the end of the
day, just not transferring any data from pg_wal looks more solid to me
as a first step if we need to worry about data that is transferred but
finishes by being useless.
Having the rewound server decide by itself what it needs actually sounds
like it would be better and would allow whatever the restore command is
to fetch any missing WAL necessary (which it might be able to do more
efficiently, from an archive server and possibly even in parallel as
compared to pg_rewind doing it serially from the primary...). We don't
have any particularly easy way to prevent needed WAL from disappearing
off of the primary anyway, so it seems like it would be necessary to
have a working restore command for pg_rewind to be reliable. While we
could create a physical replication slot for pg_rewind when it starts,
that may already be too late.
Thanks!
Stephen
On Thu, Jan 25, 2018 at 07:38:37AM -0500, Stephen Frost wrote:
Michael, all,
* Michael Paquier (michael.paquier@gmail.com) wrote:
On Wed, Jan 24, 2018 at 12:43:51PM -0500, Stephen Frost wrote:
* chenhj (chjischj@163.com) wrote:
At 2018-01-23 09:56:48, "Stephen Frost" <sfrost@snowman.net> wrote:
I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why. As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it? Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?After run pg_rewind, the first startup of postgres will do crash recovery.
And crash recovery will begin from the previous redo point preceding the divergence.
So, the WAL after the redo point and before the divergence is needed.Right.
Most of the time, and particularly since v11 has removed the need to
retain more past segments than one completed checkpoint, those segments
have less chances to be on the source server, limiting more the impact
of the patch discussed on this thread.Good point.
Actually, that is worse that than, because after doing a promotion one
has to issue a checkpoint on the promoted server so as to update its
control file (pg_rewind fetches the new TLI number from the on-disk
file), so we can never ensure that needed WAL segments will be on the
primary. Hence the proposed patch loses most of its value, because there
will always be a WAL segment hole anyway.
It can also help in reducing the load in creating new segments if
min_wal_size is large enough on the rewound server but that is a narrow
benefit.
In short, it seems really to me that we should reject the approach as
proposed, and replace it with something that prevents the fetching of
any WAL segments from the source server. I think that we could consider
as well removing all WAL segments on the primary from the point WAL
forked, as those created between the last checkpoint before WAL forked
up to the point where WAL forked are useful anyway. But those are bonus
points to keep a minimalistic amount of space for the rewound node as
they finish by being recycled anyway. For those reasons I think that the
patch should be marked as returned with feedback.
Yes, superuser is necessary now, if we could get to a point where only a
replication permission is needed that would be nice. Now we could do
things differently. We could have a system role dedicated to pg_rewind
which works only on the functions from genfile.c that pg_rewind needs,
in order to leverage the need of a superuser.Actually, the other work I'm doing nearby wrt removing the explicit
superuser() checks in those functions would allow a non-superuser role
to be created which could be used by pg_rewind. We could even add an
explicit 'pg_rewind' default role if we wanted to, but is that the route
we want to go with this or should we be thinking about changing
pg_rewind to use the replication protocol and a replication user
instead..? The only reason that it doesn't today is that there isn't an
easy way for it to do so with the existing replication protocol, as I
understand it.
Yes, actually with the piece of work we are doing, we don't need to work
more on the replication protocol, and pg_rewind does not require heavy
refactoring either, so from a maintenance point of view we would gain
much more in the long term by using a dedicated role. If your patch to
have the file-read catalog user gets in, we should add in the docs of
pg_rewind a small set of ACL commands to allow pg_rewind to work with a
non-superuser role as long is it gains access to the minimal set of
functions necessary.
Then again, there's this whole question of if we should even keep the
replication protocol or if we should be getting rid of it in favor of
have regular connections that can support replication. There's been
discussion of that recently, as I recall, though I can't remember
where.
Hm. I don't recall this one..
Having the rewound server decide by itself what it needs actually sounds
like it would be better and would allow whatever the restore command is
to fetch any missing WAL necessary (which it might be able to do more
efficiently, from an archive server and possibly even in parallel as
compared to pg_rewind doing it serially from the primary...). We don't
have any particularly easy way to prevent needed WAL from disappearing
off of the primary anyway, so it seems like it would be necessary to
have a working restore command for pg_rewind to be reliable. While we
could create a physical replication slot for pg_rewind when it starts,
that may already be too late.
The only approach I can think of that makes unnecessary an external
archive is to have a slot that guarantees that WAL segments down to the
previous checkpoint before promotion are kept around to allow a rewound
node to use them. For planned failovers, things can be controlled even
today as you know when to create a slot. Unplanned failovers are where
the problems come. One way I think can address any problems is to have
what I would call a failover slot. The concept is simple: when a node is
in recovery have it create a new physical replication slot at each
restart point which has a user-defined name, controlled by a GUC, and do
the creation automatically. If the parameter is empty, no slots are
created. At the beginning of a restart point, the slot is removed, so
once the node is promoted the slot will not get removed, and it would
keep around WAL segments down to the last redo point before promotion
automatically.
When you don't want to set up a WAL archive, but still want to ensure
that a rewound node is able to catch up with its promoted primary, just
enable this failover slot and then set up primary_slot_name in
recovery.conf to use it.
--
Michael
On Fri, Jan 26, 2018 at 11:36:09AM +0900, Michael Paquier wrote:
In short, it seems really to me that we should reject the approach as
proposed, and replace it with something that prevents the fetching of
any WAL segments from the source server. I think that we could consider
as well removing all WAL segments on the primary from the point WAL
forked, as those created between the last checkpoint before WAL forked
up to the point where WAL forked are useful anyway. But those are bonus
points to keep a minimalistic amount of space for the rewound node as
they finish by being recycled anyway. For those reasons I think that the
patch should be marked as returned with feedback.
Hearing nothing and as the commit fest is coming to a close, I am
marking this patch as returned with feedback. Feel free to correct me
if you think this is not adapted.
--
Michael