[PoC] pg_upgrade: allow to upgrade publisher node
Dear hackers,
(CC: Amit and Julien)
This is a fork of Julien's thread, which allows upgrading subscribers
without losing changes [1].
I implemented a prototype that allows upgrading the publisher node.
IIUC the key missing piece was that replication slots used for logical replication
could not be copied to the new node by the pg_upgrade command, so this patch
allows that.
This feature can be used when '--include-replication-slot' is specified. Also,
I added a small test for the typical case, which may be helpful for understanding.
pg_upgrade internally executes pg_dump to dump database objects from the old node.
This feature follows that approach and adds a new option '--slot-only' to the
pg_dump command. When specified, it extracts the needed information from the old
node and generates an SQL file that executes pg_create_logical_replication_slot().
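For illustration, here is a rough Python sketch of the kind of statement such a generated SQL file could contain (the slot name and plugin below are hypothetical, and the quoting is simplified compared to pg_dump's appendStringLiteralAH()):

```python
def build_slot_restore_sql(slot_name: str, plugin: str) -> str:
    """Build the statement that a hypothetical '--slot-only' dump file
    could contain for one logical replication slot."""
    # NOTE: naive quoting for illustration only; pg_dump escapes
    # string literals properly.
    return (f"SELECT pg_create_logical_replication_slot("
            f"'{slot_name}', '{plugin}');")

print(build_slot_restore_sql("sub1_slot", "pgoutput"))
```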
The notable difference from the pre-existing behavior is that the slots are
restored at a different time. Currently pg_upgrade works with the following steps:
...
1. dump schema from old nodes
2. do pg_resetwal several times to new node
3. restore schema to new node
4. do pg_resetwal again to new node
...
The problem is that if we create replication slots at step 3, their restart_lsn and
confirmed_flush_lsn are set to the current_wal_insert_lsn at that time, whereas
pg_resetwal discards the WAL files. Such slots cannot extract changes.
To handle the issue, the restore is separated into two phases. In the first phase,
restoring is done at step 3, except for replication slots. In the second phase,
the replication slots are restored at step 5, after running pg_resetwal.
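To illustrate why the ordering matters, here is a toy Python model (not PostgreSQL code; the LSN values are made up): a slot can only stream changes if the WAL from its restart_lsn onward still exists, and pg_resetwal removes everything before its new start position.

```python
def can_stream_changes(restart_lsn: int, oldest_existing_wal: int) -> bool:
    """A slot can stream changes only if WAL from restart_lsn onward
    is still present on disk."""
    return restart_lsn >= oldest_existing_wal

# Toy positions: a slot restored at step 3 records the insert position
# of that moment, but pg_resetwal (step 4) starts WAL at a later point.
slot_restart_lsn = 100
wal_start_after_resetwal = 200

# Restored before pg_resetwal: the slot points at discarded WAL.
assert not can_stream_changes(slot_restart_lsn, wal_start_after_resetwal)
# Restored after pg_resetwal (step 5): the slot starts at existing WAL.
assert can_stream_changes(wal_start_after_resetwal, wal_start_after_resetwal)
```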
Before upgrading a publisher node, all the changes generated on the publisher must
be sent to and applied on the subscriber. This is because the restart_lsn and
confirmed_flush_lsn of the copied replication slots are the same as the
current_wal_insert_lsn; the new node loses track of which WAL was actually applied
on the subscriber and restarts from there. Basically this is not problematic,
because before shutting down the publisher, its walsender processes confirm that
all data has been replicated. See WalSndDone() and related code.
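The prerequisite can be stated as a small invariant; the following Python sketch (toy LSNs, not PostgreSQL code) expresses it:

```python
def all_changes_replicated(confirmed_flush_lsns, old_node_final_lsn):
    """True iff every slot has confirmed replay up to the old node's
    final WAL position.  Any gap between confirmed_flush and the end of
    WAL is lost, because the new node's WAL starts over after
    pg_resetwal and the missing records cannot be re-sent."""
    return all(lsn == old_node_final_lsn for lsn in confirmed_flush_lsns)

assert all_changes_replicated([500, 500], 500)
assert not all_changes_replicated([480, 500], 500)  # a lagging subscriber
```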
Currently, physical slots are ignored because they are out of scope for me.
I have not done any analysis about them.
[1]: /messages/by-id/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
0001-pg_upgrade-Add-include-replication-slot-option.patch
Hi Kuroda-san.
This is a WIP review. I'm yet to do more testing and more study of the
POC patch's design.
While reading the code I kept a local list of my review comments.
Meanwhile, there is a long weekend coming up here, so I thought it
would be better to pass these to you now rather than next week in case
you want to address them.
======
General
1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.
pg_dump: "--slot-only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"
help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.
~~~
2.
I felt there should maybe be some pg_dump test cases for that new
option, rather than the current patch, where the new pg_dump option only
seems to be tested via the pg_upgrade TAP tests.
======
Commit message
3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.
~
"new option" --> "new pg_upgrade" option
~~~
4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.
~
4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"
~
4b.
Typo: /replicaiton/replication/
~
4c
"leads an ERROR" --> "leads to an ERROR"
======
doc/src/sgml/ref/pg_dump.sgml
5.
+ <varlistentry>
+ <term><option>--slot-only</option></term>
+ <listitem>
+ <para>
+ Dump only replication slots, neither the schema (data definitions) nor
+ data. Mainly this is used for upgrading nodes.
+ </para>
+ </listitem>
SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.
======
doc/src/sgml/ref/pgupgrade.sgml
6.
+ <para>
+ Transport replication slots. Currently this can work only for logical
+ slots, and temporary slots are ignored. Note that pg_upgrade does not
+ check the installation of plugins.
+ </para>
SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...
======
src/bin/pg_dump/pg_dump.c
7. main
{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
The blank line is misplaced.
~~~
8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;
typo: /onlu/only/
~~~
9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+
9a.
typo: /replicatin/replication/
~
9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?
~~~
10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }
10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.
~
10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)
~~~
11. help
+ printf(_(" --slot-only dump only replication
slots, no schema and data\n"));
11a.
SUGGESTION
"no schema and data" --> "no schema or data"
~
11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.
~~~
12. getRepliactionSlots
+/*
+ * getRepliactionSlots
+ * get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)
Function name typo /getRepliactionSlots/getReplicationSlots/
(also in the comment)
~~~
13. getRepliactionSlots
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;
Hmmm, is that condition correct? Shouldn't the && be || here?
~~~
14. dumpReplicationSlot
+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+ slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+ .description = "REPICATION SLOT",
+ .section = SECTION_POST_DATA,
+ .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}
14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().
~
14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS description.
~
14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.
======
src/bin/pg_dump/pg_dump.h
15. DumpableObjectType
@@ -82,7 +82,8 @@ typedef enum
DO_PUBLICATION,
DO_PUBLICATION_REL,
DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
} DumpableObjectType;
Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/
======
src/bin/pg_upgrade/dump.c
16. generate_old_dump
+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */
Typo: /replicaiton/replication/
======
src/bin/pg_upgrade/pg_upgrade.c
17. main
+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+
I don't think that warrants a "XXX" style comment. It is just a "Note:".
~~~
18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)
Typo: /create_replicaiton_slots/create_replication_slots/
(Function name and comment)
~~~
19. create_replicaiton_slots
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ opts,
+ old_db->db_name,
+ log_opts.dumpdir,
+ slots_file_name);
+ }
That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?
Or, if not removed, then at least make it const char *psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
======
src/bin/pg_upgrade/pg_upgrade.h
20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS "pg_upgrade_dump_%u_slots.custom"
20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.
~
20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?
======
.../pg_upgrade/t/003_logical_replication.pl
21.
Some parts (formatting, comments, etc) in this file are inconsistent.
21a.
");" is sometimes alone on a line, sometimes not
~
21b.
"Init" versus "Create" nodes.
~
21c.
# Check whether changes on new publisher are shipped to subscriber
SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber
~
21d.
$result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
'check changes are shipped to subscriber');
For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi,
On Tue, Apr 04, 2023 at 07:00:01AM +0000, Hayato Kuroda (Fujitsu) wrote:
Dear hackers,
(CC: Amit and Julien)
(thanks for the Cc)
This is a fork of Julien's thread, which allows upgrading subscribers
without losing changes [1].

I implemented a prototype that allows upgrading the publisher node.
IIUC the key missing piece was that replication slots used for logical replication
could not be copied to the new node by the pg_upgrade command, so this patch
allows that.
This feature can be used when '--include-replication-slot' is specified. Also,
I added a small test for the typical case, which may be helpful for understanding.

pg_upgrade internally executes pg_dump to dump database objects from the old node.
This feature follows that approach and adds a new option '--slot-only' to the
pg_dump command. When specified, it extracts the needed information from the old
node and generates an SQL file that executes pg_create_logical_replication_slot().

The notable difference from the pre-existing behavior is that the slots are
restored at a different time. Currently pg_upgrade works with the following steps:
...
1. dump schema from old nodes
2. do pg_resetwal several times to new node
3. restore schema to new node
4. do pg_resetwal again to new node
...
The problem is that if we create replication slots at step 3, their restart_lsn and
confirmed_flush_lsn are set to the current_wal_insert_lsn at that time, whereas
pg_resetwal discards the WAL files. Such slots cannot extract changes.
To handle the issue, the restore is separated into two phases. In the first phase,
restoring is done at step 3, except for replication slots. In the second phase,
the replication slots are restored at step 5, after running pg_resetwal.

Before upgrading a publisher node, all the changes generated on the publisher must
be sent to and applied on the subscriber. This is because the restart_lsn and
confirmed_flush_lsn of the copied replication slots are the same as the
current_wal_insert_lsn; the new node loses track of which WAL was actually applied
on the subscriber and restarts from there. Basically this is not problematic,
because before shutting down the publisher, its walsender processes confirm that
all data has been replicated. See WalSndDone() and related code.
As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on publisher must be send
and applied". Is that a hard requirement for the feature to work reliably? If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like any new row
added on the publisher node after the pg_upgrade but before the subscriber is
reconnected is also replicated as expected.
Currently, physical slots are ignored because they are out of scope for me.
I have not done any analysis about them.
Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?
Dear Julien,
Thank you for giving comments!
As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on publisher must be send
and applied". Is that a hard requirement for the feature to work reliably?
I think the requirement is needed because the existing WAL on the old node cannot
be transported to the new instance. The WAL hole from confirmed_flush to the
current position cannot be filled by the new instance.
If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like any new row
added on the publisher node after the pg_upgrade but before the subscriber is
reconnected is also replicated as expected.
Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I tested your approach; however, I found that the
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was shut down
and restarted. As a result, the confirmed_flush_lsn of the slots always lagged
behind the WAL insert location, and the upgrade failed every time.
Now I do not have a good idea to solve it... Does anyone have one?
Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?
Seems right. It will be fixed in the next version. Maybe "--logical-replication-slots-only"
will be used, per Peter's suggestion [1].
[1]: /messages/by-id/CAHut+PvpBsyxj9SrB1ZZ9gP7r1AA5QoTYjpzMcVSjQO2xQy7aw@mail.gmail.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Julien,
Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

Seems right. Will be fixed in next version. Maybe "--logical-replication-slots-only"
will be used, per Peter's suggestion [1].
After considering it more, I decided not to include the word "logical" in the
option name at this point. This is because we have not yet decided whether we
will dump physical replication slots or not. The current restriction exists only
because of a lack of analysis and consideration. If we decide not to do that,
the options will be renamed accordingly.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Peter,
Thank you for reviewing briefly. PSA new version.
If you can, I would like to ask your opinion about the check by pg_upgrade [1].
======
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.
pg_dump: "--slot-only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.
OK, I renamed them. By the way, what do you think about the suggestion raised by
Julien? Currently I did not address it because the restriction was caused by just
a lack of analysis, and it may not be agreed on in the community.
Or should we keep the names anyway?
2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.
Hmm, I assumed that the option should be used only for upgrading, so I'm not
sure it must be tested by pg_dump alone.
Commit message
3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.
~
"new option" --> "new pg_upgrade" option
Fixed.
4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.
~
4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"
Fixed.
4b.
Typo: /replicaiton/replication/
Fixed.
4c
"leads an ERROR" --> "leads to an ERROR"
Fixed.
doc/src/sgml/ref/pg_dump.sgml
5.
+ <varlistentry>
+ <term><option>--slot-only</option></term>
+ <listitem>
+ <para>
+ Dump only replication slots, neither the schema (data definitions) nor
+ data. Mainly this is used for upgrading nodes.
+ </para>
+ </listitem>

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.
Fixed.
doc/src/sgml/ref/pgupgrade.sgml
6.
+ <para>
+ Transport replication slots. Currently this can work only for logical
+ slots, and temporary slots are ignored. Note that pg_upgrade does not
+ check the installation of plugins.
+ </para>

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...
Fixed.
src/bin/pg_dump/pg_dump.c
7. main
{"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
{NULL, 0, NULL, 0}

The blank line is misplaced.
Fixed.
8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/
Fixed.
9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be used together");
+

9a.
typo: /replicatin/replication/
Fixed. Additionally, a wrong parameter reference was also fixed.
9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?
I think there are some things that should be checked more. But I'm not sure about
"no-publications": there is a possibility that non-core logical replication is
used, and in that case these options do not contradict each other.
10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.
Fixed.
10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved above that (and also above the 'Dumping
LOs' etc...)
I read it again and found that the collected usernames are used to check the
owners of objects. IIUC replication slots are not owned by database users, so it
is not needed. Also, the LOs should not be dumped here. Based on this, I moved
getReplicationSlots() above them.
11. help
+ printf(_(" --slot-only dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"
Fixed.
11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.
~~~
12. getRepliactionSlots
+/*
+ * getRepliactionSlots
+ * get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo /getRepliactionSlots/getReplicationSlots/
(also in the comment)
Fixed.
13. getRepliactionSlots
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?
Right, fixed.
14. dumpReplicationSlot
+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+ slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+ .description = "REPICATION SLOT",
+ .section = SECTION_POST_DATA,
+ .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}

14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().
The style was chosen because I previously referred to dumpSubscription(). But I
read the PG manual and understood that COMMENT and SECURITY LABEL cannot be set
on replication slots. Therefore, I removed the comments and the dump of
DUMP_COMPONENT_COMMENT, and then followed the suggested style.
14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS
description.
~
14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.
Both of them were fixed.
src/bin/pg_dump/pg_dump.h
15. DumpableObjectType
@@ -82,7 +82,8 @@ typedef enum
 DO_PUBLICATION,
 DO_PUBLICATION_REL,
 DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
 } DumpableObjectType;

Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/
Fixed.
src/bin/pg_upgrade/dump.c
16. generate_old_dump
+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */

Typo: /replicaiton/replication/
Fixed.
src/bin/pg_upgrade/pg_upgrade.c
17. main
+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".
Fixed. Could you please tell me how these comment styles are classified, if you can?
18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)

Typo: /create_replicaiton_slots/create_replication_slots/
(Function name and comment)
All of them were replaced.
19. create_replicaiton_slots
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+ NULL,
+ "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+ new_cluster.bindir,
+ cluster_conn_opts(&new_cluster),
+ opts,
+ old_db->db_name,
+ log_opts.dumpdir,
+ slots_file_name);
+ }

That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?

Or, if not removed, then at least make it const char *psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
I had tried to follow the prepare_new_globals() style, but
I preferred your suggestion. Fixed.
src/bin/pg_upgrade/pg_upgrade.h
20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS "pg_upgrade_dump_%u_slots.custom"

20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.
Fixed.
20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?
Right, fixed.
.../pg_upgrade/t/003_logical_replication.pl
21.
Some parts (formatting, comments, etc) in this file are inconsistent.

21a.
");" is sometimes alone on a line, sometimes not
I ran pgperltidy and the lone ");" lines were removed.
21b.
"Init" versus "Create" nodes.
"Initialize" was chosen.
21c.
# Check whether changes on new publisher are shipped to subscriber

SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber
Fixed.
21d.
$result =
$subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
'check changes are shipped to subscriber');

For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.
Added.
[1]: /messages/by-id/20230407024823.3j2s4doslsjemvis@jrouhaud
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v2-0001-pg_upgrade-Add-include-replication-slots-option.patch
On Fri, Apr 07, 2023 at 09:40:14AM +0000, Hayato Kuroda (Fujitsu) wrote:
As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on publisher must be send
and applied". Is that a hard requirement for the feature to work reliably?

I think the requirement is needed because the existing WAL on the old node cannot
be transported to the new instance. The WAL hole from confirmed_flush to the
current position cannot be filled by the new instance.
I see; that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I also don't see how that hole could be filled
either.
If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up, and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like any new row
added on the publisher node after the pg_upgrade but before the subscriber is
reconnected is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I tested your approach; however, I found that the
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was shut down
and restarted. As a result, the confirmed_flush_lsn of the slots always lagged
behind the WAL insert location, and the upgrade failed every time.
Now I do not have a good idea to solve it... Does anyone have one?
I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) ago, or something like that? That's hackish, but if
pg_upgrade can run, it means it was a clean shutdown, so it should be safe to
assume what the last record in the WAL was. As for the double shutdown
checkpoint, I'm not sure that I get the problem. The check should only be done
at the very beginning of pg_upgrade, so there should have been only one
shutdown checkpoint done, right?
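A rough Python sketch of that idea (toy numbers; SHUTDOWN_CHECKPOINT_SIZE is a stand-in for the real on-disk record size, which would need to be computed per server version):

```python
SHUTDOWN_CHECKPOINT_SIZE = 114  # stand-in value, not the real record size

def caught_up_at_clean_shutdown(confirmed_flush_lsn: int,
                                wal_end_lsn: int) -> bool:
    """On a clean shutdown the last WAL record is the shutdown
    checkpoint, so a fully caught-up slot should have confirmed
    everything up to the start of that record."""
    return wal_end_lsn - confirmed_flush_lsn == SHUTDOWN_CHECKPOINT_SIZE

assert caught_up_at_clean_shutdown(1000, 1000 + SHUTDOWN_CHECKPOINT_SIZE)
assert not caught_up_at_clean_shutdown(900, 1000 + SHUTDOWN_CHECKPOINT_SIZE)
```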
On Fri, Apr 07, 2023 at 12:51:51PM +0000, Hayato Kuroda (Fujitsu) wrote:
Dear Julien,
Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

Seems right. Will be fixed in next version. Maybe "--logical-replication-slots-only"
will be used, per Peter's suggestion [1].

After considering it more, I decided not to include the word "logical" in the
option name at this point. This is because we have not yet decided whether we
will dump physical replication slots or not. The current restriction exists only
because of a lack of analysis and consideration. If we decide not to do that,
the options will be renamed accordingly.
Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to keep only one kind or the other, so
having both options could make sense.
That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade final state is fully
reproducible: not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.
Dear Julien,
Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to keep only one kind or the other, so
having both options could make sense.
You meant that we can name the options like "logical-*" and later add a new
option for physical slots if needed, right? PSA the new patch, which handles the comment.
That being said, I have a hard time believing that we could actually preserve
physical replication slots. I don't think that pg_upgrade final state is fully
reproducible: not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication. Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower. And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.
Your point made sense to me. Thank you for giving your opinion.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
Dear Julien,
Thank you for giving idea! I have analyzed about it.
If yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped? I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up and fail if that's not the case
rather than assuming that a clean shutdown implies it. It would be good to
cover that in the TAP test, and also cover some corner cases, like any new row
added on the publisher node after the pg_upgrade but before the subscriber is
reconnected is also replicated as expected.
Hmm, good point. The current patch cannot handle that case because no walsenders
exist for such slots. I have tested your approach; however, I found that a
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was shut
down and restarted. As a result the confirmed_lsn of the slots always lagged
behind the WAL insert location, and the upgrade failed every time.
Now I do not have a good idea to solve it... Does anyone have one?
I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) ago or something like that? That's hackish, but if
pg_upgrade can run it means it was a clean shutdown so it should be safe to
assume what the last record in the WAL was. For the double
shutdown checkpoint, I'm not sure that I get the problem. The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint done, right?
I have analyzed this point, but it seems to be difficult. This is because
some additional records like the following may be inserted. PSA the script
used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue might be wrong,
so I want to withdraw it for now. Sorry for the noise.
* HEAP/HEAP2 records. These records may be inserted by the checkpointer.
IIUC, if there are tuples which have not been flushed yet when shutdown is requested,
the checkpointer writes all of them back to the heap files. At that time many WAL
records are generated. I think we cannot predict the number of records beforehand.
* INVALIDATION(S) records. These records may be inserted by VACUUM.
There is a possibility that autovacuum runs and generates WAL records. I think we
cannot predict the number of records beforehand because it depends on the number
of objects.
* RUNNING_XACTS record
It might be a timing issue, but I found that sometimes the background writer generates
an XLOG_RUNNING_XACTS record. According to the function BackgroundWriterMain(), it is
generated when 15 seconds have passed since the last logging and important records
have been inserted since then. I think it is difficult to predict whether this record
will appear or not.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
Here are a few more review comments for patch v3-0001.
======
doc/src/sgml/ref/pgupgrade.sgml
1.
+ <varlistentry>
+ <term><option>--include-logical-replication-slots</option></term>
+ <listitem>
+ <para>
+ Upgrade logical replication slots. Only permanent replication slots
+ included. Note that pg_upgrade does not check the installation of
+ plugins.
+ </para>
+ </listitem>
+ </varlistentry>
Missing word.
"Only permanent replication slots included." --> "Only permanent
replication slots are included."
======
src/bin/pg_dump/pg_dump.c
2. help
@@ -1119,6 +1145,8 @@ help(const char *progname)
printf(_(" --no-unlogged-table-data do not dump unlogged table data\n"));
printf(_(" --on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands\n"));
printf(_(" --quote-all-identifiers quote all identifiers, even if not key words\n"));
+ printf(_(" --logical-replication-slots-only\n"
+ " dump only logical replication slots, no schema or data\n"));
printf(_(" --rows-per-insert=NROWS number of rows per INSERT; implies --inserts\n"));
A previous review comment ([1] #11b) seems to have been missed. This
help is misplaced. It should be in alphabetical order consistent with
all the other help.
======
src/bin/pg_dump/pg_dump.h
3. _LogicalReplicationSlotInfo
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+ DumpableObject dobj;
+ char *plugin;
+ char *slottype;
+ char *twophase;
+} LogicalReplicationSlotInfo;
+
4a.
The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
unlike others in this file. Is it OK?
~
4b.
There was no typedefs.list file in this patch. Maybe the above
whitespace problem is a result of that omission.
======
.../pg_upgrade/t/003_logical_replication.pl
5.
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode, '--include-logical-replication-slot'
+ ],
+ 'run of pg_upgrade for new publisher');
5a.
How can this test even be working as-expected with those options?
Here it is passing option '--include-logical-replication-slot' but
AFAIK the proper option name everywhere else in this patch is
'--include-logical-replication-slots' (with the 's')
~
5b.
I'm not sure that "pg_upgrade for new publisher" makes sense.
It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
publisher"
------
[1]: /messages/by-id/TYCPR01MB5870E212F5012FD6272CE1E3F5969@TYCPR01MB5870.jpnprd01.prod.outlook.com
Kind Regards,
Peter Smith.
Fujitsu Australia
On Sat, Apr 8, 2023 at 12:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
...
17. main
+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+
I don't think that warrants a "XXX" style comment. It is just a "Note:".
Fixed. Could you please tell me how they are classified, if you can?
Hopefully, someone will correct me if this explanation is wrong, but
my understanding of the different prefixes is like this --
"XXX" is used as a marker for future developers to consider maybe
revisiting/improving something that the comment refers to
e.g.
/* XXX - it would be better to code this using blah but for now we did
not.... */
/* XXX - option 'foo' is not currently supported but... */
/* XXX - it might be worth considering adding more checks or an assert
here because... */
OTOH, "Note" is just for highlighting why something is the way it is,
but with no implication that it should be revisited/changed in the
future.
e.g.
/* Note: We deliberately do not test the state here because... */
/* Note: This memory must be zeroed because... */
/* Note: This string has no '\0' terminator so... */
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Dear Peter,
Thank you for giving comments! PSA new version.
======
doc/src/sgml/ref/pgupgrade.sgml
1.
+    <varlistentry>
+     <term><option>--include-logical-replication-slots</option></term>
+     <listitem>
+      <para>
+       Upgrade logical replication slots. Only permanent replication slots
+       included. Note that pg_upgrade does not check the installation of
+       plugins.
+      </para>
+     </listitem>
+    </varlistentry>
Missing word.
"Only permanent replication slots included." --> "Only permanent
replication slots are included."
Fixed.
======
src/bin/pg_dump/pg_dump.c2. help
@@ -1119,6 +1145,8 @@ help(const char *progname)
printf(_(" --no-unlogged-table-data do not dump unlogged table data\n"));
printf(_(" --on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands\n"));
printf(_(" --quote-all-identifiers quote all identifiers, even if not key words\n"));
+ printf(_(" --logical-replication-slots-only\n"
+ " dump only logical replication slots, no schema or data\n"));
printf(_(" --rows-per-insert=NROWS number of rows per INSERT; implies --inserts\n"));
A previous review comment ([1] #11b) seems to have been missed. This help is
misplaced. It should be in alphabetical order consistent with all the other help.
Sorry, fixed.
src/bin/pg_dump/pg_dump.h
3. _LogicalReplicationSlotInfo
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+ DumpableObject dobj;
+ char *plugin;
+ char *slottype;
+ char *twophase;
+} LogicalReplicationSlotInfo;
+
4a.
The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
unlike others in this file. Is it OK?
I was betrayed by pgindent because of the reason you pointed out.
Fixed.
4b.
There was no typedefs.list file in this patch. Maybe the above
whitespace problem is a result of that omission.
Your analysis is correct. Added.
.../pg_upgrade/t/003_logical_replication.pl
5.
+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $bindir,
+ '-B', $bindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode, '--include-logical-replication-slot'
+ ],
+ 'run of pg_upgrade for new publisher');
5a.
How can this test even be working as-expected with those options?
Here it is passing option '--include-logical-replication-slot' but
AFAIK the proper option name everywhere else in this patch is
'--include-logical-replication-slots' (with the 's')
This is because getopt_long as implemented by GNU can accept abbreviated options
when the correct one can be predicted from the input. E.g. pg_upgrade on Linux can
accept `--ve` as `--verbose`, whereas the binary built on Windows cannot.
Anyway, the difference was not my intention. Fixed.
5b.
I'm not sure that "pg_upgrade for new publisher" makes sense.
It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
publisher"
Fixed.
Additionally, I fixed two bugs which are detected by AddressSanitizer.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v4-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
Dear Peter,
Thank you for giving explanation.
Hopefully, someone will correct me if this explanation is wrong, but
my understanding of the different prefixes is like this --
"XXX" is used as a marker for future developers to consider maybe
revisiting/improving something that the comment refers to
e.g.
/* XXX - it would be better to code this using blah but for now we did
not.... */
/* XXX - option 'foo' is not currently supported but... */
/* XXX - it might be worth considering adding more checks or an assert
here because... */
OTOH, "Note" is just for highlighting why something is the way it is,
but with no implication that it should be revisited/changed in the
future.
e.g.
/* Note: We deliberately do not test the state here because... */
/* Note: This memory must be zeroed because... */
/* Note: This string has no '\0' terminator so... */
I confirmed that the current "XXX" comments should eventually be removed, either
by improvements or by some decision. Therefore I kept the current annotation.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Dear hackers,
My PoC does not read and copy logical mapping files to the new node, but I
had not analyzed in detail whether that is correct. Now I have done so and
concluded that they do not have to be copied, because transactions which executed
at the same time as the rewriting are no longer decoded. What do you think?
The following is my analysis.
## What are logical mapping files?
A logical mapping file is used to keep tracking the system catalogs during logical
decoding even when their heap files are rewritten. Sometimes catalog heap files are
modified, or completely replaced by new files via VACUUM FULL or CLUSTER, and the
reorder buffer cannot track the new files as-is. Mapping files make that possible.
Each file contains key-value relations mapping old tuples to new ones. Also, the
file name contains the LSN at which the triggering event happened.
Mapping files are needed when transactions which modify catalogs are decoded.
If a file's LSN is older than restart_lsn, the file is no longer needed and is
removed. Please see CheckPointLogicalRewriteHeap().
## Are they needed?
I think they are not needed.
Currently pg_upgrade dumps the important information from the old publisher and then
executes pg_create_logical_replication_slot() on the new one. Unlike
pg_copy_logical_replication_slot(), the restart_lsn and confirmed_flush_lsn of the
old slot are not carried over to the new slot. They are recalculated on the new node
at creation time. This means that transactions which modified catalog heaps on the
old publisher are no longer decoded on the new publisher.
Therefore, the mapping files on the old publisher are not needed for the new one.
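The analysis above can be illustrated on the new cluster (the slot and plugin names below are illustrative, not from the patch): a re-created slot gets fresh LSNs computed at creation time, so decoding can never be requested for pre-upgrade transactions.

```sql
-- Re-creating a slot on the new cluster: restart_lsn and confirmed_flush_lsn
-- are computed from the new cluster's current WAL position, independent of
-- whatever the old slot had, so rewrite-time transactions from before the
-- upgrade are never decoded here.
SELECT * FROM pg_create_logical_replication_slot('sub_slot', 'pgoutput');

SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'sub_slot';
```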
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
FYI, here are some minor review comments for v4-0001
======
src/bin/pg_dump/pg_backup.h
1.
+ int logical_slot_only;
The field should be plural - "logical_slots_only"
======
src/bin/pg_dump/pg_dump.c
2.
+ appendPQExpBufferStr(query,
+ "SELECT r.slot_name, r.plugin, r.two_phase "
+ "FROM pg_replication_slots r "
+ "WHERE r.database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");
The alias 'r' may not be needed at all here, but since you already
have it IMO it looks a bit strange that you used it for only some of
the columns but not others.
~~~
3.
+
+ /* FIXME: force dumping */
+ slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
Why the "FIXME" here? Are you intending to replace this code with
something else?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Dear Peter,
Thank you for giving comments. PSA new version.
src/bin/pg_dump/pg_backup.h
1.
+ int logical_slot_only;
The field should be plural - "logical_slots_only"
Fixed.
src/bin/pg_dump/pg_dump.c
2.
+ appendPQExpBufferStr(query,
+ "SELECT r.slot_name, r.plugin, r.two_phase "
+ "FROM pg_replication_slots r "
+ "WHERE r.database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");
The alias 'r' may not be needed at all here, but since you already
have it IMO it looks a bit strange that you used it for only some of
the columns but not others.
Right, I removed alias. Moreover, the namespace 'pg_catalog' is now specified.
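With both of those changes applied, the revised query would presumably look something like the following (a sketch reconstructed from the review exchange, not the exact patch text):

```sql
-- Dump-side query for permanent logical slots in the current database whose
-- WAL is still available; schema-qualified and without the table alias.
SELECT slot_name, plugin, two_phase
FROM pg_catalog.pg_replication_slots
WHERE database = current_database() AND temporary = false
  AND wal_status IN ('reserved', 'extended');
```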
3.
+
+ /* FIXME: force dumping */
+ slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
Why the "FIXME" here? Are you intending to replace this code with
something else?
I added the FIXME because I was not sure whether a selectDumpable...()
function was needed or not. Now I think that such a function is not
needed, so I replaced the comment. In more detail:
Replication slots cannot be members of an extension because pg_create_logical_replication_slot()
cannot be called within the install script. This means that checkExtensionMembership()
is not needed. Moreover, we do not have any options to include/exclude slots
in dumping, so checking their names like selectDumpableExtension() is not needed.
Based on this, I think the function is not needed.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v5-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
Hi Kuroda-san.
I do not have any more review comments for the v5 patch, but here are
a few remaining nitpick items.
======
General
1.
There were a couple of comments that I thought would appear less
squished (aka more readable) if there was a blank line preceding the
XXX.
1a. This one is in getLogicalReplicationSlots
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ * XXX: Do we have to support physical slots?
+ */
~
1b. This one is for the LogicalReplicationSlotInfo typedef
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/
======
src/bin/pg_dump/pg_dump_sort.c
2. describeDumpableObject
+ case DO_LOGICAL_REPLICATION_SLOT:
+ snprintf(buf, bufsize,
+ "REPLICATION SLOT (ID %d NAME %s)",
+ obj->dumpId, obj->name);
+ return;
Since everything else was changed to say logical replication slot,
should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
%s)"?
======
.../pg_upgrade/t/003_logical_replication.pl
3.
Should the name of this TAP test file really be 03_logical_replication_slots.pl?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
Dear Peter,
Thank you for checking. Then we can wait for comments from others.
PSA modified version.
1.
There were a couple of comments that I thought would appear less
squished (aka more readable) if there was a blank line preceding the
XXX.
1a. This one is in getLogicalReplicationSlots
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ * XXX: Do we have to support physical slots?
+ */
Added.
1b. This one is for the LogicalReplicationSlotInfo typedef
+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
Added.
BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/
Good finding, replaced.
src/bin/pg_dump/pg_dump_sort.c
2. describeDumpableObject
+ case DO_LOGICAL_REPLICATION_SLOT:
+ snprintf(buf, bufsize,
+ "REPLICATION SLOT (ID %d NAME %s)",
+ obj->dumpId, obj->name);
+ return;
Since everything else was changed to say logical replication slot,
should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
%s)"?
I missed replacing that one; changed.
.../pg_upgrade/t/003_logical_replication.pl
3.
Should the name of this TAP test file really be 03_logical_replication_slots.pl?
Hmm, not sure. For now I have renamed it according to your advice, but personally
I think another feature which allows upgrading the subscriber [1] should be tested
in the same perl file. That's why I named it "003_logical_replication.pl".
[1]: /messages/by-id/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v6-0001-pg_upgrade-Add-include-logical-replication-slots-.patch (application/octet-stream)
Hi,
Sorry for the delay, I didn't have time to come back to it until this afternoon.
On Mon, Apr 10, 2023 at 09:18:46AM +0000, Hayato Kuroda (Fujitsu) wrote:
I have analyzed this point, but it seems to be difficult. This is because
some additional records like the following may be inserted. PSA the script
used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue might be wrong,
so I want to withdraw it for now. Sorry for the noise.
* HEAP/HEAP2 records. These records may be inserted by the checkpointer.
IIUC, if there are tuples which have not been flushed yet when shutdown is requested,
the checkpointer writes all of them back to the heap files. At that time many WAL
records are generated. I think we cannot predict the number of records beforehand.
* INVALIDATION(S) records. These records may be inserted by VACUUM.
There is a possibility that autovacuum runs and generates WAL records. I think we
cannot predict the number of records beforehand because it depends on the number
of objects.
* RUNNING_XACTS record
It might be a timing issue, but I found that sometimes the background writer generates
an XLOG_RUNNING_XACTS record. According to the function BackgroundWriterMain(), it is
generated when 15 seconds have passed since the last logging and important records
have been inserted since then. I think it is difficult to predict whether this record
will appear or not.
I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.
What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown. But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk. You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.
In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.
It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to a crazy amount of work after restart; I really don't know
much about the logical slot code), but it clearly becomes problematic with your
use case. One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot). With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.
Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.