Document default values for pgoutput options + fix missing initialization for "origin"

Started by Fujii Masao8 months ago6 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com
2 attachment(s)

Hi,

The pgoutput plugin options are documented in the logical streaming
replication protocol, but their default values are not mentioned.
This can be inconvenient for users - for example, when using pg_recvlogical
with pgoutput plugin and needing to know the default behavior of each option.
https://www.postgresql.org/docs/devel/protocol-logical-replication.html

I'd like to propose adding the default values to the documentation to
improve clarity and usability. Patch attached (0001 patch).

While working on this, I also noticed that although most optional parameters
(like "binary") are explicitly initialized in parse_output_parameters(),
the "origin" parameter is not. Its value (PGOutputData->publish_no_origin)
is implicitly set to false due to zero-initialization, but unlike other
parameters, it lacks an explicit default assignment.

To ensure consistency and make the behavior clearer, I propose explicitly
initializing the "origin" parameter as well. A patch for this is also attached
(0002 patch).

Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v1-0001-doc-Document-default-values-for-pgoutput-options-.patchtext/plain; charset=UTF-8; name=v1-0001-doc-Document-default-values-for-pgoutput-options-.patchDownload
From 60b83897c786c89a1060d9c41adced566b7189b5 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 16 May 2025 23:09:55 +0900
Subject: [PATCH v1 1/2] doc: Document default values for pgoutput options in
 protocol.sgml.

The pgoutput plugin options are described in the logical streaming
replication protocol documentation, but their default values were
previously not mentioned. This made it less convenient for users,
for example, when specifying those options to use pg_recvlogical
with pgoutput plugin.

This commit adds the explanations of the default values for pgoutput
options to improve clarity and usability.
---
 doc/src/sgml/protocol.sgml | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index c4d3853cbf2..b9fac360fbf 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -3482,6 +3482,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to use binary transfer mode.  Binary mode is faster
        than the text mode but slightly less robust.
+       The default is <literal>false</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3494,6 +3495,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to enable sending the messages that are written
        by <function>pg_logical_emit_message</function>.
+       The default is <literal>false</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3505,10 +3507,11 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
      <listitem>
       <para>
        Boolean option to enable streaming of in-progress transactions.
-       It accepts an additional value "parallel" to enable sending extra
+       It accepts an additional value <literal>parallel</literal> to enable sending extra
        information with some messages to be used for parallelisation.
        Minimum protocol version 2 is required to turn it on.  Minimum protocol
-       version 4 is required for the "parallel" option.
+       version 4 is required for the <literal>parallel</literal> option.
+       The default is <literal>false</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3521,6 +3524,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to enable two-phase transactions.   Minimum protocol
        version 3 is required to turn it on.
+       The default is <literal>false</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3537,6 +3541,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
        to send the changes regardless of their origin.  This can be used
        to avoid loops (infinite replication of the same data) among
        replication nodes.
+       The default is <literal>any</literal>.
       </para>
      </listitem>
     </varlistentry>
-- 
2.49.0

v1-0002-pgoutput-Initialize-missing-default-for-origin-pa.patchtext/plain; charset=UTF-8; name=v1-0002-pgoutput-Initialize-missing-default-for-origin-pa.patchDownload
From 762396c608892a90a6943f9f8bfa017d289075a9 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 16 May 2025 23:53:08 +0900
Subject: [PATCH v1 2/2] pgoutput: Initialize missing default for "origin"
 parameter.

The pgoutput plugin initializes optional parameters like "binary" with
default values at the start of processing. However, the "origin"
parameter was previously missed and left without explicit initialization.

Although the PGOutputData struct, which holds these settings,
is zero-initialized at allocation (resulting in publish_no_origin field
for "origin" parameter being false by default), this default was not
set explicitly, unlike other parameters.

This commit adds explicit initialization of the "origin" parameter to
ensure consistency and clarity in how defaults are handled.
---
 src/backend/replication/pgoutput/pgoutput.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 693a766e6d7..d7099c060d9 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -297,10 +297,12 @@ parse_output_parameters(List *options, PGOutputData *data)
 	bool		two_phase_option_given = false;
 	bool		origin_option_given = false;
 
+	/* Initialize optional parameters to defaults */
 	data->binary = false;
 	data->streaming = LOGICALREP_STREAM_OFF;
 	data->messages = false;
 	data->two_phase = false;
+	data->publish_no_origin = false;
 
 	foreach(lc, options)
 	{
-- 
2.49.0

#2Euler Taveira
euler@eulerto.com
In reply to: Fujii Masao (#1)
Re: Document default values for pgoutput options + fix missing initialization for "origin"

On Fri, May 16, 2025, at 12:06 PM, Fujii Masao wrote:

The pgoutput plugin options are documented in the logical streaming
replication protocol, but their default values are not mentioned.
This can be inconvenient for users - for example, when using pg_recvlogical
with pgoutput plugin and needing to know the default behavior of each option.
https://www.postgresql.org/docs/devel/protocol-logical-replication.html

I'd like to propose adding the default values to the documentation to
improve clarity and usability. Patch attached (0001 patch).

Good catch.

Should we use "on" and "off" as other enum GUCs (wal_compression,
recovery_prefetch, compute_query_id)? The current convention is to support
other ways (true / false / 1 / 0) to write boolean but only document one way
(on / off).

Since you are changing this page, I would like to suggest removing "Boolean"
from streaming option. It is not a boolean anymore since protocol version 4.
The suggested description is:

+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.

While working on this, I also noticed that although most optional parameters
(like "binary") are explicitly initialized in parse_output_parameters(),
the "origin" parameter is not. Its value (PGOutputData->publish_no_origin)
is implicitly set to false due to zero-initialization, but unlike other
parameters, it lacks an explicit default assignment.

To ensure consistency and make the behavior clearer, I propose explicitly
initializing the "origin" parameter as well. A patch for this is also attached
(0002 patch).

LGTM. It seems an oversight from the original commit 366283961ac0.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#2)
Re: Document default values for pgoutput options + fix missing initialization for "origin"

On Tue, May 20, 2025 at 8:11 AM Euler Taveira <euler@eulerto.com> wrote:

On Fri, May 16, 2025, at 12:06 PM, Fujii Masao wrote:

The pgoutput plugin options are documented in the logical streaming
replication protocol, but their default values are not mentioned.
This can be inconvenient for users - for example, when using pg_recvlogical
with pgoutput plugin and needing to know the default behavior of each option.
https://www.postgresql.org/docs/devel/protocol-logical-replication.html

I'd like to propose adding the default values to the documentation to
improve clarity and usability. Patch attached (0001 patch).

Good catch.

Should we use "on" and "off" as other enum GUCs (wal_compression,
recovery_prefetch, compute_query_id)? The current convention is to support
other ways (true / false / 1 / 0) to write boolean but only document one way
(on / off).

Since you are changing this page, I would like to suggest removing "Boolean"
from streaming option. It is not a boolean anymore since protocol version 4.
The suggested description is:

+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.

One point to note about this is that we change the default value for
the streaming option to parallel for a subscription in the commit
1bf1140be8. But pgoutput still considers the default value to be off.
I thought about this, but not sure if there is any clear value in
changing the default of pgoutput. Would you have any thoughts on the
same?

--
With Regards,
Amit Kapila.

#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Euler Taveira (#2)
3 attachment(s)
Re: Document default values for pgoutput options + fix missing initialization for "origin"

On 2025/05/20 11:40, Euler Taveira wrote:

On Fri, May 16, 2025, at 12:06 PM, Fujii Masao wrote:

The pgoutput plugin options are documented in the logical streaming
replication protocol, but their default values are not mentioned.
This can be inconvenient for users - for example, when using pg_recvlogical
with pgoutput plugin and needing to know the default behavior of each option.
https://www.postgresql.org/docs/devel/protocol-logical-replication.html <https://www.postgresql.org/docs/devel/protocol-logical-replication.html&gt;

I'd like to propose adding the default values to the documentation to
improve clarity and usability. Patch attached (0001 patch).

Good catch.

Should we use "on" and "off" as other enum GUCs (wal_compression,
recovery_prefetch, compute_query_id)? The current convention is to support
other ways (true / false / 1 / 0) to write boolean but only document one way
(on / off).

Thanks for the review!

+1. I've updated the patch as you suggested. 0002 patch.

Since you are changing this page, I would like to suggest removing "Boolean"
from streaming option. It is not a boolean anymore since protocol version 4.
The suggested description is:

+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.

Sounds good to me. I created a separate patch for this change
so it can be back-patched independently. 0001 patch. I think
it should be back-patched to v16, where the streaming option
became non-boolean. Thought?

While working on this, I also noticed that although most optional parameters
(like "binary") are explicitly initialized in parse_output_parameters(),
the "origin" parameter is not. Its value (PGOutputData->publish_no_origin)
is implicitly set to false due to zero-initialization, but unlike other
parameters, it lacks an explicit default assignment.

To ensure consistency and make the behavior clearer, I propose explicitly
initializing the "origin" parameter as well. A patch for this is also attached
(0002 patch).

LGTM. It seems an oversight from the original commit 366283961ac0.

Yes.
While this issue doesn't cause any practical problems, I think
it's worth back-patching to v16 (where that commit was added)
for clarity. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v2-0001-doc-Fix-confusing-description-of-streaming-option.patchtext/plain; charset=UTF-8; name=v2-0001-doc-Fix-confusing-description-of-streaming-option.patchDownload
From fd989b2d36c69dd3de7c5982fa0e7d6963cf7e58 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 21 May 2025 15:46:08 +0900
Subject: [PATCH v2 1/3] doc: Fix confusing description of streaming option in
 START_REPLICATION.

Previously, the documentation described the streaming option as a boolean,
which is outdated since it's no longer a boolean as of protocol version 4.
This could confuse users.

This commit updates the description to remove the "boolean" reference and
clearly list the valid values for the streaming option.

Back-patch to v16, where the streaming option changed to a non-boolean.

Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/8d21fb98-5c25-4dee-8387-e5a62b01ea7d@app.fastmail.com
Backpatch-through: 16
---
 doc/src/sgml/protocol.sgml | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index c4d3853cbf2..3c423995aa1 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -3504,11 +3504,13 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
      </term>
      <listitem>
       <para>
-       Boolean option to enable streaming of in-progress transactions.
-       It accepts an additional value "parallel" to enable sending extra
-       information with some messages to be used for parallelisation.
-       Minimum protocol version 2 is required to turn it on.  Minimum protocol
-       version 4 is required for the "parallel" option.
+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.
       </para>
      </listitem>
     </varlistentry>
-- 
2.49.0

v2-0002-doc-Document-default-values-for-pgoutput-options-.patchtext/plain; charset=UTF-8; name=v2-0002-doc-Document-default-values-for-pgoutput-options-.patchDownload
From 9e7ba5e02165fad8579628259b1406bf78ce2dc3 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 21 May 2025 15:50:43 +0900
Subject: [PATCH v2 2/3] doc: Document default values for pgoutput options in
 protocol.sgml.

The pgoutput plugin options are described in the logical streaming
replication protocol documentation, but their default values were
previously not mentioned. This made it less convenient for users,
for example, when specifying those options to use pg_recvlogical
with pgoutput plugin.

This commit adds the explanations of the default values for pgoutput
options to improve clarity and usability.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
---
 doc/src/sgml/protocol.sgml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3c423995aa1..576c2d11b29 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -3482,6 +3482,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to use binary transfer mode.  Binary mode is faster
        than the text mode but slightly less robust.
+       The default is <literal>off</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3494,6 +3495,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to enable sending the messages that are written
        by <function>pg_logical_emit_message</function>.
+       The default is <literal>off</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3523,6 +3525,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
       <para>
        Boolean option to enable two-phase transactions.   Minimum protocol
        version 3 is required to turn it on.
+       The default is <literal>off</literal>.
       </para>
      </listitem>
     </varlistentry>
@@ -3539,6 +3542,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
        to send the changes regardless of their origin.  This can be used
        to avoid loops (infinite replication of the same data) among
        replication nodes.
+       The default is <literal>any</literal>.
       </para>
      </listitem>
     </varlistentry>
-- 
2.49.0

v2-0003-pgoutput-Initialize-missing-default-for-origin-pa.patchtext/plain; charset=UTF-8; name=v2-0003-pgoutput-Initialize-missing-default-for-origin-pa.patchDownload
From cbce49d968c6737d02e5c0d903250bc384848135 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 16 May 2025 23:53:08 +0900
Subject: [PATCH v2 3/3] pgoutput: Initialize missing default for "origin"
 parameter.

The pgoutput plugin initializes optional parameters like "binary" with
default values at the start of processing. However, the "origin"
parameter was previously missed and left without explicit initialization.

Although the PGOutputData struct, which holds these settings,
is zero-initialized at allocation (resulting in publish_no_origin field
for "origin" parameter being false by default), this default was not
set explicitly, unlike other parameters.

This commit adds explicit initialization of the "origin" parameter to
ensure consistency and clarity in how defaults are handled.

Back-patch to v16, where "origin" parameter was added into pgoutput.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/d2790f10-238d-4cb5-a743-d9d2a9dd900f@oss.nttdata.com
Backpatch-through: 16
---
 src/backend/replication/pgoutput/pgoutput.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 693a766e6d7..d7099c060d9 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -297,10 +297,12 @@ parse_output_parameters(List *options, PGOutputData *data)
 	bool		two_phase_option_given = false;
 	bool		origin_option_given = false;
 
+	/* Initialize optional parameters to defaults */
 	data->binary = false;
 	data->streaming = LOGICALREP_STREAM_OFF;
 	data->messages = false;
 	data->two_phase = false;
+	data->publish_no_origin = false;
 
 	foreach(lc, options)
 	{
-- 
2.49.0

#5Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Amit Kapila (#3)
Re: Document default values for pgoutput options + fix missing initialization for "origin"

On 2025/05/20 13:09, Amit Kapila wrote:

+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.

One point to note about this is that we change the default value for
the streaming option to parallel for a subscription in the commit
1bf1140be8. But pgoutput still considers the default value to be off.
I thought about this, but not sure if there is any clear value in
changing the default of pgoutput. Would you have any thoughts on the
same?

Using "off" as the pgoutput's default seems better to me,
since it works regardless of protocol version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#6Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Fujii Masao (#4)
Re: Document default values for pgoutput options + fix missing initialization for "origin"

On 2025/05/21 16:27, Fujii Masao wrote:

On 2025/05/20 11:40, Euler Taveira wrote:

On Fri, May 16, 2025, at 12:06 PM, Fujii Masao wrote:

The pgoutput plugin options are documented in the logical streaming
replication protocol, but their default values are not mentioned.
This can be inconvenient for users - for example, when using pg_recvlogical
with pgoutput plugin and needing to know the default behavior of each option.
https://www.postgresql.org/docs/devel/protocol-logical-replication.html <https://www.postgresql.org/docs/devel/protocol-logical-replication.html&gt;

I'd like to propose adding the default values to the documentation to
improve clarity and usability. Patch attached (0001 patch).

Good catch.

Should we use "on" and "off" as other enum GUCs (wal_compression,
recovery_prefetch, compute_query_id)? The current convention is to support
other ways (true / false / 1 / 0) to write boolean but only document one way
(on / off).

Thanks for the review!

+1. I've updated the patch as you suggested. 0002 patch.

Pushed. Thanks!

Since you are changing this page, I would like to suggest removing "Boolean"
from streaming option. It is not a boolean anymore since protocol version 4.
The suggested description is:

+       Option to enable streaming of in-progress transactions. Valid values are
+       <literal>off</literal> (the default), <literal>on</literal> and
+       <literal>parallel</literal>. The setting <literal>parallel</literal>
+       enables sending extra information with some messages to be used for
+       parallelization. Minimum protocol version 2 is required to turn it
+       <literal>on</literal>.  Minimum protocol version 4 is required for the
+       <literal>parallel</literal> value.

Sounds good to me. I created a separate patch for this change
so it can be back-patched independently. 0001 patch. I think
it should be back-patched to v16, where the streaming option
became non-boolean. Thought?

I've applied the patch to master and back-patched it to v16.

While working on this, I also noticed that although most optional parameters
(like "binary") are explicitly initialized in parse_output_parameters(),
the "origin" parameter is not. Its value (PGOutputData->publish_no_origin)
is implicitly set to false due to zero-initialization, but unlike other
parameters, it lacks an explicit default assignment.

To ensure consistency and make the behavior clearer, I propose explicitly
initializing the "origin" parameter as well. A patch for this is also attached
(0002 patch).

LGTM. It seems an oversight from the original commit 366283961ac0.

Yes.
While this issue doesn't cause any practical problems, I think
it's worth back-patching to v16 (where that commit was added)
for clarity. Thoughts?

Since this is an improvement rather than a bug fix, I only applied it to master.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation