POC: enable logical decoding when wal_level = 'replica' without a server restart

Started by Masahiko Sawada, over 1 year ago, 373 messages, hackers

sawada.mshk@gmail.com

over 1 year ago

Hi all,

Logical decoding (and logical replication) are available only when
wal_level = logical. As the documentation says[1], using the 'logical'
level increases the WAL volume which could negatively affect the
performance. For that reason, users might want to start with using
'replica', but when they want to use logical decoding they need a
server restart to increase wal_level to 'logical'. My goal is to allow
users who are using 'replica' level to use logical decoding without a
server restart. There are other GUC parameters related to logical
decoding and logical replication such as max_wal_senders,
max_logical_replication_workers, and max_replication_slots, but even
if users set these parameters >0, there would not be a noticeable
performance impact. And their default values are already >0. So I'd
like to focus on making only wal_level a dynamic GUC parameter.
There are several earlier discussions[2][3] but no one has submitted
patches unless I'm missing something.

The first idea I came up with is to make the wal_level a PGC_SIGHUP
parameter. However, it affects not only setting 'replica' to 'logical'
but also setting 'minimal' to 'replica' or higher. I'm not sure the
latter case is common and it might require a checkpoint. I don't want
to make the patch complex for uncommon cases.

The second idea is to somehow allow both WAL-logging logical info and
logical decoding even when wal_level is 'replica'. I've attached a PoC
patch for that. The patch introduces new SQL functions such as
pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
These functions are available only when wal_level is 'replica' (or
higher). In pg_activate_logical_decoding(), we set the status of
logical decoding stored in shared memory from 'disabled' to
'xlog-logical-info', allowing all processes to write logical
information to WAL records for logical decoding. But logical
decoding itself is still not allowed at this point. Once we confirm
that all in-progress transactions have completed, we switch the status
to 'logical-decoding-ready', meaning that users can create logical
replication slots and use logical decoding.
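
The transitions described above can be sketched as a small state machine.
This is only a toy Python model, not the patch's C code: the class and
method names are hypothetical, and it assumes that only transactions
already in progress at activation time need to be waited for.

```python
from enum import Enum


class LogicalDecodingStatus(Enum):
    # Status names taken from the description above.
    DISABLED = "disabled"
    XLOG_LOGICAL_INFO = "xlog-logical-info"
    LOGICAL_DECODING_READY = "logical-decoding-ready"


class LogicalDecodingControl:
    """Toy stand-in for the status kept in shared memory (hypothetical)."""

    def __init__(self, wal_level, in_progress_xids=()):
        self.wal_level = wal_level
        self.status = LogicalDecodingStatus.DISABLED
        self.in_progress = set(in_progress_xids)
        self._waiting_for = set()

    def activate(self):
        # Models pg_activate_logical_decoding(): needs 'replica' or higher.
        if self.wal_level not in ("replica", "logical"):
            raise RuntimeError("wal_level must be 'replica' or higher")
        # Step 1: every process starts writing logical info to WAL.
        self.status = LogicalDecodingStatus.XLOG_LOGICAL_INFO
        # Step 2: remember the transactions that began before the switch
        # (assumption: later transactions already log logical info).
        self._waiting_for = set(self.in_progress)
        self._maybe_ready()

    def begin_transaction(self, xid):
        self.in_progress.add(xid)

    def finish_transaction(self, xid):
        self.in_progress.discard(xid)
        self._waiting_for.discard(xid)
        self._maybe_ready()

    def _maybe_ready(self):
        # Step 3: once the older transactions are gone, decoding is allowed.
        if (self.status is LogicalDecodingStatus.XLOG_LOGICAL_INFO
                and not self._waiting_for):
            self.status = LogicalDecodingStatus.LOGICAL_DECODING_READY

    def can_create_logical_slot(self):
        return (self.wal_level == "logical"
                or self.status is LogicalDecodingStatus.LOGICAL_DECODING_READY)
```

For example, with two transactions open at activation time, the status
stays at 'xlog-logical-info' until both have finished, no matter how many
new transactions start in the meantime.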

Overall, with the patch, there are two ways to enable logical
decoding: setting wal_level to 'logical' and calling
pg_activate_logical_decoding() when wal_level is 'replica'. I left the
'logical' level for backward compatibility and for users who want to
enable the logical decoding without calling that SQL function. If we
can automatically enable the logical decoding when creating the first
logical replication slot, probably we no longer need the 'logical'
level. There is room to discuss the user interface. Feedback is very
welcome.

Regards,

[1]: https://www.postgresql.org/docs/devel/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
[2]: /messages/by-id/CAKU4AWrv6zuywe1VBv6kwFmtaxyi5XYqpBkAG_B46cp4s4KoSw@mail.gmail.com
[3]: /messages/by-id/20200608213215.mgk3cctlzvfuaqm6@alap3.anarazel.de

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 1 year ago

In reply to: Masahiko Sawada (#1)

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

Hi,

On Mon, Dec 30, 2024 at 10:44:38PM -0600, Masahiko Sawada wrote:

> Hi all,
>
> Logical decoding (and logical replication) are available only when
> wal_level = logical. As the documentation says[1], using the 'logical'
> level increases the WAL volume which could negatively affect the
> performance. For that reason, users might want to start with using
> 'replica', but when they want to use logical decoding they need a
> server restart to increase wal_level to 'logical'. My goal is to allow
> users who are using 'replica' level to use logical decoding without a
> server restart.

Thanks for starting that thread and +1 for the idea!

> The first idea I came up with is to make the wal_level a PGC_SIGHUP
> parameter. However, it affects not only setting 'replica' to 'logical'
> but also setting 'minimal' to 'replica' or higher. I'm not sure the
> latter case is common and it might require a checkpoint. I don't want
> to make the patch complex for uncommon cases.
>
> The second idea is to somehow allow both WAL-logging logical info and
> logical decoding even when wal_level is 'replica'. I've attached a PoC
> patch for that. The patch introduces new SQL functions such as
> pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
> These functions are available only when wal_level is 'replica' (or
> higher). In pg_activate_logical_decoding(), we set the status of
> logical decoding stored in shared memory from 'disabled' to
> 'xlog-logical-info', allowing all processes to write logical
> information to WAL records for logical decoding. But logical
> decoding itself is still not allowed at this point. Once we confirm
> that all in-progress transactions have completed, we switch the status
> to 'logical-decoding-ready', meaning that users can create logical
> replication slots and use logical decoding.
>
> Overall, with the patch, there are two ways to enable logical
> decoding: setting wal_level to 'logical' and calling
> pg_activate_logical_decoding() when wal_level is 'replica'. I left the
> 'logical' level for backward compatibility and for users who want to
> enable the logical decoding without calling that SQL function. If we
> can automatically enable the logical decoding when creating the first
> logical replication slot, probably we no longer need the 'logical'
> level. There is room to discuss the user interface. Feedback is very
> welcome.

If we don't want to force wal_level = logical to enable logical decoding (your
second idea) then I think that it would be better to "hide" everything behind
logical replication slot creation / deletion. That would mean that having at
least one logical replication slot created would be synonymous with "activate
logical decoding" and having zero logical replication slots would be synonymous
with "deactivate logical decoding".

That way:

1. an end user doesn't need to manipulate new functions and can just rely on
the existence of replication slots
2. we ensure that no extra WAL data is generated if not absolutely "needed"
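
A minimal sketch of that rule, as a toy Python model with hypothetical
names: whether logical information is WAL-logged is derived purely from
the set of existing logical slots. It deliberately leaves out the wait for
in-progress transactions that slot creation would still need.

```python
class SlotDrivenLogicalDecoding:
    """Toy model: logical decoding is active iff a logical slot exists."""

    def __init__(self):
        self.logical_slots = set()

    @property
    def logical_info_enabled(self):
        # No separate activate/deactivate call: the state follows the slots.
        return bool(self.logical_slots)

    def create_logical_slot(self, name):
        if name in self.logical_slots:
            raise ValueError(f'replication slot "{name}" already exists')
        # Creating the first slot implicitly activates logical decoding.
        self.logical_slots.add(name)

    def drop_logical_slot(self, name):
        # Dropping the last slot implicitly deactivates it again.
        # Raises KeyError if the slot does not exist.
        self.logical_slots.remove(name)
```

With this shape, extra WAL is produced only between the creation of the
first logical slot and the removal of the last one.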

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Euler Taveira

euler@eulerto.com

over 1 year ago

In reply to: Bertrand Drouvot (#2)

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

On Fri, Jan 3, 2025, at 10:14 AM, Bertrand Drouvot wrote:

> If we don't want to force wal_level = logical to enable logical decoding (your
> second idea) then I think that it would be better to "hide" everything behind
> logical replication slot creation / deletion. That would mean that having at
> least one logical replication slot created would be synonymous with "activate
> logical decoding" and having zero logical replication slots would be synonymous
> with "deactivate logical decoding".

I like this idea. The existence of a logical replication slot already provides
the required protections and guarantees for logical decoding (no transactions
from the past still running when the slot is created).

ERROR: logical decoding requires "wal_level" >= "logical"
STATEMENT: select pg_create_logical_replication_slot('test', 'pgoutput');

FATAL: logical replication slot "test" exists, but "wal_level" < "logical"
HINT: Change "wal_level" to be "logical" or higher.

Having said that, you are basically folding 'logical' machinery into 'replica'.
The 'logical' option can still exist but it will be less attractive because it
increases the WAL volume even if you are not using logical replication. I don't
know if the current 'logical' behavior (WAL records for logical decoding even
if there is no active logical replication) has any use case (maybe someone
inspects these extra records for analysis) but one suggestion (separate patch)
is to make 'logical' synonymous with the new 'replica' behavior (logical
decoding capability). This opens the door to remove 'logical' in future
releases (accepted as synonym but not documented).
