Control flow in logical replication walsender

Started by Christophe Pettus over 1 year ago · 7 messages
#1Christophe Pettus
xof@thebuild.com

Hi,

I wanted to check my understanding of how control flows in a walsender doing logical replication. My understanding is that the (single) thread in each walsender process, in the simplest case, loops on:

1. Pull a record out of the WAL.
2. Pass it to the reorder buffer code, which,
3. Sorts it out into the appropriate in-memory structure for that transaction (spilling to disk as required), and then continues with #1, or,
4. If it's a commit record, it iteratively passes the transaction data one change at a time to,
5. The logical decoding plugin, which returns the output format of that plugin, and then,
6. The walsender sends the output from the plugin to the client. It cycles on passing the data to the plugin and sending it to the client until it runs out of changes in that transaction, and then resumes reading the WAL in #1.

In particular, I wanted to confirm that while it is pulling the reordered transaction and sending it to the plugin (and thence to the client), that particular walsender is *not* reading new WAL records or putting them in the reorder buffer.
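
To make the above concrete, here is a tiny, self-contained toy model of that loop (plain C; the names and structures are invented for illustration and are not the real walsender/reorderbuffer code). The point it demonstrates is that replaying a committed transaction to the "plugin" happens inline, before the next "WAL record" is read:

    #include <stdio.h>

    /* Toy "WAL record": which transaction it belongs to, whether it is the
     * commit record, and a payload describing the change. */
    typedef struct { int xid; int is_commit; const char *change; } WalRecord;

    #define MAX_CHANGES 8
    static const char *reorder_buf[2][MAX_CHANGES];  /* per-xid change lists */
    static int          n_changes[2];

    static void output_plugin(int xid, const char *change)
    {
        /* Stand-in for the logical decoding plugin + send-to-client step. */
        printf("xid %d: %s\n", xid, change);
    }

    int main(void)
    {
        /* Interleaved changes from two transactions; xid 1 commits first. */
        WalRecord wal[] = {
            {1, 0, "INSERT a"}, {2, 0, "INSERT b"}, {1, 0, "UPDATE a"},
            {1, 1, "COMMIT"},   {2, 0, "DELETE b"}, {2, 1, "COMMIT"},
        };

        for (int i = 0; i < (int) (sizeof(wal) / sizeof(wal[0])); i++)  /* step 1 */
        {
            WalRecord *r = &wal[i];

            if (!r->is_commit)
            {
                /* steps 2-3: file the change under its transaction */
                reorder_buf[r->xid - 1][n_changes[r->xid - 1]++] = r->change;
                continue;
            }

            /* steps 4-6: on commit, replay the whole transaction to the
             * "plugin" before reading any further "WAL"; no new records are
             * consumed while this inner loop runs. */
            for (int j = 0; j < n_changes[r->xid - 1]; j++)
                output_plugin(r->xid, reorder_buf[r->xid - 1][j]);
            n_changes[r->xid - 1] = 0;
        }
        return 0;
    }

Running it prints xid 1's changes only at its commit, and only then goes back to consuming the remaining records.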

The specific issue I'm trying to track down is an enormous pileup of spill files. This is in a non-supported version of PostgreSQL (v11), so an upgrade may fix it, but at the moment, I'm trying to find a cause and a mitigation.

#2Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Christophe Pettus (#1)
Re: Control flow in logical replication walsender

On Tue, Apr 30, 2024 at 11:28 PM Christophe Pettus <xof@thebuild.com> wrote:

Hi,

I wanted to check my understanding of how control flows in a walsender
doing logical replication. My understanding is that the (single) thread in
each walsender process, in the simplest case, loops on:

1. Pull a record out of the WAL.
2. Pass it to the reorder buffer code, which,
3. Sorts it out into the appropriate in-memory structure for that
transaction (spilling to disk as required), and then continues with #1, or,
4. If it's a commit record, it iteratively passes the transaction data one
change at a time to,
5. The logical decoding plugin, which returns the output format of that
plugin, and then,
6. The walsender sends the output from the plugin to the client. It cycles
on passing the data to the plugin and sending it to the client until it
runs out of changes in that transaction, and then resumes reading the WAL
in #1.

This is correct, barring some details that differ on master.

In particular, I wanted to confirm that while it is pulling the reordered
transaction and sending it to the plugin (and thence to the client), that
particular walsender is *not* reading new WAL records or putting them in
the reorder buffer.

This is correct.

The specific issue I'm trying to track down is an enormous pileup of spill
files. This is in a non-supported version of PostgreSQL (v11), so an
upgrade may fix it, but at the moment, I'm trying to find a cause and a
mitigation.

Is there a large transaction which is failing to be replicated repeatedly -
timeouts, crashes on upstream or downstream?

--
Best Wishes,
Ashutosh Bapat

#3Christophe Pettus
xof@thebuild.com
In reply to: Ashutosh Bapat (#2)
Re: Control flow in logical replication walsender

Thank you for the reply!

On May 1, 2024, at 02:18, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Is there a large transaction which is failing to be replicated repeatedly - timeouts, crashes on upstream or downstream?

AFAIK, no, although I am doing this somewhat by remote control (I don't have direct access to the failing system). This did bring up one other question, though:

Are subtransactions written to their own individual reorder buffers (and thus potentially spill files), or are they appended to the topmost transaction's reorder buffer?

#4Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Christophe Pettus (#3)
Re: Control flow in logical replication walsender

On Tue, May 7, 2024 at 12:00 AM Christophe Pettus <xof@thebuild.com> wrote:

Thank you for the reply!

On May 1, 2024, at 02:18, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

wrote:

Is there a large transaction which is failing to be replicated

repeatedly - timeouts, crashes on upstream or downstream?

AFAIK, no, although I am doing this somewhat by remote control (I don't
have direct access to the failing system). This did bring up one other
question, though:

Are subtransactions written to their own individual reorder buffers (and
thus potentially spill files), or are they appended to the topmost
transaction's reorder buffer?

IIRC, they have their own RB, but once they commit, they are transferred to
the topmost transaction's RB. So they can create spill files.

--
Best Wishes,
Ashutosh Bapat

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Christophe Pettus (#1)
Re: Control flow in logical replication walsender

On Tue, Apr 30, 2024 at 11:28 PM Christophe Pettus <xof@thebuild.com> wrote:

I wanted to check my understanding of how control flows in a walsender doing logical replication. My understanding is that the (single) thread in each walsender process, in the simplest case, loops on:

1. Pull a record out of the WAL.
2. Pass it to the reorder buffer code, which,
3. Sorts it out into the appropriate in-memory structure for that transaction (spilling to disk as required), and then continues with #1, or,
4. If it's a commit record, it iteratively passes the transaction data one change at a time to,
5. The logical decoding plugin, which returns the output format of that plugin, and then,
6. The walsender sends the output from the plugin to the client. It cycles on passing the data to the plugin and sending it to the client until it runs out of changes in that transaction, and then resumes reading the WAL in #1.

In particular, I wanted to confirm that while it is pulling the reordered transaction and sending it to the plugin (and thence to the client), that particular walsender is *not* reading new WAL records or putting them in the reorder buffer.

The specific issue I'm trying to track down is an enormous pileup of spill files. This is in a non-supported version of PostgreSQL (v11), so an upgrade may fix it, but at the moment, I'm trying to find a cause and a mitigation.

In PG-14, we added a feature in logical replication to stream long
in-progress transactions, which should reduce spilling to a good
extent. You might want to try that.
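
As a rough sketch of the idea (a toy model only, not the actual reorderbuffer.c code; the real knob is the logical_decoding_work_mem GUC plus the subscription's streaming option, and everything else below is invented for illustration): once the memory used by buffered changes crosses the limit, PG-14 can stream the largest in-progress transaction to the subscriber, where older versions have no choice but to serialize it to spill files:

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model of the eviction decision; sizes in bytes. */
    typedef struct { int xid; long size; } Txn;

    static void evict_largest(Txn *txns, int ntxns, bool streaming_allowed)
    {
        /* Find the transaction using the most memory. */
        Txn *largest = &txns[0];
        for (int i = 1; i < ntxns; i++)
            if (txns[i].size > largest->size)
                largest = &txns[i];

        if (streaming_allowed)
            printf("stream xid %d (%ld bytes) to the subscriber\n",
                   largest->xid, largest->size);   /* v14+ with streaming on */
        else
            printf("serialize xid %d (%ld bytes) to spill files\n",
                   largest->xid, largest->size);   /* pre-14 behaviour */
        largest->size = 0;                         /* memory released either way */
    }

    int main(void)
    {
        long work_mem = 64L * 1024 * 1024;  /* stand-in for logical_decoding_work_mem */
        Txn  txns[]   = {{10, 50L << 20}, {11, 30L << 20}, {12, 5L << 20}};
        long total    = 0;

        for (int i = 0; i < 3; i++)
            total += txns[i].size;

        if (total >= work_mem)
            evict_largest(txns, 3, /* streaming_allowed = */ true);
        return 0;
    }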

--
With Regards,
Amit Kapila.

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#4)
Re: Control flow in logical replication walsender

On Tue, May 7, 2024 at 9:51 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, May 7, 2024 at 12:00 AM Christophe Pettus <xof@thebuild.com> wrote:

Thank you for the reply!

On May 1, 2024, at 02:18, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Is there a large transaction which is failing to be replicated repeatedly - timeouts, crashes on upstream or downstream?

AFAIK, no, although I am doing this somewhat by remote control (I don't have direct access to the failing system). This did bring up one other question, though:

Are subtransactions written to their own individual reorder buffers (and thus potentially spill files), or are they appended to the topmost transaction's reorder buffer?

IIRC, they have their own RB,

Right.

but once they commit, they are transferred to topmost transaction's RB.

I don't think they are transferred to the topmost transaction's RB. We
perform a k-way merge between transactions/subtransactions to retrieve
the changes. See comments: "Support for efficiently iterating over a
transaction's and its subtransactions' changes..." in reorderbuffer.c
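
As a toy illustration of that idea (not the real code; reorderbuffer.c keeps the list heads in a binary heap, while this simply scans for the minimum): each transaction and subtransaction keeps its own LSN-ordered list of changes, and at commit time the iterator merges them into a single LSN-ordered stream:

    #include <stdio.h>

    /* Toy change: its WAL position (LSN) and a description. */
    typedef struct { long lsn; const char *desc; } Change;

    int main(void)
    {
        /* One LSN-ordered change list per transaction/subtransaction. */
        Change top[]  = {{100, "top: INSERT"},  {400, "top: UPDATE"}};
        Change sub1[] = {{200, "sub1: INSERT"}, {500, "sub1: DELETE"}};
        Change sub2[] = {{300, "sub2: INSERT"}};

        Change *lists[] = {top, sub1, sub2};
        int     lens[]  = {2, 2, 1};
        int     pos[]   = {0, 0, 0};

        /* k-way merge: repeatedly emit the change with the smallest LSN among
         * the heads of all lists (the real code keeps the heads in a binary
         * heap instead of scanning). */
        for (;;)
        {
            int best = -1;
            for (int i = 0; i < 3; i++)
                if (pos[i] < lens[i] &&
                    (best < 0 || lists[i][pos[i]].lsn < lists[best][pos[best]].lsn))
                    best = i;
            if (best < 0)
                break;                  /* all lists exhausted */
            printf("%ld %s\n", lists[best][pos[best]].lsn, lists[best][pos[best]].desc);
            pos[best]++;
        }
        return 0;
    }

Running it prints the changes from all three lists interleaved in LSN order (100, 200, 300, 400, 500).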

--
With Regards,
Amit Kapila.

#7Christophe Pettus
xof@thebuild.com
In reply to: Amit Kapila (#5)
Re: Control flow in logical replication walsender

On May 7, 2024, at 05:02, Amit Kapila <amit.kapila16@gmail.com> wrote:

In PG-14, we have added a feature in logical replication to stream
long in-progress transactions which should reduce spilling to a good
extent. You might want to try that.

That's been my principal recommendation (since that would also allow controlling the amount of logical replication working memory). Thank you!