[PATCH] Present all committed transactions to the output plugin
Hi,
attached is a patch that I think is cleaning up the API between Postgres
and the logical decoding plugin. Up until now, not only transactions
rolled back, but also some committed transactions were filtered and not
presented to the output plugin. While it is documented that aborted
transactions are not decoded, the second exception has not been documented.
The difference is with committed empty transactions that have a snapshot
versus those that do not. I think that's arbitrary and propose to
remove this distinction, so that all committed transactions are decoded.
In the case of decoding a two-phase transaction, I argue that this is
even more important, as the gid potentially carries information.
Please consider the attached patch, which drops the mentioned filter.
It also adjusts tests to show the difference and provides a minor
clarification to the documentation.
Regards
Markus
Attachments:
0001-Present-committed-transactions-to-output-plugin.patch (text/x-patch; charset=UTF-8; +147 -35)
On Fri, Feb 19, 2021 at 6:06 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
Hi,
attached is a patch that I think is cleaning up the API between Postgres
and the logical decoding plugin. Up until now, not only transactions
rolled back, but also some committed transactions were filtered and not
presented to the output plugin. While it is documented that aborted
transactions are not decoded, the second exception has not been documented.
The difference is with committed empty transactions that have a snapshot
versus those that do not. I think that's arbitrary and propose to
remove this distinction, so that all committed transactions are decoded.
What exactly is the use case to send empty transactions with or
without prepared? In the past, there was a complaint [1] that such
transactions increase the network traffic.
[1]: /messages/by-id/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
--
With Regards,
Amit Kapila.
On 20.02.21 12:15, Amit Kapila wrote:
What exactly is the use case to send empty transactions with or
without prepared?
I'm not saying that output plugins should *send* empty transactions to
the replica. I rather agree that this indeed is not wanted in most cases.
However, that's not what the patch changes. It just moves the decision
to the output plugin, giving it more flexibility. And possibly allowing
it to still take action. For example, in case of a distributed
two-phase commit scenario, where the publisher waits after its local
PREPARE for replicas to also PREPARE. If such a prepare doesn't even
get to the output plugin, that won't work. Not even thinking of a
PREPARE on one node followed by a COMMIT PREPARED from a different node.
It simply is not the business of the decoder to decide what to do with
empty transactions.
Plus, given the decoder does not manage to reliably filter all empty
transactions, an output plugin might want to implement its own
filtering, anyway (case in point: contrib/test_decoding and its
'skip_empty_xacts' option - that actually kind of implies it would be
possible to not skip them - as does the documentation). So I'm rather
wondering: what's the use case of filtering some, but not all empty
transactions (on the decoder side)?
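
The plugin-side filtering mentioned above can be sketched as follows. This is a minimal, self-contained simulation in Python, not the PostgreSQL callback API: the class `SkipEmptyPlugin`, the `Txn` type, and the `decode` driver are hypothetical names for illustration. The option name `skip_empty_xacts` is borrowed from contrib/test_decoding; the logic here only illustrates the general idea of deferring BEGIN until the first change arrives, under the assumption that the decoder hands every committed transaction to the plugin.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical stand-in for a decoded transaction."""
    xid: int
    changes: list = field(default_factory=list)


class SkipEmptyPlugin:
    """Hypothetical output plugin that decides itself whether to emit
    empty transactions (an assumption, not the real plugin interface)."""

    def __init__(self, skip_empty_xacts=True):
        self.skip_empty_xacts = skip_empty_xacts
        self.out = []
        self._wrote_changes = False

    def begin(self, txn):
        self._wrote_changes = False
        # When skipping empty transactions, BEGIN is deferred until
        # the first change shows up.
        if not self.skip_empty_xacts:
            self.out.append(f"BEGIN {txn.xid}")

    def change(self, txn, change):
        if self.skip_empty_xacts and not self._wrote_changes:
            self.out.append(f"BEGIN {txn.xid}")
        self._wrote_changes = True
        self.out.append(change)

    def commit(self, txn):
        # Nothing was emitted for this transaction, so drop the COMMIT too.
        if self.skip_empty_xacts and not self._wrote_changes:
            return
        self.out.append(f"COMMIT {txn.xid}")


def decode(plugin, txns):
    """Toy decoder that presents every committed transaction to the plugin."""
    for txn in txns:
        plugin.begin(txn)
        for c in txn.changes:
            plugin.change(txn, c)
        plugin.commit(txn)
    return plugin.out
```

With the option turned off, the same plugin would still see and emit the empty transaction, which is the flexibility argued for above.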
Regards
Markus
Hi,
On 2021-02-20 13:48:49 +0100, Markus Wanner wrote:
However, that's not what the patch changes. It just moves the decision to
the output plugin, giving it more flexibility. And possibly allowing it to
still take action.
It's not free though - there are plenty of workloads where there's an xid but
no other WAL records for transactions. Threading those through the
output plugin does increase the runtime cost. And because such
transactions will typically not incur a high cost on the primary
(e.g. in case of unlogged tables, there'll be a commit record, but often
the transaction will not wait for the commit record to be flushed to
disk), increasing the replication overhead isn't great.
For example, in case of a distributed two-phase commit
scenario, where the publisher waits after its local PREPARE for replicas to
also PREPARE.
Why is that ever interesting to do in the case of empty transactions?
Due to the cost of doing remote PREPAREs ISTM you'd always want to
implement the optimization of not doing so for empty transactions.
So I'm rather wondering: what's the use case of filtering some, but
not all empty transactions (on the decoder side)?
I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?
Greetings,
Andres Freund
On 20.02.21 21:08, Andres Freund wrote:
It's not free though
Agreed. It's an additional call to a callback. Do you think that's
acceptable if limited to two-phase transactions only?
I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?
Outlined in my previous mail: prepare the transaction on one node,
commit it on another one. The PREPARE of a transaction is an event a
user may well want to have replicated, without having to worry about
whether or not the transaction happens to be empty.
[ Imagine: ERROR: transaction cannot be replicated because it's empty.
HINT: add a dummy UPDATE so that Postgres always has
something to replicate, whatever else your app does
or does not do in the transaction. ]
Regards
Markus
Hi,
On 2021-02-20 21:44:30 +0100, Markus Wanner wrote:
On 20.02.21 21:08, Andres Freund wrote:
It's not free though
Agreed. It's an additional call to a callback.
If it were just a single indirection function call I'd not be
bothered. But we need to do a fair bit more than that
(cf. ReorderBufferProcessTXN()).
Do you think that's acceptable if limited to two-phase transactions
only?
Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant. Behaviourally
I'm still not convinced it's useful.
I'm wondering the opposite: What's a potential use case for handing
"trivially empty" transactions to the output plugin that's worth
incurring some cost for everyone?
Outlined in my previous mail: prepare the transaction on one node, commit it
on another one. The PREPARE of a transaction is an event a user may well
want to have replicated, without having to worry about whether or not the
transaction happens to be empty.
I read the previous mails in this thread, and I don't really see an
explanation for why this is something actually useful. When is a
transaction without actual contents interesting to replicate? I don't
find the "gid potentially carries information" particularly convincing.
[ Imagine: ERROR: transaction cannot be replicated because it's empty.
HINT: add a dummy UPDATE so that Postgres always has
something to replicate, whatever else your app does
or does not do in the transaction. ]
Meh.
Greetings,
Andres Freund
On 21.02.21 03:04, Andres Freund wrote:
Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant.
Good. I attached an updated patch eliminating only the filtering for
empty two-phase transactions.
Behaviourally I'm still not convinced it's useful.
I don't have any further argument than: If you're promising to replicate
two phases, I expect the first phase to be replicated individually.
A database state with a transaction prepared and identified by
'woohoo-roll-me-back-if-you-can' is not the same as a state without it.
Even if the transaction is empty, or if you're actually going to roll
it back. And therefore possibly ending up at the very same state without
any useful effect.
Regards
Markus
Attachments:
0001-Present-empty-prepares-to-the-output-plugin.patch (text/x-patch; charset=UTF-8; +29 -17)
On 2/21/21 11:05 AM, Markus Wanner wrote:
On 21.02.21 03:04, Andres Freund wrote:
Cost-wise, yes - a 2pc prepare/commit is expensive enough that
comparatively the replay cost is unlikely to be relevant.
Good. I attached an updated patch eliminating only the filtering for
empty two-phase transactions.
Behaviourally I'm still not convinced it's useful.
I don't have any further argument than: If you're promising to replicate
two phases, I expect the first phase to be replicated individually.
A database state with a transaction prepared and identified by
'woohoo-roll-me-back-if-you-can' is not the same as a state without it.
Even if the transaction is empty, or if you're actually going to roll
it back. And therefore possibly ending up at the very same state without
any useful effect.
IMHO it's quite weird to handle the 2PC and non-2PC cases differently.
If the argument is that this is expensive, it'd be good to quantify
that, somehow. If there's a workload with a significant fraction of such
empty transactions, does that mean +1% CPU usage, +10% or more?
Why not make this configurable, i.e. the output plugin might indicate
whether it's interested in empty transactions or not. If not, we can do
what we do now. Otherwise the empty transactions would be passed to the
output plugin.
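
The configurable variant suggested above might look roughly like the following sketch. It is a self-contained Python simulation, not PostgreSQL code: the attribute `wants_empty_xacts`, the classes, and the `replay` function are hypothetical names for illustration, and no such option is confirmed to exist. The decoder keeps filtering empty transactions by default and only passes them on when the plugin opts in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    """Hypothetical stand-in for a decoded transaction."""
    xid: int
    changes: list = field(default_factory=list)
    gid: Optional[str] = None  # set for two-phase transactions


class Plugin:
    # Hypothetical opt-in flag; the default keeps the current behaviour.
    wants_empty_xacts = False

    def __init__(self):
        self.seen = []

    def on_txn(self, txn):
        self.seen.append((txn.xid, txn.gid))


class CoordinatorPlugin(Plugin):
    # A plugin that cares about empty prepares, e.g. because of the gid.
    wants_empty_xacts = True


def replay(plugin, txns):
    """Toy decoder: filter empty transactions unless the plugin opted in."""
    for txn in txns:
        if not txn.changes and not plugin.wants_empty_xacts:
            continue  # decoder-side filtering, as done today
        plugin.on_txn(txn)
    return plugin.seen
```

Plugins that do not opt in would pay no extra callback cost for empty transactions, while a plugin such as the hypothetical `CoordinatorPlugin` would still see an empty prepare and its gid.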
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company