Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
Hello,
I have a postgres database (major version 13) and am doing CDC by using a
replication slot with pgoutput. In our data lake, we see that there are
multiple updates (3 in this case) happening to the same row as part of the
same transaction. This doesn't make sense if we look at them as separate
updates. The expectation was that the row was updated once and a single CDC
record was pushed out. I haven't completely ruled out whether multiple
updates happened in that transaction but I wanted to ask the community if
it is possible that a single update statement could spill over as multiple
CDC update records by pgoutput / postgres ?
If yes, could you possibly point to the testcases or code where this might
happen? Any blogs or suggestions are welcome. LMK if you need further
information
P.S: Each update seen in the data lake changes at least one field in the
row. The entire row with all the columns are pushed out for the 3 updates
seen in the data lake. From the application perspective, it makes sense if
the 3 updates were merged into a single update record
Regards,
Akashkiran
Can someone take a look at this ? Any help would be appreciated
Regards,
Akashkiran
On Tue, Jul 29, 2025 at 3:37 PM Akashkiran Shivakumar <
akashkiran.foss@gmail.com> wrote:
Show quoted text
Hello,
I have a postgres database (major version 13) and am doing CDC by using a
replication slot with pgoutput. In our data lake, we see that there are
multiple updates (3 in this case) happening to the same row as part of the
same transaction. This doesn't make sense if we look at them as separate
updates. The expectation was that the row was updated once and a single CDC
record was pushed out. I haven't completely ruled out whether multiple
updates happened in that transaction but I wanted to ask the community if
it is possible that a single update statement could spill over as multiple
CDC update records by pgoutput / postgres ?If yes, could you possibly point to the testcases or code where this might
happen? Any blogs or suggestions are welcome. LMK if you need further
informationP.S: Each update seen in the data lake changes at least one field in the
row. The entire row with all the columns are pushed out for the 3 updates
seen in the data lake. From the application perspective, it makes sense if
the 3 updates were merged into a single update recordRegards,
Akashkiran
possible that a single update statement could spill over as multiple CDC
update records by pgoutput / postgres ?
Basically, no. If I understand your questions correctly, the answer seems
to point to your application making multiple updates, which will lead to
multiple UPDATE records in the WAL, and thus multiple UPDATEs via pgoutput.
But the basic mapping will still be one heap update = one WAL entry = one
pgoutput update. (It would be good to get familiar with the pg_waldump
program and examine the WAL yourself to get an idea of the exact changes
that are generated for various situations.) While there are cases in which
a single SQL update can generate multiple WAL entries (e.g. indexes,
TOAST), each individual update is always going to generate at least one WAL
record. Due to the way that Postgres works, each update gets put into WAL
immediately - Postgres cannot merge updates that happen to occur in the
same transaction. Nor does pgoutput combine those. So even though at the
end of the transaction a single row gets changed at the logical level,
there can be quite a number of changes at the WAL level. If you want to
smush together pgoutput's updates into a single call, you are welcome to do
so, of course - as long as it's in the same transaction. But I'd not do
that, as you are now messing with the order of events as they happened on
your primary. In other words, there may have been a reason it was updated
col by col on the primary, and if that bothers you, the primary is the
place to fix it, not stitching it together after the fact.
Cheers,
Greg