PgOutput Replication Message Format - Differentiate between explicit NULL and Omitted Columns during Insert

Started by Pranav Hegdeabout 3 years ago2 messagesgeneral
Jump to latest
#1Pranav Hegde
pranavh4@gmail.com

Currently in the PgOutput messages for Insert statements, there is no way
to differentiate if the column was explicitly set to null in the insert
statement, or whether that column was omitted in the insert statement and
thus set to null (or default value)

For example, consider the below table:

*CREATE TABLE test_table ( id int primary key, text_col_1 text
default 'default_text_1', text_col_2 text default 'default_text_2');*

Now if we insert a row using:

*INSERT INTO test_table (id, text_col_1) VALUES (1, null);*
In the above query I am explicitly setting *text_col_1* to null, and I am
omitting *text_col_2* which will be populated with the default value.
The PgOutput replication message classifies both *text_col_1* and
*text_col_2* values as NULL. (Refer Logical Replication Message Format
<https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html&gt;
)
Thus we are not able to identify which column was explicitly set to null
and which column was set to the default value.
This causes issues when we are parsing the message, especially in libraries
such as Debezium, where it returns the default value for the column, even
though in the actual SQL query we set that column to be explicitly null.
Thus if there was a way to differentiate between the two cases in the
replication message itself, then the appropriate action can be taken in
downstream pipelines (set to null or set to default value)

Was wondering if there is a way to overcome this issue in the current
pgoutput plugin or the code needs to be changed to account for this case.
Any help would be appreciated

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pranav Hegde (#1)
Re: PgOutput Replication Message Format - Differentiate between explicit NULL and Omitted Columns during Insert

Pranav Hegde <pranavh4@gmail.com> writes:

Currently in the PgOutput messages for Insert statements, there is no way
to differentiate if the column was explicitly set to null in the insert
statement, or whether that column was omitted in the insert statement and
thus set to null (or default value)

Nope. Replication works at the data level, it has no idea how that
data was formed originally.

Thus we are not able to identify which column was explicitly set to null
and which column was set to the default value.
This causes issues when we are parsing the message, especially in libraries
such as Debezium, where it returns the default value for the column, even
though in the actual SQL query we set that column to be explicitly null.

If you have code that changes the data supplied, that code is wrong.
We are not likely to change the replication protocol to support it.

regards, tom lane