Object identifier types in logical replication binary mode

Started by Emre Hasegeliabout 1 year ago4 messages
#1Emre Hasegeli
emre@hasegeli.com

I encountered a problem with logical replication.

The object identifier types, regclass, regproc, regtype, etc. are
transferred as an oid in the binary mode. However, the subscriber
expects them as text which makes sense because it cannot do anything
with the oids. I am getting "invalid byte sequence for encoding
"UTF8": 0x00" when I try this.

I think the publisher should not transfer these in binary just like
the data types from extensions. Any opinions?

#2Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Emre Hasegeli (#1)
1 attachment(s)
Re: Object identifier types in logical replication binary mode

Hi,

On Thu, 14 Nov 2024 at 21:22, Emre Hasegeli <emre@hasegeli.com> wrote:

I encountered a problem with logical replication.

The object identifier types, regclass, regproc, regtype, etc. are
transferred as an oid in the binary mode. However, the subscriber
expects them as text which makes sense because it cannot do anything
with the oids. I am getting "invalid byte sequence for encoding
"UTF8": 0x00" when I try this.

I think the publisher should not transfer these in binary just like
the data types from extensions. Any opinions?

I tried to test the scenario described. Steps I followed:
1. Create a logical replication setup on table t1 with some initial data.
2. Columns in table t1 are of types regclass, regproc and regtype
3. Now, I inserted some records in the table t1 and also did some
updates on the table t1.

I tested with both 'binary = true' and 'binary = false' option while
creating a subscription. For me replication is working fine and I am
not getting any errors in both the cases.
I have also attached the test script.

Am I trying the correct steps? Can you share a reproducible test case?
Also, which version are you facing this issue?
Have you set any GUC parameters/ Encoding/ Collation?

Thanks and Regards,
Shlok Kyal

Attachments:

test.shtext/x-sh; charset=US-ASCII; name=test.shDownload
#3Emre Hasegeli
emre@hasegeli.com
In reply to: Shlok Kyal (#2)
1 attachment(s)
Re: Object identifier types in logical replication binary mode

I tested with both 'binary = true' and 'binary = false' option while
creating a subscription. For me replication is working fine and I am
not getting any errors in both the cases.
I have also attached the test script.

I modified your test script to demonstrate the problem.

I created another table before the tested one to make sure oids are
different between the nodes, and used the user-created table in the
test.

I also had to change the subscriber data type to text to get "invalid
byte sequence" error. Still the same script works without the error
with (binary = false).

My analysis on the original post was wrong. The subscriber handles it
as an oid. Though, it's still not clear to me this is a desirable
behaviour for the users, because oid's of the objects differ.

Attachments:

test.shtext/x-sh; charset=US-ASCII; name=test.shDownload
#4Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Emre Hasegeli (#3)
1 attachment(s)
RE: Object identifier types in logical replication binary mode

Dear Emre,

I modified your test script to demonstrate the problem.

I could reproduce the error with your script, thanks. I also could reproduce the
same error with the simplified version, see attached.

I feel this is a normal behavior which can happen without the logical replication.
I could reproduce only by a single instance [1]``` postgres=# CREATE TABLE t1 (a regclass); -- creates t1 with regclass datatype CREATE TABLE postgres=# INSERT INTO t1 VALUES ('t1'); -- insert a tuple INSERT 0 1 postgres=# COPY t1 TO '/path/to/binary.dat' WITH ( FORMAT binary ); -- copy the tuple to somewhere COPY 1 postgres=# CREATE TABLE t2 (a text); -- creates t1 with TEXT datatype CREATE TABLE postgres=# COPY t2 FROM '/path/to/binary.dat' WITH ( FORMAT binary ); -- do copy from ERROR: invalid byte sequence for encoding "UTF8": 0x00 CONTEXT: COPY t2, line 1, column a ```.

My analysis on the original post was wrong. The subscriber handles it
as an oid.

IIUC, this is what happened here. Maybe you have the same picture...

1. Publisher exported a data with the binary format. The data (regclass) was
represented as the oid.
2. Subscriber received the data. Since the destination table had text datatype,
it tried to understand as the text format.
3. Unfortunately, the original data had a byte string like "0x00". This could not
be used for the normal text so that an error happened.

The documentation has already pointed out that binary format has lesser portability
than textual one [2]https://www.postgresql.org/docs/devel/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-BINARY, and you seem to encounter the issue.

Though, it's still not clear to me this is a desirable
behaviour for the users, because oid's of the objects differ.

But it is not also clear that it is OK to transform the data to the string - it
is quite depends on the use-case. Personally, OID is meaningful only on the
instance so such tables should not be replicated to others. Can you exclude such
tables or attributes by the CREATE PUBLICATION?

[1]: ``` postgres=# CREATE TABLE t1 (a regclass); -- creates t1 with regclass datatype CREATE TABLE postgres=# INSERT INTO t1 VALUES ('t1'); -- insert a tuple INSERT 0 1 postgres=# COPY t1 TO '/path/to/binary.dat' WITH ( FORMAT binary ); -- copy the tuple to somewhere COPY 1 postgres=# CREATE TABLE t2 (a text); -- creates t1 with TEXT datatype CREATE TABLE postgres=# COPY t2 FROM '/path/to/binary.dat' WITH ( FORMAT binary ); -- do copy from ERROR: invalid byte sequence for encoding "UTF8": 0x00 CONTEXT: COPY t2, line 1, column a ```
```
postgres=# CREATE TABLE t1 (a regclass); -- creates t1 with regclass datatype
CREATE TABLE
postgres=# INSERT INTO t1 VALUES ('t1'); -- insert a tuple
INSERT 0 1
postgres=# COPY t1 TO '/path/to/binary.dat' WITH ( FORMAT binary ); -- copy the tuple to somewhere
COPY 1
postgres=# CREATE TABLE t2 (a text); -- creates t1 with TEXT datatype
CREATE TABLE
postgres=# COPY t2 FROM '/path/to/binary.dat' WITH ( FORMAT binary ); -- do copy from
ERROR: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY t2, line 1, column a
```
[2]: https://www.postgresql.org/docs/devel/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-BINARY

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

test_1225.shapplication/octet-stream; name=test_1225.shDownload