PostgreSQL logical decoder output plugin - unchanged toast data

Started by Georgy Buranovover 7 years ago9 messageshackers
Jump to latest
#1Georgy Buranov
gburanov@gmail.com

Hi. I have a question about PostgreSQL logical decoder output plugin.

I am not specialist in Postgres at all, so maybe I miss some very
basic point. In the plugin, I want to always get all the values (even
those that are unchanged toast data)

When I try to get the datum that is internal on disk (here is code)

```
struct varlena *s = (struct varlena *)DatumGetPointer(origval);
struct varlena * ret = heap_tuple_untoast_attr(s);
Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
```
it fails with no known snapshots error (from heap_tuple_untoast_attr).
My question is why is it that.

So, even theoretically it is not possible to the the varlena on disk
from logical replication plugin?

Mit freundlichen Grüßen,
Georgy Buranov

#2Andres Freund
andres@anarazel.de
In reply to: Georgy Buranov (#1)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Hi,

Hi,

On 2018-08-31 15:36:26 +0200, Georgy Buranov wrote:

I am not specialist in Postgres at all, so maybe I miss some very
basic point. In the plugin, I want to always get all the values (even
those that are unchanged toast data)

When I try to get the datum that is internal on disk (here is code)

```
struct varlena *s = (struct varlena *)DatumGetPointer(origval);
struct varlena * ret = heap_tuple_untoast_attr(s);
Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
```
it fails with no known snapshots error (from heap_tuple_untoast_attr).
My question is why is it that.

Yes, that's not possible in general. On-disk toasted data for tuples
from the WAL are not guaranteed in any way to be retained. If that
weren't the case database tables would bloat while logical replication
is behind, and the sequential reads (i.e. fast) reads of logical
decoding would turn into random IO.

You can however alter the replication identity of tables to FULL. Then
the "old" tuple in change callbacks will have the full old tuple. But
that will increase the size of the WAL stream obviously.

So, even theoretically it is not possible to the the varlena on disk
from logical replication plugin?

Correct.

Greetings,

Andres Freund

#3Georgy Buranov
gburanov@gmail.com
In reply to: Andres Freund (#2)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Ok, thank you very much for your explanation,

maybe I need something else in my case.

As far as I understand, "On-disk toasted data for tuples from the WAL
are not guaranteed in any way to be retain", but still, the LATEST
value for the same cell should exist in postgres (in on-disk toast if
it is huge, or not).

If I cannot get access to the on-disk toasted data for tuple from the
WAL, can I have the access to the _latest_ value in this case
(hopefully I describe it correct)

Yes, that's not possible in general. On-disk toasted data for tuples
from the WAL are not guaranteed in any way to be retained. If that
weren't the case database tables would bloat while logical replication
is behind, and the sequential reads (i.e. fast) reads of logical
decoding would turn into random IO.

Mit freundlichen Grüßen,
Georgy Buranov

#4Andres Freund
andres@anarazel.de
In reply to: Georgy Buranov (#3)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Hi,

On 2018-08-31 16:55:37 +0200, Georgy Buranov wrote:

Ok, thank you very much for your explanation,

maybe I need something else in my case.

As far as I understand, "On-disk toasted data for tuples from the WAL
are not guaranteed in any way to be retain", but still, the LATEST
value for the same cell should exist in postgres (in on-disk toast if
it is huge, or not).

No, there's absolutely no such guarantee. The tuple could since have
been deleted, the column could have been updated to non-toasted, the
table could have been dropped, ...

If I cannot get access to the on-disk toasted data for tuple from the
WAL, can I have the access to the _latest_ value in this case
(hopefully I describe it correct)

Again, you can set REPLICA IDENTITY to FULL and it'll be there.

Greetings,

Andres Freund

#5Georgy Buranov
gburanov@gmail.com
In reply to: Andres Freund (#4)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Again, you can set REPLICA IDENTITY to FULL and it'll be there.

So, why I think this is complicated

* We use primary keys for all tables, so we do not need REPLICA
IDENTITY full actually. As far as I understand, it will make
master/slave replication ineffective as well
* I need the information about this primary key in rd_replidindex of
relation (to send it later to kafka). As far as I understand, with
REPLICA IDENTITY FULL the rd_replidindex will not be the primary key

#6Andres Freund
andres@anarazel.de
In reply to: Georgy Buranov (#5)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

On 2018-08-31 17:34:02 +0200, Georgy Buranov wrote:

Again, you can set REPLICA IDENTITY to FULL and it'll be there.

So, why I think this is complicated

* We use primary keys for all tables, so we do not need REPLICA
IDENTITY full actually. As far as I understand, it will make
master/slave replication ineffective as well

What do you mean with "ineffective"? That there's more data in the WAL?
Sure. Otherwise I don't know what you could mean.

There's no free lunch :/

* I need the information about this primary key in rd_replidindex of
relation (to send it later to kafka). As far as I understand, with
REPLICA IDENTITY FULL the rd_replidindex will not be the primary key

If you want the pkey, that's in rd_pkindex. rd_replidindex will only
differ if the identity is manually set to another candidate key
(possibly because there's no pkey).

Greetings,

Andres Freund

#7Georgy Buranov
gburanov@gmail.com
In reply to: Andres Freund (#6)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Ok, I got false understanding that REPLICA IDENTITY is used for
something more than a WAL. This is basically not true. So, what I can
do

* Set the REPLICA IDENTITY to full, and in this case I can still get
the pk from rd_pkindex. In this case the WAL will be bigger, but we
will have all the values in it.
* Other solution is basically (since we know exact primary key) - get
the row, and take the latest value from there using SQL query on the
client side (other part of kafka)

Thank you very much fro your help

Show quoted text

What do you mean with "ineffective"? That there's more data in the WAL?
Sure. Otherwise I don't know what you could mean.

There's no free lunch :/

#8Georgy Buranov
gburanov@gmail.com
In reply to: Andres Freund (#6)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Hi Andres. Thank you very much for your help.

I tried the following solution and I have some problems.

* I have 9.6 postgres and I do not have separate field for rd_pkindex
* All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
is NOT FULL), it still contains the primary key
* But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
and rd_pkindex does not exist

So, is there a way to get the primary key on 9.6 postgres?

Show quoted text

If you want the pkey, that's in rd_pkindex. rd_replidindex will only
differ if the identity is manually set to another candidate key
(possibly because there's no pkey).

#9Andres Freund
andres@anarazel.de
In reply to: Georgy Buranov (#8)
Re: PostgreSQL logical decoder output plugin - unchanged toast data

Hi,

On 2018-09-04 11:45:10 +0200, Georgy Buranov wrote:

Hi Andres. Thank you very much for your help.

I tried the following solution and I have some problems.

* I have 9.6 postgres and I do not have separate field for rd_pkindex
* All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
is NOT FULL), it still contains the primary key
* But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
and rd_pkindex does not exist

So, is there a way to get the primary key on 9.6 postgres?

You have to to query the syscache yourself, in that case. I know
nothing about what you're doing in your output plugin. In the ones I
wrote there's a side-cache for per-relation output plugin specific data,
do you have something like that? You can probably just crib the code
for the lookup from dblink.c:get_pkey_attnames(). If you have a cache,
itmight be worth caching the results.

Greetings,

Andres Freund