REPLICA IDENTITY FULL

Started by Tatsuo Ishiialmost 9 years ago21 messageshackers
Jump to latest
#1Tatsuo Ishii
t-ishii@sra.co.jp

While playing around with logical replication, I am confused by the
behavior of REPLICA IDENTITY FULL.

First I created a table having 2 INT columns with no keys. If I
execute ALTER TABLE REPLICA IDENTITY FULL, replication for
UPDATE/DELETE works.

In the session below, port 11002 is the publisher side, while 11003 is
the subscriber side.

+ psql -e -p 11002 -c create table t3(i int, j int); test
create table t3(i int, j int);
CREATE TABLE
+ psql -e -p 11003 -c create table t3(i int, j int); test
create table t3(i int, j int);
CREATE TABLE
+ psql -e -p 11002 -c alter table t3 replica identity full test
alter table t3 replica identity full
ALTER TABLE
+ psql -e -p 11002 -c insert into t3 values(1,1); test
insert into t3 values(1,1);
INSERT 0 1
+ psql -e -p 11002 -c insert into t3 values(2,2); test
insert into t3 values(2,2);
INSERT 0 1
+ psql -e -p 11002 -c insert into t3 values(2,2); test
insert into t3 values(2,2);
INSERT 0 1
+ psql -e -p 11003 -c ALTER SUBSCRIPTION mysub REFRESH PUBLICATION; test
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
NOTICE:  added subscription for table public.t3
ALTER SUBSCRIPTION
+ sleep 3
+ psql -e -p 11003 -c select * from t3; test
select * from t3;
 i | j 
---+---
 1 | 1
 2 | 2
 2 | 2
(3 rows)
+ psql -e -p 11002 -c update t3 set j = 10 where i = 2 and j = 2; test
update t3 set j = 10 where i = 2 and j = 2;
UPDATE 2
+ psql -e -p 11003 -c select * from t3; test
select * from t3;
 i | j  
---+----
 1 |  1
 2 | 10
 2 | 10
(3 rows)
+ psql -e -p 11002 -c delete from t3 where i = 2; test
delete from t3 where i = 2;
DELETE 2
+ psql -e -p 11003 -c select * from t3; test
Pager usage is off.
select * from t3;
 i | j 
---+---
 1 | 1
(1 row)

However, if a table has text columns, UPDATE/DELETE replication does
not work any more. Am I missing something?

+ psql -e -p 11002 -c create table t4(i text, j text); test
create table t4(i text, j text);
CREATE TABLE
+ psql -e -p 11003 -c create table t4(i text, j text); test
create table t4(i text, j text);
CREATE TABLE
+ psql -e -p 11002 -c alter table t4 replica identity full test
alter table t4 replica identity full
ALTER TABLE
+ psql -e -p 11002 -c insert into t4 values('a','a'); test
insert into t4 values('a','a');
INSERT 0 1
+ psql -e -p 11002 -c insert into t4 values('b','b'); test
insert into t4 values('b','b');
INSERT 0 1
+ psql -e -p 11002 -c insert into t4 values('b','b'); test
insert into t4 values('b','b');
INSERT 0 1
+ psql -e -p 11003 -c ALTER SUBSCRIPTION mysub REFRESH PUBLICATION; test
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
NOTICE:  added subscription for table public.t4
ALTER SUBSCRIPTION
+ sleep 3
+ psql -e -p 11003 -c select * from t4; test
select * from t4;
 i | j 
---+---
 a | a
 b | b
 b | b
(3 rows)
+ psql -e -p 11002 -c update t4 set j = 'c' where i = 'b' and j = 'b'; test
update t4 set j = 'c' where i = 'b' and j = 'b';
UPDATE 2
+ psql -e -p 11003 -c select * from t4; test
select * from t4;
 i | j 
---+---
 a | a
 b | b
 b | b
(3 rows)
+ psql -e -p 11002 -c delete from t4 where i = 'b'; test
delete from t4 where i = 'b';
DELETE 2
+ psql -e -p 11003 -c select * from t4; test
select * from t4;
 i | j 
---+---
 a | a
 b | b
 b | b
(3 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Tatsuo Ishii (#1)
Re: REPLICA IDENTITY FULL

On 6/18/17 23:11, Tatsuo Ishii wrote:

While playing around with logical replication, I am confused by the
behavior of REPLICA IDENTITY FULL.

However, if a table has text columns, UPDATE/DELETE replication does
not work any more. Am I missing something?

This is apparently because for replica identity full the comparison of
the search key against the tuple value goes through datumIsEqual(),
which doesn't work for TOAST values.

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Peter Eisentraut (#2)
Re: REPLICA IDENTITY FULL

This is apparently because for replica identity full the comparison of
the search key against the tuple value goes through datumIsEqual(),
which doesn't work for TOAST values.

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

In other word, pass-by-value types work in this case? If so, we can
document it or throw an error while executing ALTER REPLICA IDENTITY
FULL on tables consisting of non pass-by-values column types to
mitigate the problem.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Eisentraut (#2)
Re: REPLICA IDENTITY FULL

On 20 June 2017 at 06:53, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 6/18/17 23:11, Tatsuo Ishii wrote:

While playing around with logical replication, I am confused by the
behavior of REPLICA IDENTITY FULL.

However, if a table has text columns, UPDATE/DELETE replication does
not work any more. Am I missing something?

This is apparently because for replica identity full the comparison of
the search key against the tuple value goes through datumIsEqual(),
which doesn't work for TOAST values.

Personally I think REPLICA IDENTITY FULL conflates two related things.

One is "record the whole old value of the tuple in xlog so logical
decoding can access it".

Quite separately, there is "treat the full tuple as the replica identity key".

I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
record the whole old tuple not just keys, so they can be used in
conflict processing etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#4)
Re: REPLICA IDENTITY FULL

On 2017-06-20 09:45:27 +0800, Craig Ringer wrote:

I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
record the whole old tuple not just keys, so they can be used in
conflict processing etc.

What stops you from automatically using a candidate key if available?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: REPLICA IDENTITY FULL

On 20 June 2017 at 09:47, Andres Freund <andres@anarazel.de> wrote:

On 2017-06-20 09:45:27 +0800, Craig Ringer wrote:

I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
record the whole old tuple not just keys, so they can be used in
conflict processing etc.

What stops you from automatically using a candidate key if available?

Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
saying "this is the replication key for this relation". If you use
REPLICA IDENTITY FULL then the replication tool goes "nah, I think
I'll pick the PK instead" is that really right?

It's not a major issue.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Craig Ringer (#6)
Re: REPLICA IDENTITY FULL

Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
saying "this is the replication key for this relation".

Yes, that's my understanding too. However, the feature may or may not
work depending on the data types of columns, probably I will not
recommend users/my customers to use it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Tatsuo Ishii (#7)
Re: REPLICA IDENTITY FULL

On 2017-06-20 11:29:06 +0900, Tatsuo Ishii wrote:

Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
saying "this is the replication key for this relation".

Yes, that's my understanding too. However, the feature may or may not
work depending on the data types of columns, probably I will not
recommend users/my customers to use it.

I'm not sure how datatypes are playing into this?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Andres Freund (#8)
Re: REPLICA IDENTITY FULL

Yes, that's my understanding too. However, the feature may or may not
work depending on the data types of columns, probably I will not
recommend users/my customers to use it.

I'm not sure how datatypes are playing into this?

For example, if the table consists of only INT types, REPLICA IDENTITY
FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
are TEXT types, then UPDATE/DELETE does not work.

See up thread for more details.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Tatsuo Ishii (#9)
Re: REPLICA IDENTITY FULL

On 2017-06-20 11:46:13 +0900, Tatsuo Ishii wrote:

Yes, that's my understanding too. However, the feature may or may not
work depending on the data types of columns, probably I will not
recommend users/my customers to use it.

I'm not sure how datatypes are playing into this?

For example, if the table consists of only INT types, REPLICA IDENTITY
FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
are TEXT types, then UPDATE/DELETE does not work.

See up thread for more details.

Right, but that's just a bug, nothing else.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Andres Freund (#10)
Re: REPLICA IDENTITY FULL

For example, if the table consists of only INT types, REPLICA IDENTITY
FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
are TEXT types, then UPDATE/DELETE does not work.

See up thread for more details.

Right, but that's just a bug, nothing else.

If my understanding is correct, it would not be easy to fix, no?

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Tatsuo Ishii (#11)
Re: REPLICA IDENTITY FULL

On 2017-06-20 11:52:10 +0900, Tatsuo Ishii wrote:

If my understanding is correct, it would not be easy to fix, no?

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

I don't see the problem as being big. We should just look up the
default btree opclass and use the relevant operator. That's a how a
number of things already work.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#12)
Re: REPLICA IDENTITY FULL

On 6/20/17 00:10, Andres Freund wrote:

On 2017-06-20 11:52:10 +0900, Tatsuo Ishii wrote:

If my understanding is correct, it would not be easy to fix, no?

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

I don't see the problem as being big. We should just look up the
default btree opclass and use the relevant operator. That's a how a
number of things already work.

Patch for that.

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Fix-replication-with-replica-identity-full.patchtext/plain; charset=UTF-8; name=0001-Fix-replication-with-replica-identity-full.patch; x-mac-creator=0; x-mac-type=0Download+64-10
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: REPLICA IDENTITY FULL

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#14)
Re: REPLICA IDENTITY FULL

Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

What about keeping the datumIsEqual test for fixed length pass-by-value
types (I'm mostly thinking about fixed-width integers here ...) and
always use the full blown equality comparison for anything more
elaborate than that?

Though, thinking about it, maybe the datumIsEqual test would give the
wrong answer for floating point values, and there'd be no fallback to
equality with the logic I propose. But then maybe that's all right ---
who in their right minds would use floating point columns as part of
replica identity ...?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#15)
Re: REPLICA IDENTITY FULL

On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:

Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL. If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.

Though, thinking about it, maybe the datumIsEqual test would give the
wrong answer for floating point values, and there'd be no fallback to
equality with the logic I propose. But then maybe that's all
right ---

I don't think it'd be ok, we shouldn't just do the wrong thing because
we think it's unlikely to happen.

who in their right minds would use floating point columns as part of
replica identity ...?

Since this is FULL, it'll be all columns...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#16)
Re: REPLICA IDENTITY FULL

Andres Freund wrote:

On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:

Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL. If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.

I can get behind that argument.

who in their right minds would use floating point columns as part of
replica identity ...?

Since this is FULL, it'll be all columns...

Yeah, I was thinking you shouldn't have floating point columns if you're
going to use FULL as identity. But you're tautologically right: doing
the wrong thing is likely not the right thing to do.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#17)
Re: REPLICA IDENTITY FULL

On 6/23/17 13:14, Alvaro Herrera wrote:

Andres Freund wrote:

On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:

Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL. If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.

I can get behind that argument.

Thanks for the feedback. I have committed it after removing the
datumIsEqual() call.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Peter Eisentraut (#18)
Re: REPLICA IDENTITY FULL

Thanks for the feedback. I have committed it after removing the
datumIsEqual() call.

Thanks for the patch! I confirmed my examples now work.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Noah Misch
noah@leadboat.com
In reply to: Peter Eisentraut (#18)
Re: REPLICA IDENTITY FULL

On Fri, Jun 23, 2017 at 03:45:48PM -0400, Peter Eisentraut wrote:

On 6/23/17 13:14, Alvaro Herrera wrote:

Andres Freund wrote:

On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:

Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Any thoughts about keeping datumAsEqual() as a first check? I did some
light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal. If they're usually not, then it's a loser. Do
we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL. If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.

I can get behind that argument.

Thanks for the feedback. I have committed it after removing the
datumIsEqual() call.

While reviewing this patch, I noticed a couple of nearby defects:

- RelationFindReplTupleSeq() says "Note that this stops on the first matching
tuple.", but that's not the case. It visits every row in the table, and it
uses the last match. The claimed behavior sounds more attractive.

- RelationFindReplTupleSeq() has comment "/* Start an index scan. */", an
inapplicable copy-paste from RelationFindReplTupleByIndex().

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Noah Misch (#20)