logical decoding issue with concurrent ALTER TYPE
Hi all,
A colleague Drew Callahan (in CC) has discovered that the logical
decoding doesn't handle syscache invalidation messages properly that
are generated by other transactions. Here is example (I've attached a
patch for isolation test),
-- Setup
CREATE TYPE comp AS (f1 int, f2 text);
CREATE TABLE tbl3(val1 int, val2 comp);
SELECT pg_create_logical_replication_slot('s', 'test_decoding');
-- Session 1
BEGIN;
INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2'));
-- Session 2
ALTER TYPE comp ADD ATTRIBUTE f3 int;
INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2', 3));
COMMIT;
pg_logical_slot_get_changes() returns:
BEGIN
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
COMMIT
However, the logical decoding should reflect the result of ALTER TYPE
and the val2 of the second INSERT output should be '(1,2,3)'.
The root cause of this behavior is that while ALTER TYPE can be run
concurrently to INSERT, the logical decoding doesn't handle cache
invalidation properly, and it got a cache hit of stale data (of
typecache in this case). Unlike snapshots that are stored in the
transaction’s reorder buffer changes, the invalidation messages of
other transactions are not distributed. As a result, the snapshot
becomes moot when we get a cache hit of stale data due to not
processing the invalidation messages again. This is not an issue for
ALTER TABLE and the like due to 2 phase locking and taking an
AccessExclusive lock. The issue with DMLs and ALTER TYPE has been
discovered but there might be other similar cases.
As far as I tested, this issue has existed since v9.4, where the
logical decoding was introduced, so it seems to be a long-standing
issue.
The simplest fix would be to invalidate all caches when setting a new
historical snapshot (i.e. applying CHANGE_INTERNAL_SNAPSHOT) but we
end up invalidating other caches unnecessarily too.
A better fix would be that when decoding the commit of a catalog
changing transaction, we distribute invalidation messages to other
concurrent transactions, like we do for snapshots. But we might not
need to distribute all types of invalidation messages, though.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
logical-decoding-ALTER-TYPE.patchapplication/octet-stream; name=logical-decoding-ALTER-TYPE.patchDownload
diff --git a/contrib/test_decoding/expected/concurrent_ddl_dml.out b/contrib/test_decoding/expected/concurrent_ddl_dml.out
index 3742a2a247..c8960c3ff2 100644
--- a/contrib/test_decoding/expected/concurrent_ddl_dml.out
+++ b/contrib/test_decoding/expected/concurrent_ddl_dml.out
@@ -797,3 +797,30 @@ COMMIT
stop
(1 row)
+
+starting permutation: s1_init s1_begin s1_insert_tbl3_2f s2_alter_type_add s1_insert_tbl3_3f s1_commit s2_get_changes
+step s1_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_insert_tbl3_2f: INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2'));
+step s2_alter_type_add: ALTER TYPE comp ADD ATTRIBUTE f3 int;
+step s1_insert_tbl3_3f: INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2', 3));
+step s1_commit: COMMIT;
+step s2_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+---------------------------------------------------------------
+BEGIN
+table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
+table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2,3)'
+COMMIT
+(4 rows)
+
+?column?
+--------
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/concurrent_ddl_dml.spec b/contrib/test_decoding/specs/concurrent_ddl_dml.spec
index d16515e6f4..d4690b92d9 100644
--- a/contrib/test_decoding/specs/concurrent_ddl_dml.spec
+++ b/contrib/test_decoding/specs/concurrent_ddl_dml.spec
@@ -2,14 +2,20 @@ setup
{
DROP TABLE IF EXISTS tbl1;
DROP TABLE IF EXISTS tbl2;
+ DROP TABLE IF EXISTS tbl3;
+ DROP TYPE IF EXISTS comp;
+ CREATE TYPE comp AS (f1 int, f2 text);
CREATE TABLE tbl1(val1 integer, val2 integer);
CREATE TABLE tbl2(val1 integer, val2 integer);
+ CREATE TABLE tbl3(val1 INT, val2 comp);
}
teardown
{
DROP TABLE tbl1;
DROP TABLE tbl2;
+ DROP TABLE tbl3;
+ DROP TYPE comp;
SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
}
@@ -21,6 +27,8 @@ step "s1_begin" { BEGIN; }
step "s1_insert_tbl1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1); }
step "s1_insert_tbl2" { INSERT INTO tbl2 (val1, val2) VALUES (1, 1); }
step "s1_insert_tbl2_3col" { INSERT INTO tbl2 (val1, val2, val3) VALUES (1, 1, 1); }
+step "s1_insert_tbl3_2f" { INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2')); }
+step "s1_insert_tbl3_3f" { INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2', 3)); }
step "s1_commit" { COMMIT; }
session "s2"
@@ -44,6 +52,8 @@ step "s2_alter_tbl2_3rd_char" { ALTER TABLE tbl2 ALTER COLUMN val3 TYPE characte
step "s2_alter_tbl2_3rd_text" { ALTER TABLE tbl2 ALTER COLUMN val3 TYPE text; }
step "s2_alter_tbl2_3rd_int" { ALTER TABLE tbl2 ALTER COLUMN val3 TYPE int USING val3::integer; }
+step "s2_alter_type_add" { ALTER TYPE comp ADD ATTRIBUTE f3 int; }
+
step "s2_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
@@ -86,3 +96,5 @@ permutation "s1_init" "s2_alter_tbl2_add_char" "s1_begin" "s1_insert_tbl1" "s2_a
permutation "s1_init" "s2_alter_tbl2_add_text" "s1_begin" "s1_insert_tbl1" "s2_alter_tbl2_3rd_char" "s1_insert_tbl2_3col" "s1_commit" "s2_alter_tbl2_drop_3rd_col" "s1_insert_tbl2" "s2_get_changes"
permutation "s1_init" "s2_alter_tbl2_add_char" "s1_begin" "s1_insert_tbl1" "s2_alter_tbl2_drop_3rd_col" "s1_insert_tbl1" "s1_commit" "s2_get_changes"
+
+permutation "s1_init" "s1_begin" "s1_insert_tbl3_2f" "s2_alter_type_add" "s1_insert_tbl3_3f" "s1_commit" "s2_get_changes"
On Sun, Aug 6, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
A colleague Drew Callahan (in CC) has discovered that the logical
decoding doesn't handle syscache invalidation messages properly that
are generated by other transactions. Here is example (I've attached a
patch for isolation test),-- Setup
CREATE TYPE comp AS (f1 int, f2 text);
CREATE TABLE tbl3(val1 int, val2 comp);
SELECT pg_create_logical_replication_slot('s', 'test_decoding');-- Session 1
BEGIN;
INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2'));-- Session 2
ALTER TYPE comp ADD ATTRIBUTE f3 int;INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2', 3));
COMMIT;pg_logical_slot_get_changes() returns:
BEGIN
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
COMMITHowever, the logical decoding should reflect the result of ALTER TYPE
and the val2 of the second INSERT output should be '(1,2,3)'.The root cause of this behavior is that while ALTER TYPE can be run
concurrently to INSERT, the logical decoding doesn't handle cache
invalidation properly, and it got a cache hit of stale data (of
typecache in this case). Unlike snapshots that are stored in the
transaction’s reorder buffer changes, the invalidation messages of
other transactions are not distributed. As a result, the snapshot
becomes moot when we get a cache hit of stale data due to not
processing the invalidation messages again. This is not an issue for
ALTER TABLE and the like due to 2 phase locking and taking an
AccessExclusive lock. The issue with DMLs and ALTER TYPE has been
discovered but there might be other similar cases.As far as I tested, this issue has existed since v9.4, where the
logical decoding was introduced, so it seems to be a long-standing
issue.The simplest fix would be to invalidate all caches when setting a new
historical snapshot (i.e. applying CHANGE_INTERNAL_SNAPSHOT) but we
end up invalidating other caches unnecessarily too.
I think this could be quite onerous for workloads having a mix of DDL/DMLs.
A better fix would be that when decoding the commit of a catalog
changing transaction, we distribute invalidation messages to other
concurrent transactions, like we do for snapshots. But we might not
need to distribute all types of invalidation messages, though.
I would prefer the solution on these lines. It would be good if we
distribute only the required type of invalidations.
--
With Regards,
Amit Kapila.