BUG #15114: logical decoding Segmentation fault
The following bug has been logged on the website:
Bug reference: 15114
Logged by: Peter Slavov
Email address: pet.slavov@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu xenial
Description:
Hi,
I have a Segmentation fault problem when trying to start logical decoding on
the latest Postgres version - 10.3 (package version 10.3-1.pgdg16.04+1).
Here is what happens:
--- Query on master db server ---
create publication big_tables FOR TABLE table1, table2, table3, table4;
--- Query on the logical replica db server ---
create subscription sub_name CONNECTION 'host=master.server dbname=db_name
user=db_user password=db_password' PUBLICATION big_tables ;
--- Code dump: ---
------------------
Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/a3/2aff963eb198ec48d1946523ef49379fa49c5e.debug...done.
done.
[New LWP 26112]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by 'postgres: 10/main: wal sender process lg_replica
ip-10-4-1-11.eu-west-1.compute'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GetActiveSnapshot () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843
843 /build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:
No such file or directory.
(gdb)
--- Logs ---
------------
# Master db server:
2018-03-14 23:30:35 UTC [23352]: [7-1] user=,db= host=,app=[] LOG: server
process (PID 26112) was terminated by signal 11: Segmentation fault
2018-03-14 23:30:35 UTC [23352]: [8-1] user=,db= host=,app=[] LOG:
terminating any other active server processes
# Logical replica db server:
2018-03-14 23:29:22 UTC [6777]: [3-1] user=postgres,db=db_replica_name
host=[local],app=[psql] LOG: duration: 73.901 ms statement: create
subscription sub_name CONNECTION 'host=master.server dbname=db_name
user=db_user password=db_password' PUBLICATION big_tables ;
2018-03-14 23:29:22 UTC [6779]: [1-1] user=,db= host=,app=[] LOG: logical
replication apply worker for subscription "sub_name" has started
2018-03-14 23:29:22 UTC [6780]: [1-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table1" has started
2018-03-14 23:29:22 UTC [6781]: [1-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table2" has started
2018-03-14 23:29:23 UTC [6780]: [2-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table1" has finished
2018-03-14 23:29:23 UTC [6782]: [1-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table3" has started
2018-03-14 23:29:24 UTC [6781]: [2-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table2" has finished
2018-03-14 23:29:24 UTC [6783]: [1-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table4" has started
2018-03-14 23:29:26 UTC [6783]: [2-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table4" has finished
2018-03-14 23:29:26 UTC [6782]: [2-1] user=,db= host=,app=[] LOG: logical
replication table synchronization worker for subscription "sub_name", table
"table3" has finished
2018-03-14 23:30:35 UTC [6779]: [2-1] user=,db= host=,app=[] ERROR: could
not receive data from WAL stream: SSL SYSCALL error: EOF detected
2018-03-14 23:30:35 UTC [6739]: [7-1] user=,db= host=,app=[] LOG: worker
process: logical replication worker for subscription 17178 (PID 6779) exited
with exit code 1
2018-03-14 23:30:35 UTC [6798]: [1-1] user=,db= host=,app=[] LOG: logical
replication apply worker for subscription "sub_name" has started
2018-03-14 23:30:35 UTC [6798]: [2-1] user=,db= host=,app=[] ERROR: could
not connect to the publisher: FATAL: the database system is in recovery
mode
--- Configurations ---
----------------------
Default Debian postgresql.conf with these changes:
listen_addresses = '*'
max_connections = 500
shared_buffers = 1GB
work_mem = 4MB
maintenance_work_mem = 128MB
effective_cache_size = 4GB
effective_io_concurrency = 5
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10
max_worker_processes = 20
wal_log_hints = off
hot_standby = on
shared_preload_libraries = 'pg_stat_statements'
tcp_keepalives_idle = 300
tcp_keepalives_interval = 10
tcp_keepalives_count = 6
log_min_duration_statement = 5
log_lock_waits = on
log_checkpoints = on
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_filename = '%A.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d host=%h,app=[%a] '
log_timezone = 'UTC'
log_statement = 'none'
log_duration = off
log_min_duration_statement = 0
log_hostname = on
log_checkpoints = on
log_connections = on
log_disconnections = on
wal_compression = on
ssl_cert_file = '/etc/postgresql/ssl/database_public.crt'
ssl_key_file = '/etc/postgresql/ssl/database_private.key'
Does this sound familiar? What am I doing wrong?
PG Bug reporting form wrote:
--- Code dump: --- ------------------ Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/a3/2aff963eb198ec48d1946523ef49379fa49c5e.debug...done. done. [New LWP 26112] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by 'postgres: 10/main: wal sender process lg_replica ip-10-4-1-11.eu-west-1.compute'. Program terminated with signal SIGSEGV, Segmentation fault. #0 GetActiveSnapshot () at /build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843 843 /build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c: No such file or directory. (gdb)
Can you do a "bt" in that gdb and paste the output, please?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here is the code dump with bt:
Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/a3/2aff963eb198ec48d1946523ef49379fa49c5e.debug...done.
done.
[New LWP 26112]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 10/main: wal sender process lg_replica
ip-10-4-1-11.eu-west-1.compute'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GetActiveSnapshot () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843
843
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:
No such file or directory.
(gdb) bt
#0 GetActiveSnapshot () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843
#1 0x00005649b753e928 in postquel_start (fcache=0x5649b89b7968,
es=0x5649b89c1288) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/executor/functions.c:809
#2 fmgr_sql (fcinfo=0x5649b89b5c48) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/executor/functions.c:1150
#3 0x00005649b7530ae1 in ExecInterpExpr (state=0x5649b89b5b68,
econtext=0x5649b89b6468, isnull=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/executor/execExprInterp.c:650
#4 0x00005649b75d9bc7 in ExecEvalExprSwitchContext (isNull=0x7fff8d0a8794
"", econtext=<optimized out>, state=0x5649b89b5b68) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/include/executor/executor.h:308
#5 evaluate_expr (expr=<optimized out>, result_type=result_type@entry=1114,
result_typmod=result_typmod@entry=-1,
result_collation=result_collation@entry=0) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:4739
#6 0x00005649b75dba31 in evaluate_function (context=0x7fff8d0a8b50,
func_tuple=0x7f76e7843548, funcvariadic=0 '\000', args=0x5649b89b2730,
input_collid=0, result_collid=0, result_typmod=-1, result_type=1114,
funcid=16506)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:4296
#7 simplify_function (funcid=16506, result_type=1114, result_typmod=-1,
result_collid=result_collid@entry=0, input_collid=input_collid@entry=0,
args_p=args_p@entry=0x7fff8d0a8930, funcvariadic=0 '\000', process_args=1
'\001', allow_non_const=1 '\001', context=0x7fff8d0a8b50)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:3936
#8 0x00005649b75dc449 in eval_const_expressions_mutator
(node=0x5649b89b2530, context=0x7fff8d0a8b50) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:2595
#9 0x00005649b757bc8b in expression_tree_mutator (node=<optimized out>,
mutator=mutator@entry=0x5649b75dc210 <eval_const_expressions_mutator>,
context=context@entry=0x7fff8d0a8b50) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/nodes/nodeFuncs.c:2903
#10 0x00005649b75dbfce in simplify_function (funcid=2057, result_type=16,
result_typmod=result_typmod@entry=-1, result_collid=0, input_collid=0,
args_p=args_p@entry=0x7fff8d0a8b00, funcvariadic=0 '\000', process_args=1
'\001', allow_non_const=1 '\001', context=0x7fff8d0a8b50)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:3927
#11 0x00005649b75dc38b in eval_const_expressions_mutator
(node=node@entry=0x5649b89b0888,
context=context@entry=0x7fff8d0a8b50) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:2643
#12 0x00005649b75ddabf in eval_const_expressions (root=root@entry=0x0,
node=node@entry=0x5649b89b0888) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:2437
#13 0x00005649b777181a in RelationGetIndexPredicate
(relation=relation@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4813
#14 0x00005649b7443d60 in BuildIndexInfo (index=index@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/catalog/index.c:1660
#15 0x00005649b7771b90 in RelationGetIndexAttrBitmap
(relation=relation@entry=0x7f76e7836638,
attrKind=attrKind@entry=INDEX_ATTR_BITMAP_IDENTITY_KEY)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4938
#16 0x00005649b7614368 in logicalrep_write_attrs (rel=0x7f76e7836638,
out=0x5649b88dee38) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:530
#17 logicalrep_write_rel (out=0x5649b88dee38, rel=0x7f76e7836638) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:317
#18 0x00007f76998b5e6f in pgoutput_change (ctx=<optimized out>,
txn=<optimized out>, relation=0x7f76e7836638, change=0x5649b899b1f8) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/pgoutput/pgoutput.c:324
#19 0x00005649b760fbc9 in change_cb_wrapper (cache=<optimized out>,
txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/logical.c:711
#20 0x00005649b761844b in ReorderBufferCommit (rb=0x5649b89848a8,
xid=xid@entry=71743261, commit_lsn=commit_lsn@entry=463103922872,
end_lsn=end_lsn@entry=463103922928,
commit_time=commit_time@entry=574385433206439,
origin_id=origin_id@entry=0, origin_lsn=0)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/reorderbuffer.c:1445
#21 0x00005649b760e012 in DecodeCommit (xid=71743261,
parsed=0x7fff8d0a91e0, buf=<synthetic pointer>, ctx=0x5649b88b0d48) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/decode.c:611
#22 DecodeXactOp (buf=<synthetic pointer>, ctx=0x5649b88b0d48) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/decode.c:241
#23 LogicalDecodingProcessRecord (ctx=0x5649b88b0d48, record=<optimized
out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/decode.c:113
#24 0x00005649b761ff19 in XLogSendLogical () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/walsender.c:2781
#25 0x00005649b7621883 in WalSndLoop (send_data=send_data@entry=0x5649b761feb0
<XLogSendLogical>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/walsender.c:2149
#26 0x00005649b76224c3 in StartLogicalReplication (cmd=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/walsender.c:1102
#27 exec_replication_command (cmd_string=cmd_string@entry=0x5649b8966768
"START_REPLICATION SLOT \"rbank_staging\" LOGICAL 0/0 (proto_version '1',
publication_names '\"big_tables\"')")
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/walsender.c:1539
#28 0x00005649b766ca72 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5649b88d9990, dbname=0x5649b88d9838 "receiptbank",
username=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/tcop/postgres.c:4084
#29 0x00005649b73a60d9 in BackendRun (port=0x5649b88d7000) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/postmaster/postmaster.c:4405
#30 BackendStartup (port=0x5649b88d7000) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/postmaster/postmaster.c:4077
#31 ServerLoop () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/postmaster/postmaster.c:1755
#32 0x00005649b75fc78b in PostmasterMain (argc=5, argv=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/postmaster/postmaster.c:1363
#33 0x00005649b73a74d5 in main (argc=5, argv=0x5649b8891850) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/main/main.c:228
(gdb)
2018-03-15 15:56 GMT+01:00 Alvaro Herrera <alvherre@alvh.no-ip.org>:
PG Bug reporting form wrote:
--- Code dump: --- ------------------ Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/a3/2aff963eb198ec48d1946523ef49379fa49c5e.debug...done.
done.
[New LWP 26112]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by 'postgres: 10/main: wal sender process lg_replica
ip-10-4-1-11.eu-west-1.compute'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GetActiveSnapshot () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843
843 /build/postgresql-10-drhiey/postgresql-10-10.3/build/../
src/backend/utils/time/snapmgr.c:
No such file or directory.
(gdb)
Can you do a "bt" in that gdb and paste the output, please?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-03-15 23:55:32 +0100, Петър Славов wrote:
Here is the code dump with bt:
Thanks, this is interesting. Could you show the definition of the
tables involved in this?
For some reason I don't understand, you appear to have managed to get a
predicate index used as the identity column.
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:2437
#13 0x00005649b777181a in RelationGetIndexPredicate
(relation=relation@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4813
#14 0x00005649b7443d60 in BuildIndexInfo (index=index@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/catalog/index.c:1660
#15 0x00005649b7771b90 in RelationGetIndexAttrBitmap
(relation=relation@entry=0x7f76e7836638,
attrKind=attrKind@entry=INDEX_ATTR_BITMAP_IDENTITY_KEY)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4938
#16 0x00005649b7614368 in logicalrep_write_attrs (rel=0x7f76e7836638,
out=0x5649b88dee38) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:530
#17 logicalrep_write_rel (out=0x5649b88dee38, rel=0x7f76e7836638) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:317
#18 0x00007f76998b5e6f in pgoutput_change (ctx=<optimized out>,
txn=<optimized out>, relation=0x7f76e7836638, change=0x5649b899b1f8) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/pgoutput/pgoutput.c:324
We should never end up with an identity index that has a predicate. So
something's wrong here...
Greetings,
Andres Freund
Hi,
On 2018-03-15 23:55:32 +0100, Петър Славов wrote:
#0 GetActiveSnapshot () at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/time/snapmgr.c:843
#1 0x00005649b753e928 in postquel_start (fcache=0x5649b89b7968,
[...]
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/optimizer/util/clauses.c:2437
#13 0x00005649b777181a in RelationGetIndexPredicate
(relation=relation@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4813
#14 0x00005649b7443d60 in BuildIndexInfo (index=index@entry=0x7f76e7841568)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/catalog/index.c:1660
#15 0x00005649b7771b90 in RelationGetIndexAttrBitmap
(relation=relation@entry=0x7f76e7836638,
attrKind=attrKind@entry=INDEX_ATTR_BITMAP_IDENTITY_KEY)
at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/utils/cache/relcache.c:4938
#16 0x00005649b7614368 in logicalrep_write_attrs (rel=0x7f76e7836638,
out=0x5649b88dee38) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:530
#17 logicalrep_write_rel (out=0x5649b88dee38, rel=0x7f76e7836638) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/proto.c:317
#18 0x00007f76998b5e6f in pgoutput_change (ctx=<optimized out>,
txn=<optimized out>, relation=0x7f76e7836638, change=0x5649b899b1f8) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/pgoutput/pgoutput.c:324
#19 0x00005649b760fbc9 in change_cb_wrapper (cache=<optimized out>,
txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at
/build/postgresql-10-drhiey/postgresql-10-10.3/build/../src/backend/replication/logical/logical.c:711
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
So RelationGetIndexAttrBitmap() builds a bitmap of *all* indexed
columns. Besides the code exec problem above, this is also just
inefficient. I think this needs to be replaced by some replident
bespoke code?
Greetings,
Andres Freund
Andres Freund wrote:
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
So RelationGetIndexAttrBitmap() builds a bitmap of *all* indexed
columns. Besides the code exec problem above, this is also just
inefficient. I think this needs to be replaced by some replident
bespoke code?
I agree -- I think logicalrep_write_attrs() should not use
RelationGetIndexAttrBitmap at all but instead use some lower-level way
to obtain columns of the replica identity. I don't know whether there
are historical-snapshot-related considerations to care about, in this
code, though.
This is neither my area nor my code, but if Petr and Peter decline
producing a patch for this quickly, I can look at it in a few days.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-03-26 13:08:01 -0300, Alvaro Herrera wrote:
I agree -- I think logicalrep_write_attrs() should not use
RelationGetIndexAttrBitmap at all but instead use some lower-level way
to obtain columns of the replica identity.
Indeed.
I don't know whether there are historical-snapshot-related
considerations to care about, in this code, though.
There shouldn't be much of an issue if all that's accessed is system
tables/cols.
This is neither my area nor my code, but if Petr and Peter decline
producing a patch for this quickly, I can look at it in a few days.
Cool!
Greetings,
Andres Freund
Hi Andres,
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
Not sure what you mean here, but in my case the replica identity is a
regular integer primary key, without any functions as predicates in the
index or anything else.
... or is my schema not the problem at all in this case?
Peter Slavov
Петър Славов wrote:
Hi Andres,
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
Not sure what you mean here, but in my case the replica identity is a
regular integer primary key, without any functions as predicates in the
index or anything else.
... or is my schema not the problem at all in this case?
Andres is saying that you have some other index that contains a
predicate.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-03-26 22:59:12 +0200, Петър Славов wrote:
Hi Andres,
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
Not sure what you mean here, but in my case the replica identity is a
regular integer primary key, without any functions as predicates in the
index or anything else.
Turns out it's not the replica identity index that's the problem,
it's *another* index with a WHERE clause that evaluates an SQL
function. That's a bug that needs to be fixed. But you might be able to
work around it for now by simplifying your WHERE clauses for the indexes
to not use SQL functions.
Greetings,
Andres Freund
Hi,
Now I understand: I have a partial index with this predicate, "WHERE
created_at > months_ago(2)", where months_ago(int) is an immutable
function that returns the current timestamp minus the given number of
months.
Just to confirm: removing this index on the testing server stopped this
Segmentation fault from happening.
Of course I will have to wait for a fix to be able to use logical decoding
in this case.
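Until the fix is available, a workaround along the lines Andres suggested might look like the following sketch (the index name, column list, and cutoff are illustrative, not from the thread; unlike the months_ago() predicate, a constant cutoff has to be refreshed by periodically recreating the index):

```sql
-- Hypothetical sketch: rebuild the partial index with a constant
-- predicate so logical decoding never has to evaluate a SQL function.
DROP INDEX IF EXISTS replicated_table_recent_idx;  -- illustrative name
CREATE INDEX replicated_table_recent_idx
    ON replicated_table (a)
    WHERE created_at > TIMESTAMP '2018-01-01 00:00:00';
```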
One more thing: I am almost certain that the same thing has to be fixed
in the pglogical extension.
Peter Slavov
2018-03-26 23:06 GMT+02:00 Andres Freund <andres@anarazel.de>:
On 2018-03-26 22:59:12 +0200, Петър Славов wrote:
Hi Andres,
Alvaro, Petr: If I understand correctly what happens here is that some
*other* column than the replica identity has a predicate with a full
blown SQL function. We shouldn't evaluate arbitrary code like that
inside the output plugin!
Not sure what you mean here, but in my case the replica identity is a
regular integer primary key, without any functions as predicates in the
index or anything else.
Turns out it's not the replica identity index that's the problem,
it's *another* index with a WHERE clause that evaluates an SQL
function. That's a bug that needs to be fixed. But you might be able to
work around it for now by simplifying your WHERE clauses for the indexes
to not use SQL functions.
Greetings,
Andres Freund
Alvaro Herrera wrote:
I agree -- I think logicalrep_write_attrs() should not use
RelationGetIndexAttrBitmap at all but instead use some lower-level way
to obtain columns of the replica identity. I don't know whether there
are historical-snapshot-related considerations to care about, in this
code, though.
Here's a patch. There is a disadvantage: one additional syscache lookup
each time logicalrep_write_attrs() is called. I think that's all right,
since that's done once for each relation being replicated (not, say,
once per tuple or once per transaction). Even so, I tried to move this
code as a special "quick-path" case in RelationGetIndexAttrBitmap, so
that it could be cached separately without processing the other indexes.
That turned out a bit too messy for my taste.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
Attachment: proto.patch (text/plain; charset=us-ascii, +41/-2)
Hi,
I tried to compile this and I got this error:
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-O2 -I. -I../../../../src/include -D_GNU_SOURCE -c -o proto.o proto.c
proto.c: In function ‘logicalrep_write_attrs’:
proto.c:565:17: error: ‘FormData_pg_index {aka struct FormData_pg_index}’
has no member named ‘indnkeyatts’
natts = index->indnkeyatts;
^
<builtin>: recipe for target 'proto.o' failed
make[4]: *** [proto.o] Error 1
make[4]: Leaving directory
'/usr/src/postgresql-10.3/src/backend/replication/logical'
../../../src/backend/common.mk:41: recipe for target 'logical-recursive'
failed
make[3]: *** [logical-recursive] Error 2
make[3]: Leaving directory
'/usr/src/postgresql-10.3/src/backend/replication'
common.mk:41: recipe for target 'replication-recursive' failed
make[2]: *** [replication-recursive] Error 2
make[2]: Leaving directory '/usr/src/postgresql-10.3/src/backend'
Makefile:38: recipe for target 'all-backend-recurse' failed
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory '/usr/src/postgresql-10.3/src'
GNUmakefile:11: recipe for target 'all-src-recurse' failed
make: *** [all-src-recurse] Error 2
-----
Maybe it should be index->indnatts ?
Peter Slavov
2018-04-16 19:59 GMT+02:00 Alvaro Herrera <alvherre@alvh.no-ip.org>:
Alvaro Herrera wrote:
I agree -- I think logicalrep_write_attrs() should not use
RelationGetIndexAttrBitmap at all but instead use some lower-level way
to obtain columns of the replica identity. I don't know whether there
are historical-snapshot-related considerations to care about, in this
code, though.
Here's a patch. There is a disadvantage: one additional syscache lookup
each time logicalrep_write_attrs() is called. I think that's all right,
since that's done once for each relation being replicated (not, say,
once per tuple or once per transaction). Even so, I tried to move this
code as a special "quick-path" case in RelationGetIndexAttrBitmap, so
that it could be cached separately without processing the other indexes.
That turned out a bit too messy for my taste.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 27, 2018 at 12:26:52AM +0200, Петър Славов wrote:
I tried to compile this and i had this error:
proto.c: In function ‘logicalrep_write_attrs’:
proto.c:565:17: error: ‘FormData_pg_index {aka struct FormData_pg_index}’
has no member named ‘indnkeyatts’
natts = index->indnkeyatts;
The patch Alvaro has sent is made for the master branch, and you are
visibly using a build of REL_10_STABLE. The conflict you are seeing
comes from 8224de4, which introduced covering indexes and added
indnkeyatts to Form_pg_index alongside indnatts.
--
Michael
Hi,
What happened here? I cannot see a fix for this bug.
I also cannot see the workaround patch in the code.
Do I have to do something to move this fix forward?
I think we agreed that this bug breaks logical decoding, or am I wrong?
Peter Slavov
On Fri, 27.04.2018 at 3:22, Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 27, 2018 at 12:26:52AM +0200, Петър Славов wrote:
I tried to compile this and i had this error:
proto.c: In function ‘logicalrep_write_attrs’:
proto.c:565:17: error: ‘FormData_pg_index {aka struct FormData_pg_index}’
has no member named ‘indnkeyatts’
natts = index->indnkeyatts;
The patch Alvaro has sent is made for the master branch, and you are
visibly using a build of REL_10_STABLE. The conflict you are seeing
comes from 8224de4, which introduced covering indexes and added
indnkeyatts to Form_pg_index alongside indnatts.
--
Michael
Hi,
This thread was brought to my attention during pgconf.eu (as I sadly
missed it).
On 04/10/2018 10:03, Петър Славов wrote:
Hi,
What happened here? I cannot see a fix for this bug.
I also cannot see the workaround patch in the code.
Do I have to do something to move this fix forward?
I think we agreed that this bug breaks logical decoding, or am I wrong?
Yes it's definitely a bug.
Also please note that we bottom-post in PostgreSQL mailing lists.
On Fri, 27.04.2018 at 3:22, Michael Paquier <michael@paquier.xyz>
wrote:
On Fri, Apr 27, 2018 at 12:26:52AM +0200, Петър Славов wrote:
I tried to compile this and i had this error:
proto.c: In function ‘logicalrep_write_attrs’:
proto.c:565:17: error: ‘FormData_pg_index {aka struct FormData_pg_index}’
has no member named ‘indnkeyatts’
natts = index->indnkeyatts;
The patch Alvaro has sent is made for the master branch, and you are
visibly using a build of REL_10_STABLE. The conflict you are seeing
comes from 8224de4, which introduced covering indexes and added
indnkeyatts to Form_pg_index alongside indnatts.
--
Michael
I wrote my own patch to solve this which is quite similar to Alvaro's,
except that it does not do the manual work to build the index list, as
we already have RelationGetReplicaIndex which does all the necessary
work, and it also uses the same rd_idattr cache that
RelationGetIndexAttrBitmap does; that is better for performance and
looks safe to me.
I attached both the patch for PG11+ (i.e. master) and the backport
patch for PG10; they only differ in the line identified above as
problematic.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
Attachment: pg10-0001-Fix-REPLICA-IDENTITY-handling-in-logical-replication.patch (text/x-patch, +56/-5)
Attachment: pg11-0001-Fix-REPLICA-IDENTITY-handling-in-logical-replication.patch (text/x-patch, +56/-5)
On Sun, Oct 28, 2018 at 11:59:49PM +0100, Petr Jelinek wrote:
I wrote my own patch to solve this which is quite similar to Alvaro's
except that it does not do the manual work to build indexes list as we
already have RelationGetReplicaIndex which does all the necessary work
and also uses the same rd_idattr for cache that
RelationGetIndexAttrBitmap does, it's better for performance and looks
safe to me.
I attached both patch for PG11+ (ie master) and the backport patch for
PG10, they only differ in the line identified above as problematic.
Hmm. I am studying this patch, and this visibly cannot be easily
reproduced? I would have imagined that something like this would be
enough, with a predicate index on the publisher side which is in charge
of loading the :
-- connect to publisher
\c postgres postgres /tmp 5432
create table aa (a int not null, b int);
create function copy_int(a int) returns int as $$ select $1 $$
language sql immutable;
create index aai on aa (b) where copy_int(b) > 0;
create unique index aai2 on aa (a);
alter table aa replica identity using index aai2;
create publication big_tables for table aa;
insert into aa values (generate_series(1,100), 1);
-- connect to subscriber
\c postgres postgres /tmp 5433
create table aa (a int not null, b int);
create function copy_int(a int) returns int as $$ select $1 $$
language sql immutable;
create index aai on aa (b) where copy_int(b) > 0;
create unique index aai2 on aa (a);
alter table aa replica identity using index aai2;
create subscription sub_name CONNECTION 'host=/tmp dbname=postgres
user=postgres' PUBLICATION big_tables;
Is there something I am missing?
If possible, would it be possible to add a test case in
001_rep_changes.pl with a more complete schema?
--
Michael
Hi Michael,
My latest tests were with an IMMUTABLE function that calls now()
inside, which is a little different. Here is a schema that reproduces
the problem:
on Producer:
CREATE FUNCTION public.months_ago(integer) RETURNS timestamp without time
zone
LANGUAGE sql IMMUTABLE
AS $_$
SELECT (NOW() - '1 month'::interval * $1)::timestamp;
$_$;
CREATE TABLE public.replicated_table (
id serial PRIMARY KEY NOT NULL,
a text,
b bigint,
created_at timestamp without time zone DEFAULT now()
);
CREATE INDEX ON public.replicated_table (a) WHERE (created_at >
public.months_ago(4));
on Subscriber:
CREATE TABLE public.replicated_table (
id integer NOT NULL,
a text,
b bigint,
created_at timestamp without time zone DEFAULT now()
);
Peter
Hi Michael,
On 29/10/2018 02:54, Michael Paquier wrote:
It needs to be a function that requires a snapshot to be built, which
yours does not. This all happens because we try to inline SQL functions
into the expression, so all the expression machinery for running the
function gets involved, and if the function needs a snapshot, one will
be created (which is where the crash comes from).
The reason why it breaks for the OP is that his IMMUTABLE function is
not actually IMMUTABLE; that's also why we don't see this reported much.
Given this, I am not sure we want to add a test for it, as it's
something one should not do because it has unpredictable side effects,
but if you think it would be useful anyway I can whip something up.
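As a concrete illustration of the volatility mislabeling (a sketch
based on the OP's schema, not part of the thread's original repro):

```sql
-- now() is STABLE, so a wrapper around it cannot honestly be IMMUTABLE:
-- its result changes between transactions even for identical arguments.
CREATE FUNCTION months_ago(integer) RETURNS timestamp
    LANGUAGE sql IMMUTABLE          -- mislabeled: permits inlining
    AS $$ SELECT (now() - '1 month'::interval * $1)::timestamp $$;

-- Declaring it honestly avoids the problem, at the cost of the index
-- predicate no longer being allowed (predicates require IMMUTABLE):
CREATE FUNCTION months_ago_stable(integer) RETURNS timestamp
    LANGUAGE sql STABLE
    AS $$ SELECT (now() - '1 month'::interval * $1)::timestamp $$;
```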
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 29, 2018 at 04:03:31PM +0100, Petr Jelinek wrote:
The reason why it breaks for OP is that his IMMUTABLE function is not
actually IMMUTABLE, that's also why we don't see it reported much.
Thanks, Petr. I had missed that part. Now I can see the failure
easily. Please note that you have the same problem on the receiver
side: RelationGetIndexAttrBitmap() is still being used in
logicalrep_rel_open(), so the receiver crashes as well:
#18 0x0000563cc278baa4 in RelationGetIndexPredicate
(relation=0x7f9764cdc050) at relcache.c:4720
#19 0x0000563cc22ccdd6 in BuildIndexInfo (index=0x7f9764cdc050) at
index.c:1789
#20 0x0000563cc278be4f in RelationGetIndexAttrBitmap
(relation=0x7f9764de52e8, attrKind=INDEX_ATTR_BITMAP_IDENTITY_KEY) at
relcache.c:4921
#21 0x0000563cc25a6b0c in logicalrep_rel_open (remoteid=16388,
lockmode=3) at relation.c:313
Given this, I am not sure we want to add a test for it, as it's
something one should not do because it has unpredictable side effects,
but if you think it would be useful anyway I can whip something up.
Hmm. People should not do that, but they can, so this thread is
telling us to be careful. Adding the new objects to the schemas of the
publisher and the receiver would also be cheap. A test like that is not
something we would see often, but I would suggest stressing the new
routine on both sides. In the patch, something which makes me
uncomfortable is that get_rel_idattr() duplicates the code paths taken
by RelationGetIndexAttrBitmap() when INDEX_ATTR_BITMAP_IDENTITY_KEY is
used to fill in the relation cache. In this case, wouldn't we want to
just fetch the replica index instead of the full index list? What is
proposed here needs more thought, I think. Something we could consider
on HEAD is removing INDEX_ATTR_BITMAP_IDENTITY_KEY, as it is visibly
unsafe. rd_idattr is also an unsafe concept.
Putting a safeguard in RelationGetIndexAttrBitmap() so that it raises
an ERROR (or just an assertion) if called from an output plugin would
be nice; however, such a check would need to use MyReplicationSlot (I
don't recall anything else), and the limitations could also just be
documented.
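A minimal sketch of such a safeguard might look like this (hypothetical
and untested, placed at the top of RelationGetIndexAttrBitmap(); not
committed code):

```c
/*
 * Hypothetical safeguard: refuse to build index attribute bitmaps while
 * a replication slot is in use, since reaching this path from an output
 * plugin can require a snapshot that does not exist during logical
 * decoding.
 */
if (MyReplicationSlot != NULL)
	elog(ERROR, "cannot build index attribute bitmap during logical decoding");
```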
--
Michael