Parallel scan with SubTransGetTopmostTransaction assert coredump

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Pengchengliu (#1)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Andres,
Reproduce steps.

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file "src/include/access/subtrans.h" line number 15.
2, configure with enable assert and build it.
3, init a new database cluster.
4, modify postgres.conf and add some parameters as below. As the coredump from parallel scan, so we adjust parallel setting, make it easy to reproduce.

max_connections = 2000

parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql in attachment to create tables.
7, use pgbench with script sub_120.sql in attachment to test it. Try it sometimes, you should get the coredump file.
pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120

Thanks
Pengcheng

-----Original Message-----
From: Andres Freund <andres@anarazel.de>
Sent: 2021年5月7日 11:55
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: pgsql-hackers@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:

Hi Hackers，

Last email, format error, missing some information, so I resend this email.

With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransaction with parallel scan, I got a subtransaction coredump as below:

So the root cause is the Parallel Workers process set the TransactionXmin with later transcation snapshot. When parallel scan, Parallel Workers process use the older active snapshot.

It leads to subtrans assert coredump. I don't know how to fix it. Is there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund

Import Notes

Reply to msg id not found: 000001d7432b33cc4a009b64de00@tju.edu.cnReference msg id not found: AGAApwBuDke9jEHtVdrbtaq-.1.1620293412680.Hmail.2008202116@tju.edu.cn

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Pengchengliu (#1)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Andres,
Reproduce steps.

max_connections = 2000

parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 32

Thanks
Pengcheng

Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:

Hi Hackers，

Last email, format error, missing some information, so I resend this email.

With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransaction with parallel scan, I got a subtransaction coredump as below:

So the root cause is the Parallel Workers process set the TransactionXmin with later transcation snapshot. When parallel scan, Parallel Workers process use the older active snapshot.

It leads to subtrans assert coredump. I don't know how to fix it. Is there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund

Import Notes

Reply to msg id not found: 000001d7432b33cc4a009b64de00@tju.edu.cnReference msg id not found: AGAApwBuDke9jEHtVdrbtaq-.1.1620293412680.Hmail.2008202116@tju.edu.cn

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#4)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 11, 2021 at 11:28 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Andres,
Reproduce steps.

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file "src/include/access/subtrans.h" line number 15.
2, configure with enable assert and build it.
3, init a new database cluster.
4, modify postgres.conf and add some parameters as below. As the coredump from parallel scan, so we adjust parallel setting, make it easy to reproduce.

max_connections = 2000

parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql in attachment to create tables.
7, use pgbench with script sub_120.sql in attachment to test it. Try it sometimes, you should get the coredump file.
pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120

Hi,

I had a go at reproducing your reported issue, making sure to follow
all your steps.
Unfortunately, your script seemed to run OK with pgbench and no
crash/coredump occurred for me (and yes, I definitely had asserts
enabled).
I tried with both the 13.2 source code
(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script
with pgbench twice to completion, and also did the same using the
latest Postgres source code.

Will be interesting to see if anyone is able to reproduce your issue.

Regards,
Greg Nancarrow
Fujitsu Australia

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#5)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Andres,

Thanks for you replay.

And If you still cannot reproduce it in 2 minitus. Could you run pgbench longer time, such as 30 or 60 minutes.

This coredump, It should be from parallel scan only.

For normal scan(without parallel), SubTransGetTopmostTransaction assert(HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction->Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin))),

I think this Assert is correct. As single scan get transaction snapshot while set the TransactionXmin and Snapshot->xmin.

In XidInMVCCSnapshot, it will check whether xid precedes snapshot->xmin first. If it is, XidInMVCCSnapshot will return false directly.

So in XidInMVCCSnapshot->SubTransGetTopmostTransaction, xid cannot precede snapshot->xmin.

But for parallel scan, it is different. I modify the code and use the sleep to replace the SubTransGetTopmostTransaction Assert.

Then we can check TransactionXmin and the snapshot from DSA.

The stack is as below, when got the Assert error.

(gdb) bt

#0 0x0000149fb3d254bb in select () from /lib64/libc.so.6

#1 0x0000000000b1d3b3 in pg_usleep (microsec=1000000) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/port/pgsleep.c:56

#2 0x0000000000562a3b in SubTransGetTopmostTransaction (xid=799225) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:164

#3 0x0000000000b04acb in XidInMVCCSnapshot (xid=799225, snapshot=0x2af2d00) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293

#4 0x00000000004ff24c in HeapTupleSatisfiesMVCC (htup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073

#5 0x00000000005002f3 in HeapTupleSatisfiesVisibility (tup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1695

#6 0x00000000004e41cb in heapgetpage (sscan=0x2af3118, page=10846) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:447

#7 0x00000000004e684f in heapgettup_pagemode (scan=0x2af3118, dir=ForwardScanDirection, nkeys=0, key=0x0)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1077

#8 0x00000000004e6e46 in heap_getnextslot (sscan=0x2af3118, direction=ForwardScanDirection, slot=0x2affab0)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1333

#9 0x0000000000752e1a in table_scan_getnextslot (sscan=0x2af3118, direction=ForwardScanDirection, slot=0x2affab0)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/tableam.h:906

#10 0x0000000000752ee2 in SeqNext (node=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:80

#11 0x000000000071a848 in ExecScanFetch (node=0x2aff538, accessMtd=0x752e4e <SeqNext>, recheckMtd=0x752ef3 <SeqRecheck>)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:133

#12 0x000000000071a8e9 in ExecScan (node=0x2aff538, accessMtd=0x752e4e <SeqNext>, recheckMtd=0x752ef3 <SeqRecheck>)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:199

#13 0x0000000000752f3d in ExecSeqScan (pstate=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:112

#14 0x0000000000725794 in ExecProcNode (node=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248

#15 0x0000000000725c7f in fetch_input_tuple (aggstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:589

#16 0x0000000000728f98 in agg_retrieve_direct (aggstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:2463

#17 0x00000000007289f2 in ExecAgg (pstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:2183

#18 0x0000000000716cbb in ExecProcNodeFirst (node=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execProcnode.c:450

#19 0x000000000070b103 in ExecProcNode (node=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248

#20 0x000000000070dc0e in ExecutePlan (estate=0x2afeb30, planstate=0x2afeff0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,

direction=ForwardScanDirection, dest=0x2ab0578, execute_once=true) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:1632

#21 0x000000000070b72e in standard_ExecutorRun (queryDesc=0x2af2c68, direction=ForwardScanDirection, count=0, execute_once=true)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:350

#22 0x000000000070b55c in ExecutorRun (queryDesc=0x2af2c68, direction=ForwardScanDirection, count=0, execute_once=true)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:294

#23 0x0000000000712ae1 in ParallelQueryMain (seg=0x2a0a0c8, toc=0x149fb4dab000) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execParallel.c:1448

#24 0x000000000055f69c in ParallelWorkerMain (main_arg=1403863538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470

#25 0x000000000086db61 in StartBackgroundWorker () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/bgworker.c:879

#26 0x0000000000881238 in do_start_bgworker (rw=0x2a351b0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5870

#27 0x00000000008815e4 in maybe_start_bgworkers () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:6095

#28 0x0000000000880620 in sigusr1_handler (postgres_signal_arg=10) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5255

#29 <signal handler called>

#30 0x0000149fb3d254bb in select () from /lib64/libc.so.6

#31 0x000000000087c173 in ServerLoop () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1703

#32 0x000000000087bb3e in PostmasterMain (argc=3, argv=0x2a08080) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1412

#33 0x0000000000782d24 in main (argc=3, argv=0x2a08080) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/main/main.c:210

(gdb) f 24

#24 0x000000000055f69c in ParallelWorkerMain (main_arg=1403863538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470

1470 entrypt(seg, toc);

(gdb) p *ActiveSnapshot->as_snap //active snapshot from main process

$18 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799162, xmax = 822061, xip = 0x2ab0190, xcnt = 169, subxip = 0x0, subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,

copied = true, curcid = 119, speculativeToken = 2139062143, active_count = 1, regd_count = 2, ph_node = {first_child = 0x2af2d40, next_sibling = 0x0, prev_or_parent = 0x0},

whenTaken = 0, lsn = 0}

(gdb) p *CurrentSnapshot //transaction snapshot from main process

$19 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799425, xmax = 822293, xip = 0x2ab1c00, xcnt = 172, subxip = 0x149f29302010, subxcnt = 0, suboverflowed = true,

takenDuringRecovery = false, copied = false, curcid = 119, speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0,

prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

(gdb) f 4

#4 0x00000000004ff24c in HeapTupleSatisfiesMVCC (htup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073

1073 XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))

(gdb) p *snapshot //ative snap shot from main process for scan

$20 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799162, xmax = 822061, xip = 0x2af2d68, xcnt = 169, subxip = 0x0, subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,

copied = true, curcid = 119, speculativeToken = 2139062143, active_count = 0, regd_count = 1, ph_node = {first_child = 0x0, next_sibling = 0xf65ca0 <CatalogSnapshotData+64>,

prev_or_parent = 0x2ab0168}, whenTaken = 0, lsn = 0}

(gdb) p TransactionXmin

$21 = 799425

(gdb) f 3

#3 0x0000000000b04acb in XidInMVCCSnapshot (xid=799225, snapshot=0x2af2d00) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293

2293 xid = SubTransGetTopmostTransaction(xid);

(gdb) p xid

$22 = 799225

The main process:

1, Main process get the transaction snapshot(xmin 799162, xmax 822061) and push the active snapshot first. And set this active snapshot to QueryDesc in CreateQueryDesc.

2, Main process collect active snapshot(xmin 799162, xmax 82206) and get the newer transaction snapshot(xmin 799425, xmax 822293). And use the parameter PARALLEL_KEY_TRANSACTION_SNAPSHOT, PARALLEL_KEY_ACTIVE_SNAPSHOT

store transaction snapshot and active snapshot.

3, Main process ExecGather->ExecInitParallelPlan->ExecParallelInitializeDSM->ExecSeqScanInitializeDSM->table_parallelscan_initialize, send active snapshot(xmin 799162, xmax 82206) with plan id to parallel work process.

4, lauch parallel work process.

The parallel work process:

1, Get Snapshot and set TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from main process, and set TransactionXmin 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel get the active snapshot(xmin 799162, xmax 82206) from main process, and set this snapshot to scan->rs_base.rs_snapshot.

4, parallel scan begin, with active snapshot(xmin 799162, xmax 82206) and TransactionXmin(799425),when scan tuple(xmin 799225) SubTransGetTopmostTransaction assert got.

In HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

As main process gets the active snapshot (xmin 799162, xmax 822061) which is earlier than transaction snapshot(xmin 799425, xmax 822293). Parallel work process set TransactionXmin with transaction snapshot(xmin 799425, xmax 822293).

But scan tuple with active snapshot (xmin 799162, xmax 822061).

Thanks

Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月11日 19:08
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 11, 2021 at 11:28 AM Pengchengliu < <mailto:pengchengliu@tju.edu.cn> pengchengliu@tju.edu.cn> wrote:

Hi Andres,

Reproduce steps.

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file "src/include/access/subtrans.h" line number 15.

2, configure with enable assert and build it.

3, init a new database cluster.

4, modify postgres.conf and add some parameters as below. As the coredump from parallel scan, so we adjust parallel setting, make it easy to reproduce.

max_connections = 2000

parallel_setup_cost=0

parallel_tuple_cost=0

min_parallel_table_scan_size=0

max_parallel_workers_per_gather=8

max_parallel_workers = 32

5, start the database cluster.

6, use the script init_test.sql in attachment to create tables.

7, use pgbench with script sub_120.sql in attachment to test it. Try it sometimes, you should get the coredump file.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120

Hi,

I had a go at reproducing your reported issue, making sure to follow all your steps.

Unfortunately, your script seemed to run OK with pgbench and no crash/coredump occurred for me (and yes, I definitely had asserts enabled).

I tried with both the 13.2 source code

(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script with pgbench twice to completion, and also did the same using the latest Postgres source code.

Will be interesting to see if anyone is able to reproduce your issue.

Regards,

Greg Nancarrow

Fujitsu Australia

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#6)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Andres,
Thanks for you replay.

Er, it's Greg who has replied so far (not Andres).

And If you still cannot reproduce it in 2 minitus. Could you run pgbench longer time, such as 30 or 60 minutes.

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

The parallel work process:

1, Get Snapshot and set TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from main process, and set TransactionXmin 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel get the active snapshot(xmin 799162, xmax 82206) from main process, and set this snapshot to scan->rs_base.rs_snapshot.

4, parallel scan begin, with active snapshot(xmin 799162, xmax 82206) and TransactionXmin(799425),when scan tuple(xmin 799225) SubTransGetTopmostTransaction assert got.

In HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

I added some logging at a couple of points in the code:
1) In the Worker process code - ParallelWorkerMain() - where it
restores the serialized transaction and active snapshots (i.e. passed
to the Worker from the main process).
2) In the HeapTupleSatisfiesMVCC() function, immediately before it
calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in
ALL cases, the transaction snapshot xmin,xmax was always THE SAME as
the active snapshot xmin,xmax.
(Can you verify that this occurs on your system when things are
working, prior to the coredump?)

This is different to what you are getting in your environment (at
least, different to what you described when the problem occurs).
In your case, you say that the main process gets "the newer
transaction snapshot" - where exactly is this happening in your case?
(or this is what you don't yet know?)
Perhaps very occasionally this somehow happens on your system and
triggers the Assert (and coredump)? I have not been able to reproduce
that on my system.

Have you reproduced this issue on any other system, using the same
steps as you provided?
I'm wondering if there might be something else in your environment
that may be influencing this problem.

Regards,
Greg Nancarrow
Fujitsu Australia

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#7)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Greg,

Thanks for you replay and test.

When main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin is very likely follows active snapshot xmin.

Use gdb it is easy to verify it.

Create env as blow:

1, Use PG13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.

2, Append the postgres.conf as below:

max_connections = 2000

parallel_setup_cost=0

parallel_tuple_cost=0

min_parallel_table_scan_size=0

max_parallel_workers_per_gather=8

max_parallel_workers = 128

3, Start the cluster database. Use the init_test.sql script in attachment to create some test tables.

4, Use the sub_120.sql script in attachment with pgbench to test it.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800

Then you can login the database, and use GDB to verify it.

1, First use explain, make sure force Parallel is OK.

postgres=# explain (verbose,analyze) select count(*) from contend1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

----------------

Finalize Aggregate (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)

Output: count(*)

-> Gather (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)

Output: (PARTIAL count(*))

Workers Planned: 8

Workers Launched: 0

-> Partial Aggregate (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)

Output: PARTIAL count(*)

-> Parallel Seq Scan on public.contend1 (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1

010200 loops=1)

Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time

Planning Time: 0.123 ms

Execution Time: 1075.588 ms

postgres=# select pg_backend_pid();

pg_backend_pid

----------------

2182678

2, use gdb to debug our backend process. Add the breakpoint in parallel.c:219 and continue.

gdb -q -p 2182678

...

(gdb) b parallel.c:219

Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.

(gdb) c

Continuing.

3, In the psql clinet, we can execute the explain command in step1 again.

After we get the breakpoint in gdb, we wait a moment. Then we execute next.

Use gdb check active_snapshot and transaction_snapshot, active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

When I use gdb test it, sometimes active_snapshot is the same as transaction_snapshot. Then you can try it multiple times, and before execute next, try wait longer time.

Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219

219 Snapshot transaction_snapshot = GetTransactionSnapshot();

(gdb) n

220 Snapshot active_snapshot = GetActiveSnapshot();

(gdb)

223 oldcontext = MemoryContextSwitchTo(TopTransactionContext);

(gdb) p *transaction_snapshot

$1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,

subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,

active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

(gdb) p *active_snapshot

$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,

suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,

regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}

(gdb)

Thanks

Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月13日 22:15
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Thu, May 13, 2021 at 11:25 AM Pengchengliu < <mailto:pengchengliu@tju.edu.cn> pengchengliu@tju.edu.cn> wrote:

Hi Andres,

Thanks for you replay.

Er, it's Greg who has replied so far (not Andres).

And If you still cannot reproduce it in 2 minitus. Could you run pgbench longer time, such as 30 or 60 minutes.

Actually, I did run it, multiple times, for more than 60 minutes, but no assert/crash/coredump occurred in my environment.

The parallel work process:

1, Get Snapshot and set TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from main process, and set TransactionXmin 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel get the active snapshot(xmin 799162, xmax 82206) from main process, and set this snapshot to scan->rs_base.rs_snapshot.

4, parallel scan begin, with active snapshot(xmin 799162, xmax 82206) and TransactionXmin(799425),when scan tuple(xmin 799225) SubTransGetTopmostTransaction assert got.

In HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

I added some logging at a couple of points in the code:

1) In the Worker process code - ParallelWorkerMain() - where it restores the serialized transaction and active snapshots (i.e. passed to the Worker from the main process).

2) In the HeapTupleSatisfiesMVCC() function, immediately before it calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in ALL cases, the transaction snapshot xmin,xmax was always THE SAME as the active snapshot xmin,xmax.

(Can you verify that this occurs on your system when things are working, prior to the coredump?)

This is different to what you are getting in your environment (at least, different to what you described when the problem occurs).

In your case, you say that the main process gets "the newer transaction snapshot" - where exactly is this happening in your case?

(or this is what you don't yet know?)

Perhaps very occasionally this somehow happens on your system and triggers the Assert (and coredump)? I have not been able to reproduce that on my system.

Have you reproduced this issue on any other system, using the same steps as you provided?

I'm wondering if there might be something else in your environment that may be influencing this problem.

Regards,

Greg Nancarrow

Fujitsu Australia

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#8)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Greg,

Thanks for you replay and test.

When main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin is very likely follows active snapshot xmin.

Use gdb it is easy to verify it.

Create env as blow:

1, Use PG13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.

2, Append the postgres.conf as below:

max_connections = 2000

parallel_setup_cost=0

parallel_tuple_cost=0

min_parallel_table_scan_size=0

max_parallel_workers_per_gather=8

max_parallel_workers = 128

3, Start the cluster database. Use the init_test.sql script in attachment to create some test tables.

4, Use the sub_120.sql script in attachment with pgbench to test it.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800

Then you can login the database, and use GDB to verify it.

1, First use explain, make sure force Parallel is OK.

postgres=# explain (verbose,analyze) select count(*) from contend1;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

----------------

Finalize Aggregate (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)

Output: count(*)

-> Gather (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)

Output: (PARTIAL count(*))

Workers Planned: 8

Workers Launched: 0

-> Partial Aggregate (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)

Output: PARTIAL count(*)

-> Parallel Seq Scan on public.contend1 (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1

010200 loops=1)

Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time

Planning Time: 0.123 ms

Execution Time: 1075.588 ms

postgres=# select pg_backend_pid();

pg_backend_pid

----------------

2182678

2, use gdb to debug our backend process. Add the breakpoint in parallel.c:219 and continue.

gdb -q -p 2182678

...

(gdb) b parallel.c:219

Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.

(gdb) c

Continuing.

3, In the psql clinet, we can execute the explain command in step1 again.

After we get the breakpoint in gdb, we wait a moment. Then we execute next.

Use gdb check active_snapshot and transaction_snapshot, active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

When I use gdb test it, sometimes active_snapshot is the same as transaction_snapshot. Then you can try it multiple times, and before execute next, try wait longer time.

Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)

at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219

219 Snapshot transaction_snapshot = GetTransactionSnapshot();

(gdb) n

220 Snapshot active_snapshot = GetActiveSnapshot();

(gdb)

223 oldcontext = MemoryContextSwitchTo(TopTransactionContext);

(gdb) p *transaction_snapshot

$1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,

subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,

active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

(gdb) p *active_snapshot

$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,

suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,

regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}

(gdb)

Hi Pengcheng,

I followed all your steps.
However, I perhaps get different behavior in my environment.
99% of the time, the xmin and xmax of the active_snapshot and
transaction_snapshot are the same (regardless of how long I wait at
different points after the breakpoint is hit). I've had one or two
instances where the xmax values differ. I managed to catch just one
case where there were different xmin and xmax values in the snapshots,
but this occurred just prior to the pgbench client completing and
terminating, and when I continued in the debugger, there was no
crash/coredump.

However, I think I've spotted something potentially important to this issue:
For me, almost always "suboverflowed = false" in the snapshots (except
in that one case, right at the end of the pgbench run), yet in your
gdb example "suboverflowed = true" in both of the snapshots (i.e. the
snapshot subxip array has overflowed). I'm guessing that this may be
related to the coredump issue, but I'm not exactly sure how it has
happened, and why it seemingly isn't being handled correctly and
causes that Assert to fire in your case.
Can you try and find out how the snapshot suboverflow is being set in
your case? (since you are getting this readily in your examined
snapshots???) I think there's only several places where it can be set
to "true" (e.g. procarray.c:1641).
Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the
problem for you? It's currently defined as 64.
I notice that there's been some changes related to snapshot data
handling and subxid overflow since 13.2, so I'm wondering whether your
coredump issue can be reproduced with the latest code?

Regards,
Greg Nancarrow
Fujitsu Australia

#10

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#9)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Greg,

When you get the different xmin between active snapshot and transaction snapshot, maybe there is no coredump.

As maybe there is not tupe(xmin) between ActiveSnapshot->xmin and TransactionSnapshot->xmin which needs to be scaned in parallel process.

There is no doubt, it is very likely that ActiveSnapshot->xmin precedes TransactionSnapshot->xmin.

For this coredump, we must make sure parallel and snapshot overflow. If snapshot is not overflow, you cannot get the coredump.

As coredump is from parallel scan in MVCC when snapshot is overflow.

Did you use pgbench with the script sub_120.sql which I provide in attachment?

As the default PGPROC_MAX_CACHED_SUBXIDS is 64. In script sub_120.sql, for one transaction, it will use 120 subtransactions which is much larger than 64.

While getting the snapshot, it must be overflow. I really don't know why your snapshot is not overflow.

Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change any codes, now we just use the origin codes in PG13.2.

I have checked the codes in master branch, there is no change about this mechanism. This issue should still exist.

Thanks

Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月14日 16:47
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 12:25 PM Pengchengliu < <mailto:pengchengliu@tju.edu.cn> pengchengliu@tju.edu.cn> wrote:

Hi Greg,

Thanks for you replay and test.

When main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin is very likely follows active snapshot xmin.

Use gdb it is easy to verify it.

Create env as blow:

1, Use PG13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.

2, Append the postgres.conf as below:

max_connections = 2000

parallel_setup_cost=0

parallel_tuple_cost=0

min_parallel_table_scan_size=0

max_parallel_workers_per_gather=8

max_parallel_workers = 128

3, Start the cluster database. Use the init_test.sql script in attachment to create some test tables.

4, Use the sub_120.sql script in attachment with pgbench to test it.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800

Then you can login the database, and use GDB to verify it.

1, First use explain, make sure force Parallel is OK.

postgres=# explain (verbose,analyze) select count(*) from contend1;

QUERY PLAN

----------------------------------------------------------------------

---------------------------------------------------------------

----------------

Finalize Aggregate (cost=12006.11..12006.12 rows=1 width=8) (actual

time=1075.214..1075.449 rows=1 loops=1)

Output: count(*)

-> Gather (cost=12006.08..12006.09 rows=8 width=8) (actual

time=1075.198..1075.433 rows=1 loops=1)

Output: (PARTIAL count(*))

Workers Planned: 8

Workers Launched: 0

-> Partial Aggregate (cost=12006.08..12006.09 rows=1

width=8) (actual time=1074.674..1074.676 rows=1 loops=1)

Output: PARTIAL count(*)

-> Parallel Seq Scan on public.contend1

(cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454

rows=1

010200 loops=1)

Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9,

c10, crt_time

Planning Time: 0.123 ms

Execution Time: 1075.588 ms

postgres=# select pg_backend_pid();

pg_backend_pid

----------------

2182678

2, use gdb to debug our backend process. Add the breakpoint in parallel.c:219 and continue.

gdb -q -p 2182678

...

(gdb) b parallel.c:219

Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.

(gdb) c

Continuing.

3, In the psql clinet, we can execute the explain command in step1 again.

After we get the breakpoint in gdb, we wait a moment. Then we execute next.

Use gdb check active_snapshot and transaction_snapshot, active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

When I use gdb test it, sometimes active_snapshot is the same as transaction_snapshot. Then you can try it multiple times, and before execute next, try wait longer time.

Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)

at

/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/ac

cess/transam/parallel.c:219

219 Snapshot transaction_snapshot = GetTransactionSnapshot();

(gdb) n

220 Snapshot active_snapshot = GetActiveSnapshot();

(gdb)

223 oldcontext = MemoryContextSwitchTo(TopTransactionContext);

(gdb) p *transaction_snapshot

$1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip

= 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,

subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,

copied = false, curcid = 0, speculativeToken = 0,

active_count = 0, regd_count = 0, ph_node = {first_child = 0x0,

next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

(gdb) p *active_snapshot

$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip

= 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,

suboverflowed = true, takenDuringRecovery = false, copied = true,

curcid = 0, speculativeToken = 0, active_count = 1,

regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0,

prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}

(gdb)

Hi Pengcheng,

I followed all your steps.

However, I perhaps get different behavior in my environment.

99% of the time, the xmin and xmax of the active_snapshot and transaction_snapshot are the same (regardless of how long I wait at different points after the breakpoint is hit). I've had one or two instances where the xmax values differ. I managed to catch just one case where there were different xmin and xmax values in the snapshots, but this occurred just prior to the pgbench client completing and terminating, and when I continued in the debugger, there was no crash/coredump.

However, I think I've spotted something potentially important to this issue:

For me, almost always "suboverflowed = false" in the snapshots (except in that one case, right at the end of the pgbench run), yet in your gdb example "suboverflowed = true" in both of the snapshots (i.e. the snapshot subxip array has overflowed). I'm guessing that this may be related to the coredump issue, but I'm not exactly sure how it has happened, and why it seemingly isn't being handled correctly and causes that Assert to fire in your case.

Can you try and find out how the snapshot suboverflow is being set in your case? (since you are getting this readily in your examined

snapshots???) I think there's only several places where it can be set to "true" (e.g. procarray.c:1641).

Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the problem for you? It's currently defined as 64.

I notice that there's been some changes related to snapshot data handling and subxid overflow since 13.2, so I'm wondering whether your coredump issue can be reproduced with the latest code?

Regards,

Greg Nancarrow

Fujitsu Australia

#11

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#10)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 8:36 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Did you use pgbench with the script sub_120.sql which I provide in attachment?

yes

Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change any codes, now we just use the origin codes in PG13.2.

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is
avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it
probably indicates the overflow condition is affecting the xmin.xmax
of the two snapshots such that it invalidates the condition that is
asserted.

I think one problem is that in your settings, you haven't set
"max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using
parallel workers, as the workers can't actually be launched (and this
is probably why I'm finding it hard to reproduce the issue, if the
problem involves snapshot suboverflow and parallel workers).
I find that the following settings improve the parallelism per query
and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line: pgbench -d postgres -p 33550
-n -r -f sub_120.sql -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens
if you use these settings?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line: pgbench -d postgres -p 33550 -n -r
-f sub_120.sql -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia

#12

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#11)

Re:Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Greg,
It is really weird. Could you make sure is the SnapShot overflow in you ENV? It is very impoint.
Abount SnapShot overflow and Subtrans, you can refer this https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/.

In the script sub_120.sql, for one transaction, we use 120 transcations. So this pgxact->overflowed will be set to true.
Then snapshot must be overflow. When MVCC, it will call SubTransGetTopmostTransaction.
So the snapshot overflow is requirement.

Even though there is no coredump in you ENV, from the codes, we can find some clue.

First, in main process , ActiveSnapshot xmin is very likely preceds TransactionSnapShot xmin.
Second, in parallel work process, it sets TransactionXmin with TransactionSnapShot from main process. But table Scan with ative Snapshot from main process.
So in parallel work process SubTransGetTopmostTransaction, the Assert TransactionIdFollowsOrEquals(xid, TransactionXmin) is not correct.
At least this assert is unsuitable for parallel work process.
For my analyze, if there is any incorrect, please corret me.

BTW, I test it in a high performance server. It is verly easily be reproduced. My colleague and me use different environment both can reproduce it.

Thanks
Pengcheng

#13

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#11)

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Tom & Robert,
Could you review this Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) in SubTransGetTopmostTransaction.
I think this assert is unsuitable for parallel work process.

Before we discuss it in
https://www.postgresql-archive.org/Parallel-scan-with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月15日 0:44
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 8:36 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Did you use pgbench with the script sub_120.sql which I provide in attachment?

yes

Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change any codes, now we just use the origin codes in PG13.2.

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it probably indicates the overflow condition is affecting the xmin.xmax of the two snapshots such that it invalidates the condition that is asserted.

I think one problem is that in your settings, you haven't set "max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using parallel workers, as the workers can't actually be launched (and this is probably why I'm finding it hard to reproduce the issue, if the problem involves snapshot suboverflow and parallel workers).
I find that the following settings improve the parallelism per query and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line: pgbench -d postgres -p 33550
-n -r -f sub_120.sql -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens if you use these settings?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line: pgbench -d postgres -p 33550 -n -r
-f sub_120.sql -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia

#14

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#12)

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程 <pengchengliu@tju.edu.cn> wrote:

BTW, I test it in a high performance server. It is verly easily be reproduced. My colleague and me use different environment both can reproduce it.

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can
certainly see how, for the snapshots used in the parallel worker case,
the Active snapshot used is potentially an earlier snapshot that the
Transaction snapshot. I don't know why it is getting a newer
Transaction snapshot in InitializeParallelDSM(), when it has
previously pushed the return value of GetTransactionSnapshot() as the
Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how
to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is
correct, but in the parallel-worker case, the snapshots are not being
setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia

diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

/* We might be running in a very short-lived memory context. */

#15

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#14)

RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Greg,

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, but in the parallel-worker case, the snapshots are not being setup correctly.

I agree with you that Assert in SubTransGetTopmostTransaction() is correct. The root cause is Transaction Xmin are not being setup correctly in the parallel-worker.

Actually I am very confused about ActiveSnapshot and TransactionSnapshot. I don't know why main process send ActiveSnapshot and TransactionSnapshot separately. And what is exact difference between them?
If you know that, could you explain that for me? It will be very appreciated.
Before we know them exactly, I think we should not modify the TransactionSnapshot to ActiveSnapshot in main process. If it is, the main process should send ActiveSnapshot only.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月17日 20:59
To: 刘鹏程 <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程 <pengchengliu@tju.edu.cn> wrote:

BTW, I test it in a high performance server. It is verly easily be reproduced. My colleague and me use different environment both can reproduce it.

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can certainly see how, for the snapshots used in the parallel worker case, the Active snapshot used is potentially an earlier snapshot that the Transaction snapshot. I don't know why it is getting a newer Transaction snapshot in InitializeParallelDSM(), when it has previously pushed the return value of GetTransactionSnapshot() as the Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, but in the parallel-worker case, the snapshots are not being setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia

diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

/* We might be running in a very short-lived memory context. */

#16

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Pengchengliu (#15)

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

I've also seen the reports of the same Assert(TransactionIdFollowsOrEquals(xid,
TransactionXmin)) with a subsequent crash in a parallel worker in
PostgreSQL v11-based build, Though I was unable to investigate deeper and
reproduce the issue. The details above in the thread make me think it is a
real and long-time-persistent error that is surely worth to be fixed.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

#17

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#15)

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Greg,

Actually I am very confused about ActiveSnapshot and TransactionSnapshot. I don't know why main process send ActiveSnapshot and TransactionSnapshot separately. And what is exact difference between them?
If you know that, could you explain that for me? It will be very appreciated.

In the context of a parallel-worker, I am a little confused too, so I
can't explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the
following as part of the state that needs to be copied to each worker:

- The transaction snapshot.
- The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to
plan execution, GetTransactionSnapshot() was called and its return
value was stored in both the QueryDesc and the estate (es_snapshot),
which was then pushed on the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the
last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is
not clear to me (because isn't then the ActiveSnapshot a potentially
earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then
different to the non-parallel case).

Before we know them exactly, I think we should not modify the TransactionSnapshot to ActiveSnapshot in main process. If it is, the main process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have
a development environment, which I assume you have). Sure, IF it was
deemed a proper solution, you'd only send the one snapshot, and adjust
accordingly in ParallelWorkerMain(), but we need not worry about that
in order to test it.

Regards,
Greg Nancarrow
Fujitsu Australia

#18

Zhijie Hou (Fujitsu)

houzj.fnst@fujitsu.com

about 5 years ago

In reply to: Greg Nancarrow (#17)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Greg Nancarrow <gregn4422@gmail.com>; Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

I've also seen the reports of the same Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) with a subsequent crash in a parallel worker in PostgreSQL v11-based
build, Though I was unable to investigate deeper and reproduce the issue. The details above in the thread make me think it is a real and long-time-persistent error that is
surely worth to be fixed.

I followed Liu's reproduce steps and successfully reproduce it in about half an hour running.
My compile option is : " ./configure --enable-cassert --prefix=/home/pgsql".

After applying greg-san's change, the coredump did not happened in two hour(it is still running).
Note, I have not taken a deep look into the change, just provide some test information in advance.

Best regards,
houzj

Import Notes

Resolved by subject fallback

#19

gregn4422@gmail.com

about 5 years ago

In reply to: Zhijie Hou (Fujitsu) (#18)

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 18, 2021 at 9:41 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Greg Nancarrow <gregn4422@gmail.com>; Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

I've also seen the reports of the same Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) with a subsequent crash in a parallel worker in PostgreSQL v11-based
build, Though I was unable to investigate deeper and reproduce the issue. The details above in the thread make me think it is a real and long-time-persistent error that is
surely worth to be fixed.

I followed Liu's reproduce steps and successfully reproduce it in about half an hour running.
My compile option is : " ./configure --enable-cassert --prefix=/home/pgsql".

After applying greg-san's change, the coredump did not happened in two hour(it is still running).
Note, I have not taken a deep look into the change, just provide some test information in advance.

+1
Thanks for doing that.
I'm unsure if that "fix" is the right approach, so please investigate it too.

Regards,
Greg Nancarrow
Fujitsu Australia

#20

pengchengliu@tju.edu.cn

about 5 years ago

In reply to: Greg Nancarrow (#17)

RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi Greg,
Thanks a lot for you explanation and your fix.

I think your fix can resolve the core dump issue. As with your fix, parallel process reset Transaction Xmin from ActiveSnapshot.
But it will change Transaction snapshot for all parallel scenarios. I don't know whether it bring in other issue.
For test only, I think it is enough.

So is there anybody can explain what's exactly difference between ActiveSnapshot and TransactionSnapshot in parallel work process.
Then maybe we can find a better solution and try to fix it really.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月18日 17:15
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Greg,

Actually I am very confused about ActiveSnapshot and TransactionSnapshot. I don't know why main process send ActiveSnapshot and TransactionSnapshot separately. And what is exact difference between them?
If you know that, could you explain that for me? It will be very appreciated.

In the context of a parallel-worker, I am a little confused too, so I can't explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the following as part of the state that needs to be copied to each worker:

- The transaction snapshot.
- The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to plan execution, GetTransactionSnapshot() was called and its return value was stored in both the QueryDesc and the estate (es_snapshot), which was then pushed on the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is not clear to me (because isn't then the ActiveSnapshot a potentially earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then different to the non-parallel case).

Before we know them exactly, I think we should not modify the TransactionSnapshot to ActiveSnapshot in main process. If it is, the main process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have a development environment, which I assume you have). Sure, IF it was deemed a proper solution, you'd only send the one snapshot, and adjust accordingly in ParallelWorkerMain(), but we need not worry about that in order to test it.

Regards,
Greg Nancarrow
Fujitsu Australia

#21

gregn4422@gmail.com

about 5 years ago

In reply to: Pengchengliu (#20)

#22

gregn4422@gmail.com

about 5 years ago

In reply to: Greg Nancarrow (#21)

#23

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Greg Nancarrow (#22)

#24

gregn4422@gmail.com

about 5 years ago

In reply to: Michael Paquier (#23)

#25

pashkin.elfe@gmail.com

about 5 years ago

In reply to: Greg Nancarrow (#24)

#26

gregn4422@gmail.com

about 5 years ago

In reply to: Pavel Borisov (#25)

#27

Maxim Orlov

m.orlov@postgrespro.ru

almost 5 years ago

In reply to: Greg Nancarrow (#26)

#28

Ranier Vilela

ranier.vf@gmail.com

almost 5 years ago

In reply to: Maxim Orlov (#27)

#29

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Ranier Vilela (#28)

#30

Ranier Vilela

ranier.vf@gmail.com

almost 5 years ago

In reply to: Pavel Borisov (#29)

#31

Maxim Orlov

m.orlov@postgrespro.ru

almost 5 years ago

In reply to: Maxim Orlov (#27)

#32

gregn4422@gmail.com

almost 5 years ago

In reply to: Maxim Orlov (#31)

#33

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#32)

#34

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Pavel Borisov (#33)

#35

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Pavel Borisov (#34)

#36

Maxim Orlov

m.orlov@postgrespro.ru

almost 5 years ago

In reply to: Tomas Vondra (#35)

#37

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Maxim Orlov (#36)

#38

gregn4422@gmail.com

almost 5 years ago

In reply to: Tomas Vondra (#35)

#39

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#38)

#40

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#39)

#41

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#40)

#42

robertmhaas@gmail.com

almost 5 years ago

In reply to: Pavel Borisov (#41)

#43

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#42)

#44

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#43)

#45

gregn4422@gmail.com

almost 5 years ago

In reply to: Pavel Borisov (#44)

#46

gregn4422@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#45)

#47

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#45)

#48

pashkin.elfe@gmail.com

almost 5 years ago

In reply to: Pavel Borisov (#47)

#49

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#43)

#50

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#49)

#51

gregn4422@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#50)

#52

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#50)

#53

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#52)

#54

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#53)

#55

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#54)

#56

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#55)

#57

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#56)

#58

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#57)

#59

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#58)

#60

robertmhaas@gmail.com

almost 5 years ago

In reply to: Greg Nancarrow (#59)

#61

gregn4422@gmail.com

almost 5 years ago

In reply to: Robert Haas (#60)

#62