Logic problem in SerializeSnapshot()

Started by Rushabh Lathiaalmost 10 years ago2 messages
#1Rushabh Lathia
rushabh.lathia@gmail.com
1 attachment(s)

Hi All,

During the testing of parallel query (with force_parallel_mode = regress),
noticed random server crash with the below stack:

#0 0x0000003fc84896d5 in memcpy () from /lib64/libc.so.6
#1 0x0000000000a36867 in SerializeSnapshot (snapshot=0x1e49f40,
start_address=0x7f391e9ec728 <Address 0x7f391e9ec728 out of bounds>) at
snapmgr.c:1523
#2 0x0000000000522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at
parallel.c:330
#3 0x00000000006dd256 in ExecInitParallelPlan (planstate=0x1f012b0,
estate=0x1f00be8, nworkers=1) at execParallel.c:398
#4 0x00000000006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160
#5 0x00000000006de42e in ExecProcNode (node=0x1f00d00) at
execProcnode.c:516
#6 0x00000000006da4fd in ExecutePlan (estate=0x1f00be8,
planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633

So started looking into SerializeSnapshot() and with code reading I found
that
we ignore copying the SubXID array if it has overflowed, unless the snapshot
was taken during recovey, and for this we mark the
serialized_snapshot->subxcnt
to 0. But later while copying the SubXID array we check then condition
based on
snapshot->subxcnt. We should check serialized_snapshot->subxcnt rather then
snapshot->subxcnt.

I tried hard to come up with individual test but somehow I was unable to
create testcase.

PFA patch to fix the issue.

regards,
Rushabh Lathia
www.EnterpriseDB.com

Attachments:

serializesnapshot_logic.patchtext/x-diff; charset=US-ASCII; name=serializesnapshot_logic.patchDownload
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 63e908d..b88e012 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1515,7 +1515,7 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
 	 * snapshot taken during recovery; all the top-level XIDs are in subxip as
 	 * well in that case, so we mustn't lose them.
 	 */
-	if (snapshot->subxcnt > 0)
+	if (serialized_snapshot->subxcnt > 0)
 	{
 		Size		subxipoff = sizeof(SerializedSnapshotData) +
 		snapshot->xcnt * sizeof(TransactionId);
#2Robert Haas
robertmhaas@gmail.com
In reply to: Rushabh Lathia (#1)
Re: Logic problem in SerializeSnapshot()

On Tue, Mar 1, 2016 at 12:34 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:

During the testing of parallel query (with force_parallel_mode = regress),
noticed random server crash with the below stack:

#0 0x0000003fc84896d5 in memcpy () from /lib64/libc.so.6
#1 0x0000000000a36867 in SerializeSnapshot (snapshot=0x1e49f40,
start_address=0x7f391e9ec728 <Address 0x7f391e9ec728 out of bounds>) at
snapmgr.c:1523
#2 0x0000000000522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at
parallel.c:330
#3 0x00000000006dd256 in ExecInitParallelPlan (planstate=0x1f012b0,
estate=0x1f00be8, nworkers=1) at execParallel.c:398
#4 0x00000000006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160
#5 0x00000000006de42e in ExecProcNode (node=0x1f00d00) at
execProcnode.c:516
#6 0x00000000006da4fd in ExecutePlan (estate=0x1f00be8,
planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633

So started looking into SerializeSnapshot() and with code reading I found
that
we ignore copying the SubXID array if it has overflowed, unless the snapshot
was taken during recovey, and for this we mark the
serialized_snapshot->subxcnt
to 0. But later while copying the SubXID array we check then condition based
on
snapshot->subxcnt. We should check serialized_snapshot->subxcnt rather then
snapshot->subxcnt.

I tried hard to come up with individual test but somehow I was unable to
create testcase.

PFA patch to fix the issue.

Thanks, that looks right to me. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers