Re: POC: Cache data in GetSnapshotData()
On May 20, 2015 at 8:57 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
+1 to proceed with this patch for 9.6, as I think it improves on the
current situation.
I also saw a crash once in the following test scenario
(scale factor 300, other settings the same as above):
./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
I have rebased the patch and run pgbench against it.
I see memory corruption; the valgrind report is attached.
The first interesting call stacks in the valgrind report are shown below.
==77922== For counts of detected and suppressed errors, rerun with: -v
==77922== Use --track-origins=yes to see where uninitialised values come from
==77922== ERROR SUMMARY: 15 errors from 7 contexts (suppressed: 2 from 2)
==77873== Source and destination overlap in memcpy(0x5c08020, 0x5c08020, 4)
==77873== at 0x4C2E1DC: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==77873== by 0x773303: GetSnapshotData (procarray.c:1698)
==77873== by 0x90FDB8: GetTransactionSnapshot (snapmgr.c:248)
==77873== by 0x79A22F: PortalStart (pquery.c:506)
==77873== by 0x795F67: exec_bind_message (postgres.c:1798)
==77873== by 0x798DDC: PostgresMain (postgres.c:4078)
==77873== by 0x724B27: BackendRun (postmaster.c:4237)
==77873== by 0x7242BB: BackendStartup (postmaster.c:3913)
==77873== by 0x720CF9: ServerLoop (postmaster.c:1684)
==77873== by 0x720380: PostmasterMain (postmaster.c:1292)
==77873== by 0x67CC9D: main (main.c:223)
==77873==
==77873== Source and destination overlap in memcpy(0x5c08020, 0x5c08020, 4)
==77873== at 0x4C2E1DC: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==77873== by 0x77304A: GetSnapshotData (procarray.c:1579)
==77873== by 0x90FD75: GetTransactionSnapshot (snapmgr.c:233)
==77873== by 0x795A2F: exec_bind_message (postgres.c:1613)
==77873== by 0x798DDC: PostgresMain (postgres.c:4078)
==77873== by 0x724B27: BackendRun (postmaster.c:4237)
==77873== by 0x7242BB: BackendStartup (postmaster.c:3913)
==77873== by 0x720CF9: ServerLoop (postmaster.c:1684)
==77873== by 0x720380: PostmasterMain (postmaster.c:1292)
==77873== by 0x67CC9D: main (main.c:223)
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachments:
valgrind.out (application/octet-stream)
On Thu, Dec 17, 2015 at 3:15 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
I have rebased the patch and run pgbench against it.
I see memory corruption; the valgrind report is attached.
Sorry, I forgot to attach the rebased patch; attaching it now.
After some analysis, I found that the write to shared memory that stores the
shared snapshot is not protected by an exclusive lock, which leads to memory
corruption.
Until this is fixed, I don't think performance measurements will be very
useful.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachments:
cache_snapshot_in_GetSnapshotData.patch (text/x-patch; charset=US-ASCII), +88 -39
On 2015-12-19 22:47:30 -0800, Mithun Cy wrote:
After some analysis, I found that the write to shared memory that stores the
shared snapshot is not protected by an exclusive lock, which leads to memory
corruption.
Until this is fixed, I don't think performance measurements will be very
useful.
I think at the very least the cache should be protected by a separate
lock, and that lock should be acquired with TryLock. I.e. the cache is
updated opportunistically. I'd go for an lwlock in the first iteration.
If that works nicely we can try to keep several 'snapshot slots' around,
and only lock one of them exclusively. With some care users of cached
snapshots can copy the snapshot, while another slot is updated in
parallel. But that's definitely not step 1.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers