Re: POC: Cache data in GetSnapshotData()
On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
I think at the very least the cache should be protected by a separate
lock, and that lock should be acquired with TryLock. I.e. the cache is
updated opportunistically. I'd go for an lwlock in the first iteration.
I tried to implement a simple patch which protects the cache. Of all the
backends which compute the snapshot (when the cache is invalid), only one
of them will write to the cache.
This is done with one atomic compare-and-swap operation.
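The single-writer idea can be modeled outside PostgreSQL with C11 atomics. This is only a hypothetical standalone sketch of the claim/publish/invalidate protocol, not code from the patch; the names (`try_publish`, `invalidate`, `cached_value`) are invented for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for ProcGlobal's cache state. */
static atomic_uint snapshot_cached;   /* 0 = cache open for one writer, 1 = claimed */
static int cached_value;              /* stands in for the cached SnapshotData */

/*
 * Each backend calls this after computing its snapshot; only the first
 * caller since the last invalidation wins the right to publish.
 */
static bool
try_publish(int computed)
{
    unsigned expected = 0;

    if (atomic_compare_exchange_strong(&snapshot_cached, &expected, 1))
    {
        cached_value = computed;   /* we are the single writer */
        return true;
    }
    return false;                  /* another backend already claimed it */
}

/* Transaction end invalidates the cache, reopening it for one writer. */
static void
invalidate(void)
{
    atomic_store(&snapshot_cached, 0);
}
```

Under this scheme losers of the race simply skip caching and continue, which matches the "others can continue as before" comment in the patch.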
After the above fix the memory corruption is not visible. But I see some more
failures at higher client sessions (with 128 clients it is easily
reproducible). Errors are:
DETAIL: Key (bid)=(24) already exists.
STATEMENT: UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid
= $2;
client 17 aborted in state 11: ERROR: duplicate key value violates unique
constraint "pgbench_branches_pkey"
DETAIL: Key (bid)=(24) already exists.
client 26 aborted in state 11: ERROR: duplicate key value violates unique
constraint "pgbench_branches_pkey"
DETAIL: Key (bid)=(87) already exists.
ERROR: duplicate key value violates unique constraint
"pgbench_branches_pkey"
DETAIL: Key (bid)=(113) already exists.
After some analysis, I think the problem is in GetSnapshotData(), while
computing the snapshot:

            /*
             * We don't include our own XIDs (if any) in the snapshot, but we
             * must include them in xmin.
             */
            if (NormalTransactionIdPrecedes(xid, xmin))
                xmin = xid;
*********** if (pgxact == MyPgXact) ******************
                continue;
We do not add our own xid to the xip array. I am wondering, if another
backend tries to use the same snapshot, will it be able to see the changes
made by me (the current backend)?
Since we compute a snapshot which will be used by other backends, I think we
need to add our xid to the xip array to tell that the transaction is still open.
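To make the concern concrete, here is a toy model of the usual XidInMVCCSnapshot() logic (an xid in [xmin, xmax) counts as "in progress" only if it appears in xip). The types and helper are simplified stand-ins, not the real PostgreSQL definitions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct
{
    TransactionId xmin;   /* every xid < xmin is already finished */
    TransactionId xmax;   /* every xid >= xmax had not started yet */
    TransactionId xip[8]; /* xids in progress as of this snapshot */
    int           xcnt;
} ToySnapshot;

/*
 * Simplified visibility test: true means "still running", so its changes
 * are invisible to any holder of this snapshot.
 */
static bool
xid_in_snapshot(TransactionId xid, const ToySnapshot *snap)
{
    if (xid < snap->xmin)
        return false;             /* finished before the snapshot */
    if (xid >= snap->xmax)
        return true;              /* started after the snapshot */
    for (int i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return true;
    return false;                 /* treated as committed or aborted! */
}
```

If the caching backend's own xid falls in [xmin, xmax) but is left out of xip, another backend reusing the snapshot hits the final `return false` and wrongly sees that transaction's changes as committed, which is consistent with the duplicate-key symptoms above. In a private snapshot this omission is safe only because callers check TransactionIdIsCurrentTransactionId first.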
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment: cache_snapshot_data_avoid_cuncurrent_write_to_cache.patch (text/x-patch)
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 1559,1564 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
--- 1559,1565 ----
elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
relform = (Form_pg_class) GETSTRUCT(reltup);
+ Assert(TransactionIdIsNormal(frozenXid));
relform->relfrozenxid = frozenXid;
relform->relminmxid = cutoffMulti;
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 410,415 **** ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
--- 410,417 ----
if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
{
ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot_valid = false;
LWLockRelease(ProcArrayLock);
}
else
***************
*** 557,562 **** ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
--- 559,568 ----
/* Move to next proc in list. */
nextidx = pg_atomic_read_u32(&proc->nextClearXidElem);
}
+
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+ ProcGlobal->cached_snapshot_valid = false;
/* We're done with the lock now. */
LWLockRelease(ProcArrayLock);
***************
*** 1543,1548 **** GetSnapshotData(Snapshot snapshot)
--- 1549,1556 ----
errmsg("out of memory")));
}
+ snapshot->takenDuringRecovery = RecoveryInProgress();
+
/*
* It is sufficient to get shared lock on ProcArrayLock, even if we are
* going to set MyPgXact->xmin.
***************
*** 1557,1568 **** GetSnapshotData(Snapshot snapshot)
/* initialize xmin calculation with xmax */
globalxmin = xmin = xmax;
! snapshot->takenDuringRecovery = RecoveryInProgress();
! if (!snapshot->takenDuringRecovery)
{
int *pgprocnos = arrayP->pgprocnos;
int numProcs;
/*
* Spin over procArray checking xid, xmin, and subxids. The goal is
--- 1565,1599 ----
/* initialize xmin calculation with xmax */
globalxmin = xmin = xmax;
! if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
! {
! Snapshot csnap = &ProcGlobal->cached_snapshot;
! TransactionId *saved_xip;
! TransactionId *saved_subxip;
! saved_xip = snapshot->xip;
! saved_subxip = snapshot->subxip;
!
! memcpy(snapshot, csnap, sizeof(SnapshotData));
!
! snapshot->xip = saved_xip;
! snapshot->subxip = saved_subxip;
!
! memcpy(snapshot->xip, csnap->xip,
! sizeof(TransactionId) * csnap->xcnt);
! memcpy(snapshot->subxip, csnap->subxip,
! sizeof(TransactionId) * csnap->subxcnt);
! globalxmin = ProcGlobal->cached_snapshot_globalxmin;
! xmin = csnap->xmin;
!
! Assert(TransactionIdIsValid(globalxmin));
! Assert(TransactionIdIsValid(xmin));
! }
! else if (!snapshot->takenDuringRecovery)
{
int *pgprocnos = arrayP->pgprocnos;
int numProcs;
+ const uint32 snapshot_cached= 0;
/*
* Spin over procArray checking xid, xmin, and subxids. The goal is
***************
*** 1577,1591 **** GetSnapshotData(Snapshot snapshot)
TransactionId xid;
/*
! * Backend is doing logical decoding which manages xmin
! * separately, check below.
*/
! if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
! continue;
!
! /* Ignore procs running LAZY VACUUM */
! if (pgxact->vacuumFlags & PROC_IN_VACUUM)
! continue;
/* Update globalxmin to be the smallest valid xmin */
xid = pgxact->xmin; /* fetch just once */
--- 1608,1619 ----
TransactionId xid;
/*
! * Ignore procs running LAZY VACUUM (which don't need to retain
! * rows) and backends doing logical decoding (which manages xmin
! * separately, check below).
*/
! if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
! continue;
/* Update globalxmin to be the smallest valid xmin */
xid = pgxact->xmin; /* fetch just once */
***************
*** 1653,1658 **** GetSnapshotData(Snapshot snapshot)
--- 1681,1715 ----
}
}
}
+
+ /*
+ * Let only one of the caller cache the computed snapshot, and others
+ * can continue as before.
+ */
+ if (pg_atomic_compare_exchange_u32(&ProcGlobal->snapshot_cached,
+ &snapshot_cached, 1))
+ {
+ Snapshot csnap = &ProcGlobal->cached_snapshot;
+ TransactionId *saved_xip;
+ TransactionId *saved_subxip;
+
+ ProcGlobal->cached_snapshot_globalxmin = globalxmin;
+
+ saved_xip = csnap->xip;
+ saved_subxip = csnap->subxip;
+ memcpy(csnap, snapshot, sizeof(SnapshotData));
+ csnap->xip = saved_xip;
+ csnap->subxip = saved_subxip;
+
+ /* not yet stored. Yuck */
+ csnap->xmin = xmin;
+
+ memcpy(csnap->xip, snapshot->xip,
+ sizeof(TransactionId) * csnap->xcnt);
+ memcpy(csnap->subxip, snapshot->subxip,
+ sizeof(TransactionId) * csnap->subxcnt);
+ ProcGlobal->cached_snapshot_valid = true;
+ }
}
else
{
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 51,57 ****
#include "storage/spin.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
!
/* GUC variables */
int DeadlockTimeout = 1000;
--- 51,57 ----
#include "storage/spin.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
! #include "port/atomics.h"
/* GUC variables */
int DeadlockTimeout = 1000;
***************
*** 114,119 **** ProcGlobalShmemSize(void)
--- 114,126 ----
size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGXACT)));
size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGXACT)));
+ /* for the cached snapshot */
+ #define PROCARRAY_MAXPROCS (MaxBackends + max_prepared_xacts)
+ size = add_size(size, mul_size(sizeof(TransactionId), PROCARRAY_MAXPROCS));
+ #define TOTAL_MAX_CACHED_SUBXIDS \
+ ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
+ size = add_size(size, mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS));
+
return size;
}
***************
*** 275,280 **** InitProcGlobal(void)
--- 282,294 ----
/* Create ProcStructLock spinlock, too */
ProcStructLock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(ProcStructLock);
+
+ /* cached snapshot */
+ ProcGlobal->cached_snapshot_valid = false;
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot.xip = ShmemAlloc(PROCARRAY_MAXPROCS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot.subxip = ShmemAlloc(TOTAL_MAX_CACHED_SUBXIDS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot_globalxmin = InvalidTransactionId;
}
/*
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
***************
*** 16,21 ****
--- 16,22 ----
#include "access/xlogdefs.h"
#include "lib/ilist.h"
+ #include "utils/snapshot.h"
#include "storage/latch.h"
#include "storage/lock.h"
#include "storage/pg_sema.h"
***************
*** 220,225 **** typedef struct PROC_HDR
--- 221,232 ----
int startupProcPid;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+
+ /* Cached snapshot */
+ volatile bool cached_snapshot_valid;
+ pg_atomic_uint32 snapshot_cached;
+ SnapshotData cached_snapshot;
+ TransactionId cached_snapshot_globalxmin;
} PROC_HDR;
extern PROC_HDR *ProcGlobal;
On Fri, Jan 15, 2016 at 11:23 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
I think at the very least the cache should be protected by a separate
lock, and that lock should be acquired with TryLock. I.e. the cache is
updated opportunistically. I'd go for an lwlock in the first iteration.

I tried to implement a simple patch which protects the cache. Of all the
backends which compute the snapshot (when the cache is invalid), only one of
them will write to the cache. This is done with one atomic compare-and-swap
operation.

After the above fix the memory corruption is not visible. But I see some more
failures at higher client sessions (with 128 clients it is easily reproducible).
Don't you think we need to update the snapshot fields like count and
subcount before saving it to shared memory?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 15, 2016 at 11:23 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
I think at the very least the cache should be protected by a separate
lock, and that lock should be acquired with TryLock. I.e. the cache is
updated opportunistically. I'd go for an lwlock in the first iteration.

I tried to implement a simple patch which protects the cache. Of all the
backends which compute the snapshot (when the cache is invalid), only one of
them will write to the cache. This is done with one atomic compare-and-swap
operation.

After the above fix the memory corruption is not visible. But I see some more
failures at higher client sessions (with 128 clients it is easily
reproducible). Errors are:

DETAIL: Key (bid)=(24) already exists.
STATEMENT: UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
client 17 aborted in state 11: ERROR: duplicate key value violates unique
constraint "pgbench_branches_pkey"
DETAIL: Key (bid)=(24) already exists.
client 26 aborted in state 11: ERROR: duplicate key value violates unique
constraint "pgbench_branches_pkey"
DETAIL: Key (bid)=(87) already exists.
ERROR: duplicate key value violates unique constraint "pgbench_branches_pkey"
DETAIL: Key (bid)=(113) already exists.

After some analysis, I think the problem is in GetSnapshotData(), while
computing the snapshot:

            /*
             * We don't include our own XIDs (if any) in the snapshot, but we
             * must include them in xmin.
             */
            if (NormalTransactionIdPrecedes(xid, xmin))
                xmin = xid;
*********** if (pgxact == MyPgXact) ******************
                continue;

We do not add our own xid to the xip array. I am wondering, if another
backend tries to use the same snapshot, will it be able to see the changes
made by me (the current backend)? Since we compute a snapshot which will be
used by other backends, I think we need to add our xid to the xip array to
tell that the transaction is still open.
I also think this observation of yours is right, and currently that is
okay because we always first check TransactionIdIsCurrentTransactionId().
Refer to the comments on top of XidInMVCCSnapshot() [1]. However, I am not
sure if it is a good idea to start including the current backend's xid in
the snapshot, because that can lead to including its subxids as well, which
can make the snapshot bigger for cases where the current transaction has
many subtransactions. One way could be that we store the current backend's
transaction and subtransaction ids only in the saved snapshot; does that
sound reasonable to you?
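That suggestion could be sketched roughly as follows. This is a hypothetical helper (`append_self_xid` is my name, not from any patch), shown here only for the top xid; subxids would be appended to the saved subxip array analogously, with overflow handling:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Append the caching backend's own top xid to the *saved* copy of the xip
 * array only, leaving the backend's private snapshot untouched.  Returns
 * the new element count for the saved array.
 */
static int
append_self_xid(TransactionId *saved_xip, int xcnt, TransactionId my_xid)
{
    /* A backend with no xid assigned (read-only so far) adds nothing. */
    if (my_xid == InvalidTransactionId)
        return xcnt;

    saved_xip[xcnt] = my_xid;
    return xcnt + 1;
}
```

This keeps the extra entry out of the computing backend's own snapshot (where it would be redundant) while making the cached copy safe for other backends to reuse.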
Other than that, I think the patch needs to clear the saved snapshot for
Commit Prepared Transaction as well (refer to FinishPreparedTransaction()).
! else if (!snapshot->takenDuringRecovery)
{
int *pgprocnos = arrayP->pgprocnos;
int numProcs;
+ const uint32 snapshot_cached= 0;
I don't think the const is required for the above variable.
[1]: Note: GetSnapshotData never stores either top xid or subxids of our own
backend into a snapshot, so these xids will not be reported as "running"
by this function. This is OK for current uses, because we always check
TransactionIdIsCurrentTransactionId first, except for known-committed
XIDs which could not be ours anyway.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 16, 2016 at 10:23 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Fri, Jan 15, 2016 at 11:23 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
I think at the very least the cache should be protected by a separate
lock, and that lock should be acquired with TryLock. I.e. the cache is
updated opportunistically. I'd go for an lwlock in the first iteration.

I also think this observation of yours is right and currently that is okay
because we always first check TransactionIdIsCurrentTransactionId().

+ const uint32 snapshot_cached= 0;

I have fixed all of the issues reported by the regress test. Also, now when
a backend tries to cache the snapshot we also store the self xid and
subxids, so other backends can use them.
I also did some read-only perf tests.
Non-Default Settings.
================
scale_factor=300.
./postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9
./pgbench -c $clients -j $clients -T 300 -M prepared -S postgres
Machine Detail:
cpu : POWER8
cores: 24 (192 with HT).
Clients    Base             With cached snapshot
1          19653.914409     19926.884664
16         190764.519336    190040.299297
32         339327.881272    354467.445076
48         462632.02493     464767.917813
64         522642.515148    533271.556703
80         515262.813189    513353.962521
But I did not see any perf improvement. Will continue testing.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment: Cache_data_in_GetSnapshotData_POC.patch (text/x-patch)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5cb28cf..a441ca0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1561,6 +1561,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
relform = (Form_pg_class) GETSTRUCT(reltup);
+ Assert(TransactionIdIsNormal(frozenXid));
relform->relfrozenxid = frozenXid;
relform->relminmxid = cutoffMulti;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 97e8962..8db028f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -372,11 +372,19 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
(arrayP->numProcs - index - 1) * sizeof(int));
arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
arrayP->numProcs--;
+
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+ ProcGlobal->cached_snapshot_valid = false;
LWLockRelease(ProcArrayLock);
return;
}
}
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+ ProcGlobal->cached_snapshot_valid = false;
+
/* Ooops */
LWLockRelease(ProcArrayLock);
@@ -420,6 +428,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
{
ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot_valid = false;
LWLockRelease(ProcArrayLock);
}
else
@@ -568,6 +578,9 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
nextidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
}
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot_valid = false;
+
/* We're done with the lock now. */
LWLockRelease(ProcArrayLock);
@@ -1553,6 +1566,8 @@ GetSnapshotData(Snapshot snapshot)
errmsg("out of memory")));
}
+ snapshot->takenDuringRecovery = RecoveryInProgress();
+
/*
* It is sufficient to get shared lock on ProcArrayLock, even if we are
* going to set MyPgXact->xmin.
@@ -1567,12 +1582,39 @@ GetSnapshotData(Snapshot snapshot)
/* initialize xmin calculation with xmax */
globalxmin = xmin = xmax;
- snapshot->takenDuringRecovery = RecoveryInProgress();
+ if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
+ {
+ Snapshot csnap = &ProcGlobal->cached_snapshot;
+ TransactionId *saved_xip;
+ TransactionId *saved_subxip;
- if (!snapshot->takenDuringRecovery)
+ saved_xip = snapshot->xip;
+ saved_subxip = snapshot->subxip;
+
+ memcpy(snapshot, csnap, sizeof(SnapshotData));
+
+ snapshot->xip = saved_xip;
+ snapshot->subxip = saved_subxip;
+
+ memcpy(snapshot->xip, csnap->xip,
+ sizeof(TransactionId) * csnap->xcnt);
+ memcpy(snapshot->subxip, csnap->subxip,
+ sizeof(TransactionId) * csnap->subxcnt);
+ globalxmin = ProcGlobal->cached_snapshot_globalxmin;
+ xmin = csnap->xmin;
+
+ count = csnap->xcnt;
+ subcount = csnap->subxcnt;
+ suboverflowed = csnap->suboverflowed;
+
+ Assert(TransactionIdIsValid(globalxmin));
+ Assert(TransactionIdIsValid(xmin));
+ }
+ else if (!snapshot->takenDuringRecovery)
{
int *pgprocnos = arrayP->pgprocnos;
int numProcs;
+ uint32 snapshot_cached= 0;
/*
* Spin over procArray checking xid, xmin, and subxids. The goal is
@@ -1587,14 +1629,11 @@ GetSnapshotData(Snapshot snapshot)
TransactionId xid;
/*
- * Backend is doing logical decoding which manages xmin
- * separately, check below.
+ * Ignore procs running LAZY VACUUM (which don't need to retain
+ * rows) and backends doing logical decoding (which manages xmin
+ * separately, check below).
*/
- if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
- continue;
-
- /* Ignore procs running LAZY VACUUM */
- if (pgxact->vacuumFlags & PROC_IN_VACUUM)
+ if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
continue;
/* Update globalxmin to be the smallest valid xmin */
@@ -1663,6 +1702,62 @@ GetSnapshotData(Snapshot snapshot)
}
}
}
+
+ /*
+ * Let only one of the caller cache the computed snapshot, and others
+ * can continue as before.
+ */
+ if (pg_atomic_compare_exchange_u32(&ProcGlobal->snapshot_cached,
+ &snapshot_cached, 1))
+ {
+ Snapshot csnap = &ProcGlobal->cached_snapshot;
+ TransactionId *saved_xip;
+ TransactionId *saved_subxip;
+ int curr_subcount= subcount;
+
+ ProcGlobal->cached_snapshot_globalxmin = globalxmin;
+
+ saved_xip = csnap->xip;
+ saved_subxip = csnap->subxip;
+ memcpy(csnap, snapshot, sizeof(SnapshotData));
+ csnap->xip = saved_xip;
+ csnap->subxip = saved_subxip;
+
+ /* not yet stored. Yuck */
+ csnap->xmin = xmin;
+
+ memcpy(csnap->xip, snapshot->xip,
+ sizeof(TransactionId) * count);
+ memcpy(csnap->subxip, snapshot->subxip,
+ sizeof(TransactionId) * subcount);
+
+ /* Add my own XID to snapshot. */
+ csnap->xip[count] = MyPgXact->xid;
+ csnap->xcnt = count + 1;
+
+ if (!suboverflowed)
+ {
+ if (MyPgXact->overflowed)
+ suboverflowed = true;
+ else
+ {
+ int nxids = MyPgXact->nxids;
+
+ if (nxids > 0)
+ {
+ memcpy(csnap->subxip + subcount,
+ (void *) MyProc->subxids.xids,
+ nxids * sizeof(TransactionId));
+ curr_subcount += nxids;
+ }
+ }
+ }
+
+ csnap->subxcnt = curr_subcount;
+ csnap->suboverflowed = suboverflowed;
+
+ ProcGlobal->cached_snapshot_valid = true;
+ }
}
else
{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6453b88..b8d0297 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,7 +51,7 @@
#include "storage/spin.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
-
+#include "port/atomics.h"
/* GUC variables */
int DeadlockTimeout = 1000;
@@ -114,6 +114,13 @@ ProcGlobalShmemSize(void)
size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGXACT)));
size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGXACT)));
+ /* for the cached snapshot */
+#define PROCARRAY_MAXPROCS (MaxBackends + max_prepared_xacts)
+ size = add_size(size, mul_size(sizeof(TransactionId), PROCARRAY_MAXPROCS));
+#define TOTAL_MAX_CACHED_SUBXIDS \
+ ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
+ size = add_size(size, mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS));
+
return size;
}
@@ -278,6 +285,13 @@ InitProcGlobal(void)
/* Create ProcStructLock spinlock, too */
ProcStructLock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(ProcStructLock);
+
+ /* cached snapshot */
+ ProcGlobal->cached_snapshot_valid = false;
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot.xip = ShmemAlloc(PROCARRAY_MAXPROCS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot.subxip = ShmemAlloc(TOTAL_MAX_CACHED_SUBXIDS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot_globalxmin = InvalidTransactionId;
}
/*
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index dbcdd3f..73a424e 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
#include "access/xlogdefs.h"
#include "lib/ilist.h"
+#include "utils/snapshot.h"
#include "storage/latch.h"
#include "storage/lock.h"
#include "storage/pg_sema.h"
@@ -234,6 +235,12 @@ typedef struct PROC_HDR
int startupProcPid;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+
+ /* Cached snapshot */
+ volatile bool cached_snapshot_valid;
+ pg_atomic_uint32 snapshot_cached;
+ SnapshotData cached_snapshot;
+ TransactionId cached_snapshot_globalxmin;
} PROC_HDR;
extern PROC_HDR *ProcGlobal;
On Thu, Feb 25, 2016 at 12:57 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I have fixed all of the issues reported by regress test. Also now when
backend try to cache the snapshot we also try to store the self-xid and sub
xid, so other backends can use them. I also did some read-only perf tests.

I'm not sure what you are doing that keeps breaking threading for
Gmail, but this thread is getting split up for me into multiple
threads with only a few messages in each. The same seems to be
happening in the community archives. Please try to figure out what is
causing that and stop doing it.
I notice you seem not to have implemented this suggestion by Andres:
http://www.postgresql.org//message-id/20160104085845.m5nrypvmmpea5nm7@alap3.anarazel.de
Also, you should test this on a machine with more than 2 cores.
Andres's original test seems to have been on a 4-core system, where
this would be more likely to work out.
Also, Andres suggested testing this on an 80-20 write mix, whereas
you tested it on 100% read-only. In that case there is no blocking
ProcArrayLock, which reduces the chances that the patch will benefit.
I suspect, too, that the chances of this patch working out have
probably been reduced by 0e141c0fbb211bdd23783afa731e3eef95c9ad7a,
which seems to be targeting mostly the same problem. I mean it's
possible that this patch could win even when no ProcArrayLock-related
blocking is happening, but the original idea seems to have been that
it would help mostly with that case. You could try benchmarking it on
the commit just before that one and then on current sources and see if
you get the same results on both, or if there was more benefit before
that commit.
Also, you could consider repeating the LWLOCK_STATS testing that Amit
did in his original reply to Andres. That would tell you whether the
performance is not increasing because the patch doesn't reduce
ProcArrayLock contention, or whether the performance is not increasing
because the patch DOES reduce ProcArrayLock contention but then the
contention shifts to CLogControlLock or elsewhere.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Feb 26, 2016 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 25, 2016 at 12:57 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
I have fixed all of the issues reported by regress test. Also now when
backend try to cache the snapshot we also try to store the self-xid and
sub
xid, so other backends can use them.
+ /* Add my own XID to snapshot. */
+ csnap->xip[count] = MyPgXact->xid;
+ csnap->xcnt = count + 1;
Don't we need to add this only when the xid of the current transaction is
valid? Also, I think it will be better if we can explain why we need to
add our own transaction id while caching the snapshot.
I also did some read-only perf tests.
I'm not sure what you are doing that keeps breaking threading for
Gmail, but this thread is getting split up for me into multiple
threads with only a few messages in each. The same seems to be
happening in the community archives. Please try to figure out what is
causing that and stop doing it.

I notice you seem not to have implemented
http://www.postgresql.org//message-id/20160104085845.m5nrypvmmpea5nm7@alap3.anarazel.de
As far as I can understand by reading the patch, it has already implemented
the first suggestion by Andres, which is to use a try lock; the patch now
uses atomic ops instead of an lwlock to achieve the same thing, but I think
it should have the same impact. Do you see any problem with that?

Now talking about the second suggestion, which is to use some form of
'snapshot slots' to reduce the contention further: it seems that could be
beneficial if we see any gain with the try lock. If you think that can be a
benefit over the current approach taken in the patch, then we can discuss
how to implement it.
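One possible reading of the 'snapshot slots' idea, again only as a hedged standalone sketch with invented names (`SnapSlot`, `publish`, `try_read`) and with invalidation omitted: keep a small ring of cached snapshots, each with its own claim word, so concurrent cachers and readers spread across slots instead of contending on a single cache entry.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SLOTS 4

typedef struct
{
    atomic_uint state;    /* 0 = empty, 1 = being written, 2 = valid */
    int         payload;  /* stands in for a cached SnapshotData */
} SnapSlot;

static SnapSlot    slots[NUM_SLOTS];
static atomic_uint next_slot;

/* Writer: claim a slot round-robin; at most one writer per slot at a time. */
static int
publish(int snap)
{
    unsigned idx = atomic_fetch_add(&next_slot, 1) % NUM_SLOTS;
    unsigned expected = 0;

    if (!atomic_compare_exchange_strong(&slots[idx].state, &expected, 1))
        return -1;                       /* slot busy; caller skips caching */

    slots[idx].payload = snap;
    atomic_store(&slots[idx].state, 2);  /* publish to readers */
    return (int) idx;
}

/* Reader: use a slot only once it is fully published. */
static bool
try_read(int idx, int *out)
{
    if (atomic_load(&slots[idx].state) != 2)
        return false;
    *out = slots[idx].payload;
    return true;
}
```

A real version would also need transaction end to invalidate every affected slot, which is where the bookkeeping cost of multiple slots comes from.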
Also, you should test this on a machine with more than 2 cores.
Andres's original test seems to have been on a 4-core system, where
this would be more likely to work out.

Also, Andres suggested testing this on an 80-20 write mix, whereas
you tested it on 100% read-only. In that case there is no blocking
ProcArrayLock, which reduces the chances that the patch will benefit.
+1
Also, you could consider repeating the LWLOCK_STATS testing that Amit
did in his original reply to Andres. That would tell you whether the
performance is not increasing because the patch doesn't reduce
ProcArrayLock contention, or whether the performance is not increasing
because the patch DOES reduce ProcArrayLock contention but then the
contention shifts to CLogControlLock or elsewhere.
+1
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 1, 2016 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Don't we need to add this only when the xid of the current transaction is
valid? Also, I think it will be better if we can explain why we need to
add our own transaction id while caching the snapshot.
I have fixed the same and the patch is attached.

Some more tests done after that:
*pgbench write tests: on 8 socket, 64 core machine.*
./postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9
./pgbench -c $clients -j $clients -T 1800 -M prepared postgres
[Inline image: pgbench TPS results chart]
A small improvement in performance at 64 threads.
*LWLock_Stats data:*
ProcArrayLock: Base.
=================
postgresql-2016-03-01_115252.log:PID 110019 lwlock main 4: shacq 1867601
exacq 35625 blk 134682 spindelay 128 dequeue self 28871
postgresql-2016-03-01_115253.log:PID 110115 lwlock main 4: shacq 2201613
exacq 43489 blk 155499 spindelay 127 dequeue self 32751
postgresql-2016-03-01_115253.log:PID 110122 lwlock main 4: shacq 2231327
exacq 44824 blk 159440 spindelay 128 dequeue self 33336
postgresql-2016-03-01_115254.log:PID 110126 lwlock main 4: shacq 2247983
exacq 44632 blk 158669 spindelay 131 dequeue self 33365
postgresql-2016-03-01_115254.log:PID 110059 lwlock main 4: shacq 2036809
exacq 38607 blk 143538 spindelay 117 dequeue self 31008
ProcArrayLock: With Patch.
=====================
postgresql-2016-03-01_124747.log:PID 1789 lwlock main 4: shacq 2273958
exacq 61605 blk 79581 spindelay 307 dequeue self 66088
postgresql-2016-03-01_124748.log:PID 1880 lwlock main 4: shacq 2456388
exacq 65996 blk 82300 spindelay 470 dequeue self 68770
postgresql-2016-03-01_124748.log:PID 1765 lwlock main 4: shacq 2244083
exacq 60835 blk 79042 spindelay 336 dequeue self 65212
postgresql-2016-03-01_124749.log:PID 1882 lwlock main 4: shacq 2489271
exacq 67043 blk 85650 spindelay 463 dequeue self 68401
postgresql-2016-03-01_124749.log:PID 1753 lwlock main 4: shacq 2232791
exacq 60647 blk 78659 spindelay 364 dequeue self 65180
postgresql-2016-03-01_124750.log:PID 1849 lwlock main 4: shacq 2374922
exacq 64075 blk 81860 spindelay 339 dequeue self 67584
*-------------Block time of ProcArrayLock has been reduced significantly.*
ClogControlLock : Base.
===================
postgresql-2016-03-01_115302.log:PID 110040 lwlock main 11: shacq 586129
exacq 268808 blk 90570 spindelay 261 dequeue self 59619
postgresql-2016-03-01_115303.log:PID 110047 lwlock main 11: shacq 593492
exacq 271019 blk 89547 spindelay 268 dequeue self 59999
postgresql-2016-03-01_115303.log:PID 110078 lwlock main 11: shacq 620830
exacq 285244 blk 92939 spindelay 262 dequeue self 61912
postgresql-2016-03-01_115304.log:PID 110083 lwlock main 11: shacq 633101
exacq 289983 blk 93485 spindelay 262 dequeue self 62394
postgresql-2016-03-01_115305.log:PID 110105 lwlock main 11: shacq 646584
exacq 297598 blk 93331 spindelay 312 dequeue self 63279
ClogControlLock : With Patch.
=======================
postgresql-2016-03-01_124730.log:PID 1865 lwlock main 11: shacq 722881
exacq 330273 blk 106163 spindelay 468 dequeue self 80316
postgresql-2016-03-01_124731.log:PID 1857 lwlock main 11: shacq 713720
exacq 327158 blk 106719 spindelay 439 dequeue self 79996
postgresql-2016-03-01_124732.log:PID 1826 lwlock main 11: shacq 696762
exacq 317239 blk 104523 spindelay 448 dequeue self 79374
postgresql-2016-03-01_124732.log:PID 1862 lwlock main 11: shacq 721272
exacq 330350 blk 105965 spindelay 492 dequeue self 81036
postgresql-2016-03-01_124733.log:PID 1879 lwlock main 11: shacq 737398
exacq 335357 blk 105424 spindelay 520 dequeue self 80977
*-------------Block time of ClogControlLock has increased slightly.*
Will continue with further tests at lower client counts.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment: Cache_data_in_GetSnapshotData_POC_01.patch (application/octet-stream)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5cb28cf..a441ca0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1561,6 +1561,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
relform = (Form_pg_class) GETSTRUCT(reltup);
+ Assert(TransactionIdIsNormal(frozenXid));
relform->relfrozenxid = frozenXid;
relform->relminmxid = cutoffMulti;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 97e8962..e1e4681 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -372,11 +372,19 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
(arrayP->numProcs - index - 1) * sizeof(int));
arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
arrayP->numProcs--;
+
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+ ProcGlobal->cached_snapshot_valid = false;
LWLockRelease(ProcArrayLock);
return;
}
}
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+
+ ProcGlobal->cached_snapshot_valid = false;
+
/* Ooops */
LWLockRelease(ProcArrayLock);
@@ -420,6 +428,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
{
ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot_valid = false;
LWLockRelease(ProcArrayLock);
}
else
@@ -568,6 +578,9 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
nextidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
}
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot_valid = false;
+
/* We're done with the lock now. */
LWLockRelease(ProcArrayLock);
@@ -1553,6 +1566,8 @@ GetSnapshotData(Snapshot snapshot)
errmsg("out of memory")));
}
+ snapshot->takenDuringRecovery = RecoveryInProgress();
+
/*
* It is sufficient to get shared lock on ProcArrayLock, even if we are
* going to set MyPgXact->xmin.
@@ -1567,12 +1582,39 @@ GetSnapshotData(Snapshot snapshot)
/* initialize xmin calculation with xmax */
globalxmin = xmin = xmax;
- snapshot->takenDuringRecovery = RecoveryInProgress();
+ if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
+ {
+ Snapshot csnap = &ProcGlobal->cached_snapshot;
+ TransactionId *saved_xip;
+ TransactionId *saved_subxip;
+
+ saved_xip = snapshot->xip;
+ saved_subxip = snapshot->subxip;
- if (!snapshot->takenDuringRecovery)
+ memcpy(snapshot, csnap, sizeof(SnapshotData));
+
+ snapshot->xip = saved_xip;
+ snapshot->subxip = saved_subxip;
+
+ memcpy(snapshot->xip, csnap->xip,
+ sizeof(TransactionId) * csnap->xcnt);
+ memcpy(snapshot->subxip, csnap->subxip,
+ sizeof(TransactionId) * csnap->subxcnt);
+ globalxmin = ProcGlobal->cached_snapshot_globalxmin;
+ xmin = csnap->xmin;
+
+ count = csnap->xcnt;
+ subcount = csnap->subxcnt;
+ suboverflowed = csnap->suboverflowed;
+
+ Assert(TransactionIdIsValid(globalxmin));
+ Assert(TransactionIdIsValid(xmin));
+ }
+ else if (!snapshot->takenDuringRecovery)
{
int *pgprocnos = arrayP->pgprocnos;
int numProcs;
+ uint32 snapshot_cached = 0;
/*
* Spin over procArray checking xid, xmin, and subxids. The goal is
@@ -1587,14 +1629,11 @@ GetSnapshotData(Snapshot snapshot)
TransactionId xid;
/*
- * Backend is doing logical decoding which manages xmin
- * separately, check below.
+ * Ignore procs running LAZY VACUUM (which don't need to retain
+ * rows) and backends doing logical decoding (which manage xmin
+ * separately, check below).
*/
- if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
- continue;
-
- /* Ignore procs running LAZY VACUUM */
- if (pgxact->vacuumFlags & PROC_IN_VACUUM)
+ if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
continue;
/* Update globalxmin to be the smallest valid xmin */
@@ -1663,6 +1702,79 @@ GetSnapshotData(Snapshot snapshot)
}
}
}
+
+ /*
+ * Let only one of the callers cache the computed snapshot; the others
+ * can continue as before.
+ */
+ if (pg_atomic_compare_exchange_u32(&ProcGlobal->snapshot_cached,
+ &snapshot_cached, 1))
+ {
+ Snapshot csnap = &ProcGlobal->cached_snapshot;
+ TransactionId *saved_xip;
+ TransactionId *saved_subxip;
+ int curr_subcount = subcount;
+
+ ProcGlobal->cached_snapshot_globalxmin = globalxmin;
+
+ saved_xip = csnap->xip;
+ saved_subxip = csnap->subxip;
+ memcpy(csnap, snapshot, sizeof(SnapshotData));
+ csnap->xip = saved_xip;
+ csnap->subxip = saved_subxip;
+
+ /* not yet stored. Yuck */
+ csnap->xmin = xmin;
+
+ memcpy(csnap->xip, snapshot->xip,
+ sizeof(TransactionId) * count);
+ memcpy(csnap->subxip, snapshot->subxip,
+ sizeof(TransactionId) * subcount);
+ /*
+ * If the transaction has no XID assigned, we can skip it; it
+ * won't have sub-XIDs either. If the XID is >= xmax, we can also
+ * skip it; such transactions will be treated as running anyway
+ * (and any sub-XIDs will also be >= xmax).
+ */
+ if (TransactionIdIsNormal(MyPgXact->xid) && NormalTransactionIdPrecedes(MyPgXact->xid, xmax))
+ {
+
+ /* Add my own XID and sub-XIDs to snapshot. */
+ csnap->xip[count] = MyPgXact->xid;
+ csnap->xcnt = count + 1;
+
+ if (!suboverflowed)
+ {
+ if (MyPgXact->overflowed)
+ suboverflowed = true;
+ else
+ {
+ int nxids = MyPgXact->nxids;
+
+ if (nxids > 0)
+ {
+ memcpy(csnap->subxip + subcount,
+ (void *) MyProc->subxids.xids,
+ nxids * sizeof(TransactionId));
+ curr_subcount += nxids;
+ }
+ }
+ }
+
+ }
+ else
+ csnap->xcnt = count;
+
+ csnap->subxcnt = curr_subcount;
+ csnap->suboverflowed = suboverflowed;
+
+ /*
+ * The memory barrier has to be placed here to ensure that
+ * cached_snapshot_valid is set only after snapshot is cached.
+ */
+ pg_write_barrier();
+ ProcGlobal->cached_snapshot_valid = true;
+ }
}
else
{
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6453b88..b8d0297 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,7 +51,7 @@
#include "storage/spin.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
-
+#include "port/atomics.h"
/* GUC variables */
int DeadlockTimeout = 1000;
@@ -114,6 +114,13 @@ ProcGlobalShmemSize(void)
size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGXACT)));
size = add_size(size, mul_size(max_prepared_xacts, sizeof(PGXACT)));
+ /* for the cached snapshot */
+#define PROCARRAY_MAXPROCS (MaxBackends + max_prepared_xacts)
+ size = add_size(size, mul_size(sizeof(TransactionId), PROCARRAY_MAXPROCS));
+#define TOTAL_MAX_CACHED_SUBXIDS \
+ ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
+ size = add_size(size, mul_size(sizeof(TransactionId), TOTAL_MAX_CACHED_SUBXIDS));
+
return size;
}
@@ -278,6 +285,13 @@ InitProcGlobal(void)
/* Create ProcStructLock spinlock, too */
ProcStructLock = (slock_t *) ShmemAlloc(sizeof(slock_t));
SpinLockInit(ProcStructLock);
+
+ /* cached snapshot */
+ ProcGlobal->cached_snapshot_valid = false;
+ pg_atomic_write_u32(&ProcGlobal->snapshot_cached, 0);
+ ProcGlobal->cached_snapshot.xip = ShmemAlloc(PROCARRAY_MAXPROCS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot.subxip = ShmemAlloc(TOTAL_MAX_CACHED_SUBXIDS * sizeof(TransactionId));
+ ProcGlobal->cached_snapshot_globalxmin = InvalidTransactionId;
}
/*
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index dbcdd3f..73a424e 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -16,6 +16,7 @@
#include "access/xlogdefs.h"
#include "lib/ilist.h"
+#include "utils/snapshot.h"
#include "storage/latch.h"
#include "storage/lock.h"
#include "storage/pg_sema.h"
@@ -234,6 +235,12 @@ typedef struct PROC_HDR
int startupProcPid;
/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
int startupBufferPinWaitBufId;
+
+ /* Cached snapshot */
+ volatile bool cached_snapshot_valid;
+ pg_atomic_uint32 snapshot_cached;
+ SnapshotData cached_snapshot;
+ TransactionId cached_snapshot_globalxmin;
} PROC_HDR;
extern PROC_HDR *ProcGlobal;
On Thu, Mar 3, 2016 at 6:20 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Tue, Mar 1, 2016 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Don't we need to add this only when the xid of the current transaction is
valid? Also, I think it will be better if we can explain why we need to
add our own transaction id while caching the snapshot.
I have fixed the same and the patch is attached. Some more tests done after that:
*pgbench write tests: on 8 socket, 64 core machine.*
/postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9
./pgbench -c $clients -j $clients -T 1800 -M prepared postgres
[image: Inline image 3]
A small improvement in performance at 64 threads.
What if you apply both this and Amit's clog control log patch(es)? Maybe
the combination of the two helps substantially more than either one alone.
Or maybe not, but seems worth checking.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 3, 2016 at 11:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
What if you apply both this and Amit's clog control log patch(es)? Maybe
the combination of the two helps substantially more than either one alone.
I did the above tests along with Amit's clog patch. Machine: 8 socket, 64
core, 128 hyperthread.

clients  BASE          ONLY CLOG CHANGES  % Increase     ONLY SAVE SNAPSHOT  % Increase    CLOG CHANGES + SAVE SNAPSHOT  % Increase
64       29247.658034  30855.728835        5.4981181711  29254.532186        0.0235032562  32691.832776                  11.7758992463
88       31214.305391  33313.393095        6.7247618606  32109.248609        2.8670931702  35433.655203                  13.5173592978
128      30896.673949  34015.362152       10.0939285832  ***                 ***           34947.296355                  13.110221549
256      27183.780921  31192.895437       14.7481857938  ***                 ***           32873.782735                  20.9316056164
With clog changes, perf of caching the snapshot patch improves.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 10, 2016 at 1:04 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Thu, Mar 3, 2016 at 11:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
What if you apply both this and Amit's clog control log patch(es)? Maybe
the combination of the two helps substantially more than either one alone.
I did the above tests along with Amit's clog patch. Machine: 8 socket, 64
core, 128 hyperthread.

clients  BASE          ONLY CLOG CHANGES  % Increase     ONLY SAVE SNAPSHOT  % Increase    CLOG CHANGES + SAVE SNAPSHOT  % Increase
64       29247.658034  30855.728835        5.4981181711  29254.532186        0.0235032562  32691.832776                  11.7758992463
88       31214.305391  33313.393095        6.7247618606  32109.248609        2.8670931702  35433.655203                  13.5173592978
128      30896.673949  34015.362152       10.0939285832  ***                 ***           34947.296355                  13.110221549
256      27183.780921  31192.895437       14.7481857938  ***                 ***           32873.782735                  20.9316056164
With clog changes, perf of caching the snapshot patch improves.
This data looks promising to me and indicates that saving the snapshot has
benefits and we can see noticeable performance improvement especially once
the CLog contention gets reduced. I wonder if we should try these tests
with unlogged tables, because I suspect most of the contention after
CLogControlLock and ProcArrayLock is for WAL related locks, so you might be
able to see better gain with these patches. Do you think it makes sense to
check the performance by increasing CLOG buffers (patch for same is posted
in the Speed up Clog thread [1]) as that also relieves contention on CLOG as
per the tests I have done?
[1]: /messages/by-id/CAA4eK1LMMGNQ439BUm0LcS3p0sb8S9kc-cUGU_ThNqMwA8_Tug@mail.gmail.com
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Thanks Amit,
I did a quick pgbench write test for unlogged tables at 88 clients, as it
had the peak performance in the previous test. There is a big jump in TPS
due to the clog changes.
clients  BASE          ONLY CLOG CHANGES  % Increase     ONLY SAVE SNAPSHOT  % Increase    CLOG CHANGES + SAVE SNAPSHOT  % Increase
88       36055.425005  52796.618434       46.4318294034  37728.004118        4.6389111008  56025.454917                  55.3870323515

clients  + WITH INCREASE IN CLOG BUFFER  % Increase
88       58217.924138                    61.4678626862
I will continue to benchmark above tests with much wider range of clients.
On Thu, Mar 10, 2016 at 1:36 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Mar 10, 2016 at 1:04 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Thu, Mar 3, 2016 at 11:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
What if you apply both this and Amit's clog control log patch(es)? Maybe
the combination of the two helps substantially more than either one alone.
I did the above tests along with Amit's clog patch. Machine: 8 socket, 64
core, 128 hyperthread.

clients  BASE          ONLY CLOG CHANGES  % Increase     ONLY SAVE SNAPSHOT  % Increase    CLOG CHANGES + SAVE SNAPSHOT  % Increase
64       29247.658034  30855.728835        5.4981181711  29254.532186        0.0235032562  32691.832776                  11.7758992463
88       31214.305391  33313.393095        6.7247618606  32109.248609        2.8670931702  35433.655203                  13.5173592978
128      30896.673949  34015.362152       10.0939285832  ***                 ***           34947.296355                  13.110221549
256      27183.780921  31192.895437       14.7481857938  ***                 ***           32873.782735                  20.9316056164
With clog changes, perf of caching the snapshot patch improves.
This data looks promising to me and indicates that saving the snapshot has
benefits and we can see noticeable performance improvement especially once
the CLog contention gets reduced. I wonder if we should try these tests
with unlogged tables, because I suspect most of the contention after
CLogControlLock and ProcArrayLock is for WAL related locks, so you might be
able to see better gain with these patches. Do you think it makes sense to
check the performance by increasing CLOG buffers (patch for same is posted
in the Speed up Clog thread [1]) as that also relieves contention on CLOG as
per the tests I have done?
[1] - /messages/by-id/CAA4eK1LMMGNQ439BUm0LcS3p0sb8S9kc-cUGU_ThNqMwA8_Tug@mail.gmail.com
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 3, 2016 at 6:20 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
I will continue to benchmark above tests with much wider range of clients.
Latest Benchmarking shows following results for unlogged tables.
clients  BASE          ONLY CLOG CHANGES  % Increase     CLOG CHANGES + SAVE SNAPSHOT  % Increase
1         1198.326337   1328.069656       10.8270439357   1234.078342                   2.9834948875
32       37455.181727  38295.250519        2.2428640131  41023.126293                   9.5259037641
64       48838.016451  50675.845885        3.7631123611  51662.814319                   5.7840143259
88       36878.187766  53173.577363       44.1870671639  56025.454917                  51.9203038731
128      35901.537773  52026.024098       44.9130798434  53864.486733                  50.0339263281
256      28130.354402  46793.134156       66.3439197647  46817.04602                   66.4289235427
The performance diff at 1 client seems to be just run-to-run variance.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 16, 2016 at 4:40 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Thu, Mar 3, 2016 at 6:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I will continue to benchmark above tests with much wider range of clients.
Latest Benchmarking shows following results for unlogged tables.

clients  BASE          ONLY CLOG CHANGES  % Increase     CLOG CHANGES + SAVE SNAPSHOT  % Increase
1         1198.326337   1328.069656       10.8270439357   1234.078342                   2.9834948875
32       37455.181727  38295.250519        2.2428640131  41023.126293                   9.5259037641
64       48838.016451  50675.845885        3.7631123611  51662.814319                   5.7840143259
88       36878.187766  53173.577363       44.1870671639  56025.454917                  51.9203038731
128      35901.537773  52026.024098       44.9130798434  53864.486733                  50.0339263281
256      28130.354402  46793.134156       66.3439197647  46817.04602                   66.4289235427
Whoa. At 64 clients, we're hardly getting any benefit, but then by 88
clients, we're getting a huge benefit. I wonder why there's that sharp
change there.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-03-16 13:29:22 -0400, Robert Haas wrote:
Whoa. At 64 clients, we're hardly getting any benefit, but then by 88
clients, we're getting a huge benefit. I wonder why there's that sharp
change there.
What's the specifics of the machine tested? I wonder if it either
correlates with the number of hardware threads, real cores, or cache
sizes.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 16, 2016 at 1:33 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-03-16 13:29:22 -0400, Robert Haas wrote:
Whoa. At 64 clients, we're hardly getting any benefit, but then by 88
clients, we're getting a huge benefit. I wonder why there's that sharp
change there.
What's the specifics of the machine tested? I wonder if it either
correlates with the number of hardware threads, real cores, or cache
sizes.
I think this was done on cthulhu, whose /proc/cpuinfo output ends this way:
processor : 127
vendor_id : GenuineIntel
cpu family : 6
model : 47
model name : Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
stepping : 2
microcode : 0x37
cpu MHz : 2129.000
cache size : 24576 KB
physical id : 3
siblings : 16
core id : 25
cpu cores : 8
apicid : 243
initial apicid : 243
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64
monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1
sse4_2 x2apic popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi
flexpriority ept vpid
bogomips : 4266.62
clflush size : 64
cache_alignment : 64
address sizes : 44 bits physical, 48 bits virtual
power management:
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 16, 2016 at 10:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 16, 2016 at 4:40 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Thu, Mar 3, 2016 at 6:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I will continue to benchmark above tests with much wider range of clients.
Latest Benchmarking shows following results for unlogged tables.

clients  BASE          ONLY CLOG CHANGES  % Increase     CLOG CHANGES + SAVE SNAPSHOT  % Increase
1         1198.326337   1328.069656       10.8270439357   1234.078342                   2.9834948875
32       37455.181727  38295.250519        2.2428640131  41023.126293                   9.5259037641
64       48838.016451  50675.845885        3.7631123611  51662.814319                   5.7840143259
88       36878.187766  53173.577363       44.1870671639  56025.454917                  51.9203038731
128      35901.537773  52026.024098       44.9130798434  53864.486733                  50.0339263281
256      28130.354402  46793.134156       66.3439197647  46817.04602                   66.4289235427
Whoa. At 64 clients, we're hardly getting any benefit, but then by 88
clients, we're getting a huge benefit. I wonder why there's that sharp
change there.
If you see, for the Base readings, there is a performance increase up to
64 clients and then a fall at 88 clients, which to me indicates that it
hits very high contention around CLogControlLock at 88 clients, which the
CLog patch is able to control to a great degree (if required, I think the
same can be verified by LWLock stats data). One reason for hitting
contention at 88 clients is that this machine seems to have 64 cores (it
has 8 sockets and 8 cores per socket), as per the lscpu output below.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 17, 2016 at 9:00 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
If you see, for the Base readings, there is a performance increase up till
64 clients and then there is a fall at 88 clients, which to me indicates
that it hits very high-contention around CLogControlLock at 88 clients
which CLog patch is able to control to a great degree (if required, I
think the same can be verified by LWLock stats data). One reason for
hitting contention at 88 clients is that this machine seems to have
64-cores (it has 8 sockets and 8 Core(s) per socket) as per below
information of lscpu command.
I am attaching LWLock stats data for the above test setups (unlogged tables).

                              At 64 clients        At 88 clients
                              (block-time units)   (block-time units)
*BASE*
ProcArrayLock                 182946               117827
ClogControlLock               107420               120266
*clog patch*
ProcArrayLock                 183663               121215
ClogControlLock                72806                65220
*clog patch + save snapshot*
ProcArrayLock                 128260                83356
ClogControlLock                78921                74011

This is for unlogged tables. I mainly see that ProcArrayLock has higher
contention at 64 clients, and at 88 clients the contention is slightly
moved to other entities.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachments:
lwlock details.odt (application/vnd.oasis.opendocument.text)