LWLocks in DSM memory
Hi,
As already noted[1], LWLocks don't currently work in DSM segments,
because they use dlist for the list of waiters. Even though all of
the waiter nodes are in PGPROC and therefore have stable addresses,
the dlist code internally constructs a circular list including
pointers to a special sentinel node inside the dlist_head object, and
those pointers may be invalid in other backends.
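To make the failure mode concrete, here is a minimal sketch (illustration
only, not part of the attached patch; "something_in_dsm" stands for any
hypothetical struct living in a DSM segment):

    /*
     * dlist_init() makes the list circular by pointing the sentinel node
     * inside dlist_head at itself, so the head's absolute address is baked
     * into the links.  A backend that maps the segment at a different
     * address cannot safely follow those pointers.
     */
    dlist_head *waiters = &something_in_dsm->waiters;

    dlist_init(waiters);
    Assert(waiters->head.next == &waiters->head);   /* this mapping's address */
    Assert(waiters->head.prev == &waiters->head);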
One solution could be to provide a non-circular variant of the dlist
interface that uses NULL list termination. I've attached a quick
sketch of something like that which seems to work correctly. It is
only lightly tested so far and probably buggy, but shows the general
idea.
Any thoughts on this approach, or better ways to solve this problem?
How valuable is the branch-free property of the circular push and
delete operations?
[1]: /messages/by-id/CA+Tgmobjia49CCJ0ZazbWaVv7nKgYt+1Zo5CwxkH9Aahgn2vPg@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
lwlocks-in-dsm.patch (application/octet-stream)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 7ffa87d..dbfddd9 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -717,7 +717,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
- dlist_init(&lock->waiters);
+ dlist_init_noncircular(&lock->waiters);
}
/*
@@ -930,14 +930,14 @@ LWLockWakeup(LWLock *lock)
/* lock wait list while collecting backends to wake up */
LWLockWaitListLock(lock);
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_noncircular(iter, &lock->waiters)
{
PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
- dlist_delete(&waiter->lwWaitLink);
+ dlist_delete_noncircular(&lock->waiters, &waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
@@ -1046,9 +1046,9 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
/* LW_WAIT_UNTIL_FREE waiters are always at the front of the queue */
if (mode == LW_WAIT_UNTIL_FREE)
- dlist_push_head(&lock->waiters, &MyProc->lwWaitLink);
+ dlist_push_head_noncircular(&lock->waiters, &MyProc->lwWaitLink);
else
- dlist_push_tail(&lock->waiters, &MyProc->lwWaitLink);
+ dlist_push_tail_noncircular(&lock->waiters, &MyProc->lwWaitLink);
/* Can release the mutex now */
LWLockWaitListUnlock(lock);
@@ -1086,14 +1086,14 @@ LWLockDequeueSelf(LWLock *lock)
* Can't just remove ourselves from the list, but we need to iterate over
* all entries as somebody else could have unqueued us.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_noncircular(iter, &lock->waiters)
{
PGPROC *proc = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (proc == MyProc)
{
found = true;
- dlist_delete(&proc->lwWaitLink);
+ dlist_delete_noncircular(&lock->waiters, &proc->lwWaitLink);
break;
}
}
@@ -1737,14 +1737,14 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
* up. They are always in the front of the queue.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_noncircular(iter, &lock->waiters)
{
PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
break;
- dlist_delete(&waiter->lwWaitLink);
+ dlist_delete_noncircular(&lock->waiters, &waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
}
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index e189e4f..12edb05 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -25,6 +25,14 @@
* initialize list headers to zeroes rather than setting them up with an
* explicit initialization function, so we also allow the other case.
*
+ * By using the dlist_init_noncircular and then using only the _noncircular
+ * variants of the dlist functions, circularity can be avoided. This is
+ * useful for lists in dynamic shared memory where the dlist_head object may
+ * be mapped into different backends at different addresses, since the list is
+ * terminated by NULL pointers rather than pointers to the head object. Other
+ * nodes must still be mapped at the same address in each backend in this
+ * case.
+ *
* EXAMPLES
*
* Here's a simple example demonstrating how this can be used. Let's assume
@@ -281,6 +289,15 @@ dlist_init(dlist_head *head)
}
/*
+ * Initialize a doubly linked list with non-circular layout.
+ */
+static inline void
+dlist_init_noncircular(dlist_head *head)
+{
+ head->head.next = head->head.prev = NULL;
+}
+
+/*
* Is the list empty?
*
* An empty list has either its first 'next' pointer set to NULL, or to itself.
@@ -311,6 +328,34 @@ dlist_push_head(dlist_head *head, dlist_node *node)
}
/*
+ * Insert a node at the beginning of a list, without converting it to circular
+ * layout.
+ */
+static inline void
+dlist_push_head_noncircular(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (head->head.next == NULL)
+ {
+ /* It was empty, and this node becomes the only entry. */
+ Assert(head->head.prev == NULL);
+ node->next = node->prev = NULL;
+ head->head.next = head->head.prev = node;
+ }
+ else
+ {
+ /* Insert before head. */
+ Assert(head->head.prev != NULL);
+ node->next = head->head.next;
+ node->next->prev = node;
+ node->prev = NULL;
+ head->head.next = node;
+ }
+}
+
+/*
* Insert a node at the end of the list.
*/
static inline void
@@ -328,6 +373,34 @@ dlist_push_tail(dlist_head *head, dlist_node *node)
}
/*
+ * Insert a node at the end of a list, without converting it to circular
+ * layout.
+ */
+static inline void
+dlist_push_tail_noncircular(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (head->head.prev == NULL)
+ {
+ /* It was empty, and this node becomes the only entry. */
+ Assert(head->head.next == NULL);
+ node->next = node->prev = NULL;
+ head->head.next = head->head.prev = node;
+ }
+ else
+ {
+ /* Insert after tail. */
+ Assert(head->head.next != NULL);
+ node->prev = head->head.prev;
+ node->prev->next = node;
+ node->next = NULL;
+ head->head.prev = node;
+ }
+}
+
+/*
* Insert a node after another *in the same list*
*/
static inline void
@@ -362,6 +435,33 @@ dlist_delete(dlist_node *node)
}
/*
+ * Delete 'node' from 'list' (it must be in that list, which must be
+ * noncircular).
+ */
+static inline void
+dlist_delete_noncircular(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (node->prev == NULL)
+ {
+ Assert(head->head.next == node);
+ head->head.next = node->next;
+ }
+ else
+ node->prev->next = node->next;
+
+ if (node->next == NULL)
+ {
+ Assert(head->head.prev == node);
+ head->head.prev = node->prev;
+ }
+ else
+ node->next->prev = node->prev;
+}
+
+/*
* Remove and return the first node from a list (there must be one).
*/
static inline dlist_node *
@@ -531,6 +631,18 @@ dlist_tail_node(dlist_head *head)
(iter).cur = (iter).next, (iter).next = (iter).cur->next)
/*
+ * Like dlist_foreach_modify, but for noncircular lists.
+ */
+#define dlist_foreach_modify_noncircular(iter, lhead) \
+ for (AssertVariableIsOfTypeMacro(iter, dlist_mutable_iter), \
+ AssertVariableIsOfTypeMacro(lhead, dlist_head *), \
+ (iter).cur = (lhead)->head.next, \
+ (iter).next = (iter).cur == NULL ? NULL : (iter).cur->next; \
+ (iter).cur != NULL; \
+ (iter).cur = (iter).next, \
+ (iter).next = (iter).cur == NULL ? NULL : (iter).cur->next)
+
+/*
* Iterate through the list in reverse order.
*
* It is *not* allowed to manipulate the list during iteration.
On Sun, Jul 24, 2016 at 1:10 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
One solution could be to provide a non-circular variant of the dlist
interface that uses NULL list termination. I've attached a quick
sketch of something like that which seems to work correctly. It is
only lightly tested so far and probably buggy, but shows the general
idea.
On reflection, it wouldn't make much sense to put "noncircular" in the
names of interfaces, as that is an internal detail. Maybe
"relocatable" or "position independent" or something like that, since
that's a user-visible property of the dlist_head. Here is a version
of the patch that uses _relocatable.
Any thoughts on this approach, or better ways to solve this problem?
How valuable is the branch-free property of the circular push and
delete operations?
I investigated this a bit. If I do this on my laptop (clang, no
asserts, -O2), it takes 3895 milliseconds, or 4.8ns per push/delete
operation:
dlist_init(&head);
for (i = 0; i < 100000000; ++i)
{
dlist_push_head(&head, &nodes[0]);
dlist_push_tail(&head, &nodes[1]);
dlist_push_head(&head, &nodes[2]);
dlist_push_tail(&head, &nodes[3]);
dlist_delete(&nodes[2]);
dlist_delete(&nodes[3]);
dlist_delete(&nodes[0]);
dlist_delete(&nodes[1]);
}
The relocatable version takes 5907 milliseconds, or 7.4ns per
push/delete operation:
dlist_init_relocatable(&head);
for (i = 0; i < 100000000; ++i)
{
dlist_push_head_relocatable(&head, &nodes[0]);
dlist_push_tail_relocatable(&head, &nodes[1]);
dlist_push_head_relocatable(&head, &nodes[2]);
dlist_push_tail_relocatable(&head, &nodes[3]);
dlist_delete_relocatable(&head, &nodes[2]);
dlist_delete_relocatable(&head, &nodes[3]);
dlist_delete_relocatable(&head, &nodes[0]);
dlist_delete_relocatable(&head, &nodes[1]);
}
Those operations are ~50% slower. So getting rid of dlist's clever
branch-free code generally seems like a bad idea.
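For reference, the branch-free operation in question is the existing
circular delete in ilist.h, which needs no conditionals because an
end-of-list neighbour is always the sentinel node embedded in dlist_head:

    static inline void
    dlist_delete(dlist_node *node)
    {
        /* no NULL checks: the sentinel stands in for a missing neighbour */
        node->prev->next = node->next;
        node->next->prev = node->prev;
    }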
Next I wondered if it would be a bad idea to use slower relocatable
dlist heads for all LWLocks. Obviously LWLocks are used all over the
place and it would be bad to slow them down, but I wondered if maybe
dlist manipulation might not be a very significant part of their work.
So I put a LWLock and a counter in shared memory, and had N background
workers run a tight loop that locks, increments and unlocks
concurrently until the counter reaches 1 billion. This creates
different degrees of contention and wait list sizes. The worker loop
looks like this:
while (!finished)
{
LWLockAcquire(&control->lock, LW_EXCLUSIVE);
++control->counter;
if (control->counter >= control->goal)
finished = true;
LWLockRelease(&control->lock);
}
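The shared control area that loop refers to is just a lock, a counter and a
goal; a minimal sketch (the struct name is mine, not from the test source)
looks like this:

    typedef struct
    {
        LWLock      lock;       /* the lock being exercised */
        uint64      counter;    /* incremented while holding the lock */
        uint64      goal;       /* workers stop once counter reaches this */
    } LockTestControl;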
I measured the following times for unpatched master, on my 4 core laptop:
16 workers = 73.067s, 74.869s, 75.338s
8 workers = 65.846s, 67.622s, 68.039s
4 workers = 68.763s, 68.980s, 69.035s <-- curiously slower here
3 workers = 59.701s, 59.991s, 60.133s
2 workers = 53.620s, 55.300s, 55.790s
1 worker = 21.578s, 21.535s, 21.598s
With the attached patch I got:
16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
8 workers = 67.462s, 68.622s, 68.851s <- +1.4%
4 workers = 64.983s, 65.021s, 65.496s <- -5.7%
3 workers = 60.247s, 60.425s, 60.492s <- +0.7%
2 workers = 57.544s, 57.626s, 58.133s <- +2.3%
1 worker = 21.403s, 21.486s, 21.661s <- -0.2%
Somewhat noisy data and different effects at different numbers of workers.
I can post the source for those tests if anyone is interested. If you
have any other ideas for access patterns to test, or clever ways to
keep push and delete branch-free while also avoiding internal pointers
back to dlist_head, I'm all ears. Otherwise, if a change affecting
all LWLocks turns out to be unacceptable, maybe we would need to have
a different LWLock interface for relocatable LWLocks to make them
suitable for use in DSM segments. Any thoughts?
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
lwlocks-in-dsm-v2.patch (application/octet-stream)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 7ffa87d..e958407 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -717,7 +717,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
- dlist_init(&lock->waiters);
+ dlist_init_relocatable(&lock->waiters);
}
/*
@@ -930,14 +930,14 @@ LWLockWakeup(LWLock *lock)
/* lock wait list while collecting backends to wake up */
LWLockWaitListLock(lock);
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_relocatable(iter, &lock->waiters)
{
PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
- dlist_delete(&waiter->lwWaitLink);
+ dlist_delete_relocatable(&lock->waiters, &waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
@@ -1046,9 +1046,9 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
/* LW_WAIT_UNTIL_FREE waiters are always at the front of the queue */
if (mode == LW_WAIT_UNTIL_FREE)
- dlist_push_head(&lock->waiters, &MyProc->lwWaitLink);
+ dlist_push_head_relocatable(&lock->waiters, &MyProc->lwWaitLink);
else
- dlist_push_tail(&lock->waiters, &MyProc->lwWaitLink);
+ dlist_push_tail_relocatable(&lock->waiters, &MyProc->lwWaitLink);
/* Can release the mutex now */
LWLockWaitListUnlock(lock);
@@ -1086,14 +1086,14 @@ LWLockDequeueSelf(LWLock *lock)
* Can't just remove ourselves from the list, but we need to iterate over
* all entries as somebody else could have unqueued us.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_relocatable(iter, &lock->waiters)
{
PGPROC *proc = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (proc == MyProc)
{
found = true;
- dlist_delete(&proc->lwWaitLink);
+ dlist_delete_relocatable(&lock->waiters, &proc->lwWaitLink);
break;
}
}
@@ -1737,14 +1737,14 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
* up. They are always in the front of the queue.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ dlist_foreach_modify_relocatable(iter, &lock->waiters)
{
PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
break;
- dlist_delete(&waiter->lwWaitLink);
+ dlist_delete_relocatable(&lock->waiters, &waiter->lwWaitLink);
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
}
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index e189e4f..345cc61 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -25,6 +25,12 @@
* initialize list headers to zeroes rather than setting them up with an
* explicit initialization function, so we also allow the other case.
*
+ * By using the dlist_init_relocatable and then using only the _relocatable
+ * variants of the dlist functions, circularity can be avoided. This is
+ * useful for cases where a dlist_head object is mapped into different
+ * backends at different addresses, since the list is terminated by NULL
+ * pointers rather than pointers to the head object.
+ *
* EXAMPLES
*
* Here's a simple example demonstrating how this can be used. Let's assume
@@ -281,6 +287,15 @@ dlist_init(dlist_head *head)
}
/*
+ * Initialize a doubly linked list with non-circular layout.
+ */
+static inline void
+dlist_init_relocatable(dlist_head *head)
+{
+ head->head.next = head->head.prev = NULL;
+}
+
+/*
* Is the list empty?
*
* An empty list has either its first 'next' pointer set to NULL, or to itself.
@@ -311,6 +326,34 @@ dlist_push_head(dlist_head *head, dlist_node *node)
}
/*
+ * Insert a node at the beginning of a list, without converting it to circular
+ * layout.
+ */
+static inline void
+dlist_push_head_relocatable(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (head->head.next == NULL)
+ {
+ /* It was empty, and this node becomes the only entry. */
+ Assert(head->head.prev == NULL);
+ node->next = node->prev = NULL;
+ head->head.next = head->head.prev = node;
+ }
+ else
+ {
+ /* Insert before head. */
+ Assert(head->head.prev != NULL);
+ node->next = head->head.next;
+ node->next->prev = node;
+ node->prev = NULL;
+ head->head.next = node;
+ }
+}
+
+/*
* Insert a node at the end of the list.
*/
static inline void
@@ -328,6 +371,34 @@ dlist_push_tail(dlist_head *head, dlist_node *node)
}
/*
+ * Insert a node at the end of a list, without converting it to circular
+ * layout.
+ */
+static inline void
+dlist_push_tail_relocatable(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (head->head.prev == NULL)
+ {
+ /* It was empty, and this node becomes the only entry. */
+ Assert(head->head.next == NULL);
+ node->next = node->prev = NULL;
+ head->head.next = head->head.prev = node;
+ }
+ else
+ {
+ /* Insert after tail. */
+ Assert(head->head.next != NULL);
+ node->prev = head->head.prev;
+ node->prev->next = node;
+ node->next = NULL;
+ head->head.prev = node;
+ }
+}
+
+/*
* Insert a node after another *in the same list*
*/
static inline void
@@ -362,6 +433,33 @@ dlist_delete(dlist_node *node)
}
/*
+ * Delete 'node' from 'list' (it must be in that list, which must be
+ * noncircular).
+ */
+static inline void
+dlist_delete_relocatable(dlist_head *head, dlist_node *node)
+{
+ Assert(head->head.next != &head->head);
+ Assert(head->head.prev != &head->head);
+
+ if (node->prev == NULL)
+ {
+ Assert(head->head.next == node);
+ head->head.next = node->next;
+ }
+ else
+ node->prev->next = node->next;
+
+ if (node->next == NULL)
+ {
+ Assert(head->head.prev == node);
+ head->head.prev = node->prev;
+ }
+ else
+ node->next->prev = node->prev;
+}
+
+/*
* Remove and return the first node from a list (there must be one).
*/
static inline dlist_node *
@@ -531,6 +629,18 @@ dlist_tail_node(dlist_head *head)
(iter).cur = (iter).next, (iter).next = (iter).cur->next)
/*
+ * Like dlist_foreach_modify, but for noncircular lists.
+ */
+#define dlist_foreach_modify_relocatable(iter, lhead) \
+ for (AssertVariableIsOfTypeMacro(iter, dlist_mutable_iter), \
+ AssertVariableIsOfTypeMacro(lhead, dlist_head *), \
+ (iter).cur = (lhead)->head.next, \
+ (iter).next = (iter).cur == NULL ? NULL : (iter).cur->next; \
+ (iter).cur != NULL; \
+ (iter).cur = (iter).next, \
+ (iter).next = (iter).cur == NULL ? NULL : (iter).cur->next)
+
+/*
* Iterate through the list in reverse order.
*
* It is *not* allowed to manipulate the list during iteration.
On Mon, Jul 25, 2016 at 3:02 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I measured the following times for unpatched master, on my 4 core laptop:
16 workers = 73.067s, 74.869s, 75.338s
8 workers = 65.846s, 67.622s, 68.039s
4 workers = 68.763s, 68.980s, 69.035s <-- curiously slower here
3 workers = 59.701s, 59.991s, 60.133s
2 workers = 53.620s, 55.300s, 55.790s
1 worker = 21.578s, 21.535s, 21.598s
With the attached patch I got:
16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
8 workers = 67.462s, 68.622s, 68.851s <- +1.4%
4 workers = 64.983s, 65.021s, 65.496s <- -5.7%
3 workers = 60.247s, 60.425s, 60.492s <- +0.7%
2 workers = 57.544s, 57.626s, 58.133s <- +2.3%
1 worker = 21.403s, 21.486s, 21.661s <- -0.2%
Correction, that +2.3% for 2 workers should be +4.2%. And to clarify,
I ran the test 3 times as shown and those percentage changes are based
on the middle times.
--
Thomas Munro
http://www.enterprisedb.com
On Sun, Jul 24, 2016 at 11:02 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
[...]
Those operations are ~50% slower.  So getting rid of dlist's clever
branch-free code generally seems like a bad idea.
[...]
I measured the following times for unpatched master, on my 4 core laptop:
16 workers = 73.067s, 74.869s, 75.338s
8 workers = 65.846s, 67.622s, 68.039s
4 workers = 68.763s, 68.980s, 69.035s <-- curiously slower here
3 workers = 59.701s, 59.991s, 60.133s
2 workers = 53.620s, 55.300s, 55.790s
1 worker = 21.578s, 21.535s, 21.598s
With the attached patch I got:
16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
8 workers = 67.462s, 68.622s, 68.851s <- +1.4%
4 workers = 64.983s, 65.021s, 65.496s <- -5.7%
3 workers = 60.247s, 60.425s, 60.492s <- +0.7%
2 workers = 57.544s, 57.626s, 58.133s <- +2.3%
1 worker = 21.403s, 21.486s, 21.661s <- -0.2%
Somewhat noisy data and different effects at different numbers of workers.
[...]
Otherwise, if a change affecting
all LWLocks turns out to be unacceptable, maybe we would need to have
a different LWLock interface for relocatable LWLocks to make them
suitable for use in DSM segments. Any thoughts?
Thanks for looking into this. Any theory on why this is slower in the
abstract but not slower, perhaps even faster, for LWLocks? I wonder
if it has to do with how likely it is for a list to be completely
empty.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2016-07-25 15:02:41 +1200, Thomas Munro wrote:
Any thoughts on this approach, or better ways to solve this problem?
How valuable is the branch-free property of the circular push and
delete operations?
I think it's generally valuable, but I suspect that it doesn't really
matter much for lwlocks, given that this is already the slow path, where
we enter the kernel and such. I suspect that the performance difference
here is more due to code movement than anything.
Next I wondered if it would be a bad idea to use slower relocatable
dlist heads for all LWLocks. Obviously LWLocks are used all over the
place and it would be bad to slow them down, but I wondered if maybe
dlist manipulation might not be a very significant part of their work.
I'm pretty sure that's the case.
So I put a LWLock and a counter in shared memory, and had N background
workers run a tight loop that locks, increments and unlocks
concurrently until the counter reaches 1 billion. This creates
different degrees of contention and wait list sizes. The worker loop
looks like this:
while (!finished)
{
LWLockAcquire(&control->lock, LW_EXCLUSIVE);
++control->counter;
if (control->counter >= control->goal)
finished = true;
LWLockRelease(&control->lock);
}
You really need shared lwlocks here as well, because exclusive lwlocks
will only ever trigger a single list manipulation (basically a pop from
the wait queue), further minimizing the list manipulation overhead.
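For example, a loop along these lines (just a sketch, not code from the
test; it reuses the control struct from the quoted loop) would let a single
release wake a run of shared waiters and exercise the wait list much harder:

    while (!finished)
    {
        if (random() % 8 == 0)
        {
            /* occasional exclusive acquisition to advance the counter */
            LWLockAcquire(&control->lock, LW_EXCLUSIVE);
            ++control->counter;
            if (control->counter >= control->goal)
                finished = true;
            LWLockRelease(&control->lock);
        }
        else
        {
            /* mostly shared acquisitions, which can pile up in the queue */
            LWLockAcquire(&control->lock, LW_SHARED);
            if (control->counter >= control->goal)
                finished = true;
            LWLockRelease(&control->lock);
        }
    }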
I think the better fix here would actually be to get rid of a pointer
based list here, and a) replace the queue with integer offsets b) make
the queue lock-free. But I understand that you might not want to tackle
that :P
Greetings,
Andres Freund
On Tue, Jul 26, 2016 at 6:00 AM, Andres Freund <andres@anarazel.de> wrote:
I think the better fix here would acutally be to get rid of a pointer
based list here, and a) replace the queue with integer offsets ...
Here is a prototype of that idea. It replaces that dlist with a
proclist, a new specialised doubly-linked list for pgprocno values,
using INVALID_PGPROCNO for termination. It works with proclist_node
objects inside PGPROC at an arbitrary offset which must be provided
when you initialise the proclist.
It could be made more general than just the PGPROC array by taking the
base address and stride of the array, perhaps even allowing for a
different base address in each backend for arrays that live in DSM,
but I happened to see https://xkcd.com/974/ today and didn't pursue
that thought.
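For what it's worth, the more general form would presumably carry the
array's stride and link offset in the list head, something like this sketch
(none of these names exist in the attached patch):

    typedef struct relative_list_head
    {
        int         head;           /* array index of first element, or -1 */
        int         tail;           /* array index of last element, or -1 */
        size_t      stride;         /* sizeof each array element */
        size_t      node_offset;    /* offset of the link node in an element */
    } relative_list_head;

    static inline void *
    relative_list_element(char *base, relative_list_head *list, int index)
    {
        /* 'base' is this backend's mapping of the element array */
        return base + (size_t) index * list->stride;
    }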
... b) make
the queue lock-free. But I understand that you might not want to tackle
that :P
This may be complicated by the not-strictly-queue-like access pattern.
I haven't looked into it yet, but I can see that it requires some
serious study (Harris or Zhang techniques, maybe with extra tagging to
avoid ABA problems on concurrent removal of the same node, and if so
that might require a wider CAS than we have?).
I wonder if a proclist with an optional lock-free interface would also
be interesting for syncrep queues and others.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
lwlocks-in-dsm-v2.patch (application/octet-stream)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 7ffa87d..a408019 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -84,6 +84,7 @@
#include "storage/ipc.h"
#include "storage/predicate.h"
#include "storage/proc.h"
+#include "storage/proclist.h"
#include "storage/spin.h"
#include "utils/memutils.h"
@@ -717,7 +718,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
- dlist_init(&lock->waiters);
+ proclist_init(&lock->waiters, offsetof(PGPROC, lwWaitLink));
}
/*
@@ -920,25 +921,25 @@ LWLockWakeup(LWLock *lock)
{
bool new_release_ok;
bool wokeup_somebody = false;
- dlist_head wakeup;
- dlist_mutable_iter iter;
+ proclist_head wakeup;
+ proclist_mutable_iter iter;
- dlist_init(&wakeup);
+ proclist_init(&wakeup, offsetof(PGPROC, lwWaitLink));
new_release_ok = true;
/* lock wait list while collecting backends to wake up */
LWLockWaitListLock(lock);
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
- dlist_delete(&waiter->lwWaitLink);
- dlist_push_tail(&wakeup, &waiter->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur);
+ proclist_push_tail(&wakeup, iter.cur);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
{
@@ -963,7 +964,7 @@ LWLockWakeup(LWLock *lock)
break;
}
- Assert(dlist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
+ Assert(proclist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
/* unset required flags, and release lock, in one fell swoop */
{
@@ -982,7 +983,7 @@ LWLockWakeup(LWLock *lock)
else
desired_state &= ~LW_FLAG_RELEASE_OK;
- if (dlist_is_empty(&wakeup))
+ if (proclist_is_empty(&wakeup))
desired_state &= ~LW_FLAG_HAS_WAITERS;
desired_state &= ~LW_FLAG_LOCKED; /* release lock */
@@ -994,12 +995,12 @@ LWLockWakeup(LWLock *lock)
}
/* Awaken any waiters I removed from the queue. */
- dlist_foreach_modify(iter, &wakeup)
+ proclist_foreach_modify(iter, &wakeup)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
LOG_LWDEBUG("LWLockRelease", lock, "release waiter");
- dlist_delete(&waiter->lwWaitLink);
+ proclist_delete(&wakeup, iter.cur);
/*
* Guarantee that lwWaiting being unset only becomes visible once the
@@ -1046,9 +1047,9 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
/* LW_WAIT_UNTIL_FREE waiters are always at the front of the queue */
if (mode == LW_WAIT_UNTIL_FREE)
- dlist_push_head(&lock->waiters, &MyProc->lwWaitLink);
+ proclist_push_head(&lock->waiters, MyProc->pgprocno);
else
- dlist_push_tail(&lock->waiters, &MyProc->lwWaitLink);
+ proclist_push_tail(&lock->waiters, MyProc->pgprocno);
/* Can release the mutex now */
LWLockWaitListUnlock(lock);
@@ -1070,7 +1071,7 @@ static void
LWLockDequeueSelf(LWLock *lock)
{
bool found = false;
- dlist_mutable_iter iter;
+ proclist_mutable_iter iter;
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
@@ -1086,19 +1087,17 @@ LWLockDequeueSelf(LWLock *lock)
* Can't just remove ourselves from the list, but we need to iterate over
* all entries as somebody else could have unqueued us.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters)
{
- PGPROC *proc = dlist_container(PGPROC, lwWaitLink, iter.cur);
-
- if (proc == MyProc)
+ if (iter.cur == MyProc->pgprocno)
{
found = true;
- dlist_delete(&proc->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur);
break;
}
}
- if (dlist_is_empty(&lock->waiters) &&
+ if (proclist_is_empty(&lock->waiters) &&
(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) != 0)
{
pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);
@@ -1719,12 +1718,12 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
void
LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
{
- dlist_head wakeup;
- dlist_mutable_iter iter;
+ proclist_head wakeup;
+ proclist_mutable_iter iter;
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
- dlist_init(&wakeup);
+ proclist_init(&wakeup, offsetof(PGPROC, lwWaitLink));
LWLockWaitListLock(lock);
@@ -1737,15 +1736,15 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
* up. They are always in the front of the queue.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
break;
- dlist_delete(&waiter->lwWaitLink);
- dlist_push_tail(&wakeup, &waiter->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur);
+ proclist_push_tail(&wakeup, iter.cur);
}
/* We are done updating shared state of the lock itself. */
@@ -1754,11 +1753,11 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
/*
* Awaken any waiters I removed from the queue.
*/
- dlist_foreach_modify(iter, &wakeup)
+ proclist_foreach_modify(iter, &wakeup)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
- dlist_delete(&waiter->lwWaitLink);
+ proclist_delete(&wakeup, iter.cur);
/* check comment in LWLockWakeup() about this barrier */
pg_write_barrier();
waiter->lwWaiting = false;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3db11e4..959f5f1 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -18,7 +18,7 @@
#error "lwlock.h may not be included from frontend code"
#endif
-#include "lib/ilist.h"
+#include "storage/proclist_types.h"
#include "storage/s_lock.h"
#include "port/atomics.h"
@@ -59,7 +59,7 @@ typedef struct LWLock
{
uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
- dlist_head waiters; /* list of waiting PGPROCs */
+ proclist_head waiters; /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 775c66a..87a8e4c 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -19,6 +19,8 @@
#include "storage/latch.h"
#include "storage/lock.h"
#include "storage/pg_sema.h"
+#include "storage/proclist_types.h"
+#include "storage/shmem.h"
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -112,7 +114,7 @@ struct PGPROC
/* Info about LWLock the process is currently waiting for, if any. */
bool lwWaiting; /* true if waiting for an LW lock */
uint8 lwWaitMode; /* lwlock mode being waited for */
- dlist_node lwWaitLink; /* position in LW lock wait list */
+ proclist_node lwWaitLink; /* position in LW lock wait list */
/* Info about lock the process is currently waiting for, if any. */
/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -243,6 +245,9 @@ extern PROC_HDR *ProcGlobal;
extern PGPROC *PreparedXactProcs;
+/* Accessor for PGPROC given a pgprocno. */
+#define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
+
/*
* We set aside some extra PGPROC structures for auxiliary processes,
* ie things that aren't full-fledged backends but need shmem access.
diff --git a/src/include/storage/proclist.h b/src/include/storage/proclist.h
new file mode 100644
index 0000000..32569fd
--- /dev/null
+++ b/src/include/storage/proclist.h
@@ -0,0 +1,142 @@
+/*-------------------------------------------------------------------------
+ *
+ * proclist.h
+ * operations on doubly-linked lists of pgprocnos
+ *
+ * The interface is modelled on dlist from ilist.h, but uses pgprocno
+ * instead of pointers and doesn't use a circular list, so that a
+ * proclist_head object can be mapped at different addresses in different
+ * backends.
+ *
+ * See proclist_types.h for the structs that these functions operate on. They
+ * are separated to break a header dependency cycle with proc.h.
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/storage/proclist.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCLIST_H
+#define PROCLIST_H
+
+#include "storage/proc.h"
+#include "storage/proclist_types.h"
+
+/*
+ * Initialize a proclist. The offset to the appropriate proclist_node within
+ * PGPROC should be provided as offsetof(PGPROC, <member>).
+ */
+static inline void
+proclist_init(proclist_head *list, size_t offset)
+{
+ list->head = list->tail = INVALID_PGPROCNO;
+ list->offset = offset;
+}
+
+/*
+ * Is the list empty?
+ */
+static inline bool
+proclist_is_empty(proclist_head *list)
+{
+ return list->head == INVALID_PGPROCNO;
+}
+
+/*
+ * Get a pointer to the appropriate proclist_node inside a PGPROC object,
+ * given a procno.
+ */
+static inline proclist_node *
+proclist_node_get(proclist_head *list, int procno)
+{
+ char *entry = (char *) GetPGProcByNumber(procno);
+
+ return (proclist_node *) (entry + list->offset);
+}
+
+/*
+ * Insert a node at the beginning of a list.
+ */
+static inline void
+proclist_push_head(proclist_head *list, int procno)
+{
+ proclist_node *node = proclist_node_get(list, procno);
+
+ if (list->head == INVALID_PGPROCNO)
+ {
+ Assert(list->tail == INVALID_PGPROCNO);
+ node->next = node->prev = INVALID_PGPROCNO;
+ list->head = list->tail = procno;
+ }
+ else
+ {
+ Assert(list->tail != INVALID_PGPROCNO);
+ node->next = list->head;
+ proclist_node_get(list, node->next)->prev = procno;
+ node->prev = INVALID_PGPROCNO;
+ list->head = procno;
+ }
+}
+
+/*
+ * Insert a node at the end of a list.
+ */
+static inline void
+proclist_push_tail(proclist_head *list, int procno)
+{
+ proclist_node *node = proclist_node_get(list, procno);
+
+ if (list->tail == INVALID_PGPROCNO)
+ {
+ Assert(list->head == INVALID_PGPROCNO);
+ node->next = node->prev = INVALID_PGPROCNO;
+ list->head = list->tail = procno;
+ }
+ else
+ {
+ Assert(list->head != INVALID_PGPROCNO);
+ node->prev = list->tail;
+ proclist_node_get(list, node->prev)->next = procno;
+ node->next = INVALID_PGPROCNO;
+ list->tail = procno;
+ }
+}
+
+/*
+ * Delete a node. The node must be in the list.
+ */
+static inline void
+proclist_delete(proclist_head *list, int procno)
+{
+ proclist_node *node = proclist_node_get(list, procno);
+
+ if (node->prev == INVALID_PGPROCNO)
+ list->head = node->next;
+ else
+ proclist_node_get(list, node->prev)->next = node->next;
+
+ if (node->next == INVALID_PGPROCNO)
+ list->tail = node->prev;
+ else
+ proclist_node_get(list, node->next)->prev = node->prev;
+}
+
+/*
+ * Iterate through the list pointed at by 'lhead', storing the current
+ * position in 'iter'. Access the current position with iter.cur.
+ *
+ * The only list modification allowed while iterating is deleting the current
+ * node with proclist_delete(list, iter.cur).
+ */
+#define proclist_foreach_modify(iter, lhead) \
+ for (AssertVariableIsOfTypeMacro(iter, proclist_mutable_iter), \
+ AssertVariableIsOfTypeMacro(lhead, proclist_head *), \
+ (iter).cur = (lhead)->head, \
+ (iter).next = (iter).cur == INVALID_PGPROCNO ? INVALID_PGPROCNO : \
+ proclist_node_get(lhead, (iter).cur)->next; \
+ (iter).cur != INVALID_PGPROCNO; \
+ (iter).cur = (iter).next, \
+ (iter).next = proclist_node_get(lhead, (iter).cur)->next) \
+
+#endif
diff --git a/src/include/storage/proclist_types.h b/src/include/storage/proclist_types.h
new file mode 100644
index 0000000..3b3ca99
--- /dev/null
+++ b/src/include/storage/proclist_types.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * proclist_types.h
+ * doubly-linked lists of pgprocnos
+ *
+ * See proclist.h for functions that operate on these types.
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/storage/proclist_types.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PROCLIST_TYPES_H
+#define PROCLIST_TYPES_H
+
+#include <stddef.h>
+
+/*
+ * A node in a list of processes.
+ */
+typedef struct proclist_node
+{
+ int next; /* pgprocno of the next PGPROC */
+ int prev; /* pgprocno of the prev PGPROC */
+} proclist_node;
+
+/*
+ * Head of a doubly-linked list of PGPROCs, identified by pgprocno.
+ */
+typedef struct proclist_head
+{
+ size_t offset; /* offset in PGPROC of the proclist node */
+ int head; /* pgprocno of the head PGPROC */
+ int tail; /* pgprocno of the tail PGPROC */
+} proclist_head;
+
+/*
+ * List iterator allowing some modifications while iterating.
+ */
+typedef struct proclist_mutable_iter
+{
+ int cur; /* pgprocno of the current PGPROC */
+ int next; /* pgprocno of the next PGPROC */
+} proclist_mutable_iter;
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61fc2..ae19379 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2660,6 +2660,9 @@ printTextRule
priv_map
process_file_callback_t
process_sublinks_context
+proclist_head
+proclist_mutable_iter
+proclist_node
promptStatus_t
pthread_attr_t
pthread_key_t
On Thu, Jul 28, 2016 at 12:12 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Tue, Jul 26, 2016 at 6:00 AM, Andres Freund <andres@anarazel.de> wrote:
I think the better fix here would acutally be to get rid of a pointer
based list here, and a) replace the queue with integer offsets ...
Here is a prototype of that idea. It replaces that dlist with a
proclist, a new specialised doubly-linked list for pgprocno values,
using INVALID_PGPROCNO for termination. It works with proclist_node
objects inside PGPROC at an arbitrary offset which must be provided
when you initialise the proclist.
Aside from the fact that this allows LWLocks inside DSM segments,
which I definitely want to support, this seems to have the nice
property of reducing the size of an LWLock by 8 bytes. We need to
consider what to do about LWLOCK_MINIMAL_SIZE. We could (a) decide to
still pad all arrays of LWLocks to 32 bytes per LWLock so as to reduce
false sharing, and rename this constant not to imply minimality; or
(b) alter the macro definition to allow for 16 bytes as a possible
result.
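Option (b) might amount to something like the following sketch (I haven't
checked the exact wording of the current definition):

    /* allow 16 bytes when sizeof(LWLock) permits, else round up as before */
    #define LWLOCK_MINIMAL_SIZE \
        (sizeof(LWLock) <= 16 ? 16 : (sizeof(LWLock) <= 32 ? 32 : 64))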
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jul 29, 2016 at 2:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 28, 2016 at 12:12 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
[...]
Aside from the fact that this allows LWLocks inside DSM segments,
which I definitely want to support, this seems to have the nice
property of reducing the size of an LWLock by 8 bytes. We need to
consider what to do about LWLOCK_MINIMAL_SIZE. We could (a) decide to
still pad all arrays of LWLocks to 32 bytes per LWLock so as to reduce
false sharing, and rename this constant not to imply minimality; or
(b) alter the macro definition to allow for 16 bytes as a possible
result.
That version didn't actually make LWLock any smaller, because of the
additional offset stored in proclist_head when initialising the list.
Here is a new version that requires callers to provide it when
accessing the list, bringing sizeof(LWLock) down to 16 on LP64/LLP64
systems. In the spirit of the dlist_container macro, I added macros
so you can name the link member rather than providing its offset:
proclist_push_tail(&list, procno, lwWaitLink).
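For example, the lwlock.c call sites in the attached patch now read
(abridged):

    proclist_push_tail(&lock->waiters, MyProc->pgprocno, lwWaitLink);
    ...
    proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
    {
        PGPROC     *waiter = GetPGProcByNumber(iter.cur);
        ...
        proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
    }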
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
lwlocks-in-dsm-v3.patch (application/octet-stream)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 7ffa87d..303e99c 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -84,6 +84,7 @@
#include "storage/ipc.h"
#include "storage/predicate.h"
#include "storage/proc.h"
+#include "storage/proclist.h"
#include "storage/spin.h"
#include "utils/memutils.h"
@@ -717,7 +718,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
pg_atomic_init_u32(&lock->nwaiters, 0);
#endif
lock->tranche = tranche_id;
- dlist_init(&lock->waiters);
+ proclist_init(&lock->waiters);
}
/*
@@ -920,25 +921,25 @@ LWLockWakeup(LWLock *lock)
{
bool new_release_ok;
bool wokeup_somebody = false;
- dlist_head wakeup;
- dlist_mutable_iter iter;
+ proclist_head wakeup;
+ proclist_mutable_iter iter;
- dlist_init(&wakeup);
+ proclist_init(&wakeup);
new_release_ok = true;
/* lock wait list while collecting backends to wake up */
LWLockWaitListLock(lock);
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
continue;
- dlist_delete(&waiter->lwWaitLink);
- dlist_push_tail(&wakeup, &waiter->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
+ proclist_push_tail(&wakeup, iter.cur, lwWaitLink);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
{
@@ -963,7 +964,7 @@ LWLockWakeup(LWLock *lock)
break;
}
- Assert(dlist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
+ Assert(proclist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
/* unset required flags, and release lock, in one fell swoop */
{
@@ -982,7 +983,7 @@ LWLockWakeup(LWLock *lock)
else
desired_state &= ~LW_FLAG_RELEASE_OK;
- if (dlist_is_empty(&wakeup))
+ if (proclist_is_empty(&wakeup))
desired_state &= ~LW_FLAG_HAS_WAITERS;
desired_state &= ~LW_FLAG_LOCKED; /* release lock */
@@ -994,12 +995,12 @@ LWLockWakeup(LWLock *lock)
}
/* Awaken any waiters I removed from the queue. */
- dlist_foreach_modify(iter, &wakeup)
+ proclist_foreach_modify(iter, &wakeup, lwWaitLink)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
LOG_LWDEBUG("LWLockRelease", lock, "release waiter");
- dlist_delete(&waiter->lwWaitLink);
+ proclist_delete(&wakeup, iter.cur, lwWaitLink);
/*
* Guarantee that lwWaiting being unset only becomes visible once the
@@ -1046,9 +1047,9 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
/* LW_WAIT_UNTIL_FREE waiters are always at the front of the queue */
if (mode == LW_WAIT_UNTIL_FREE)
- dlist_push_head(&lock->waiters, &MyProc->lwWaitLink);
+ proclist_push_head(&lock->waiters, MyProc->pgprocno, lwWaitLink);
else
- dlist_push_tail(&lock->waiters, &MyProc->lwWaitLink);
+ proclist_push_tail(&lock->waiters, MyProc->pgprocno, lwWaitLink);
/* Can release the mutex now */
LWLockWaitListUnlock(lock);
@@ -1070,7 +1071,7 @@ static void
LWLockDequeueSelf(LWLock *lock)
{
bool found = false;
- dlist_mutable_iter iter;
+ proclist_mutable_iter iter;
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
@@ -1086,19 +1087,17 @@ LWLockDequeueSelf(LWLock *lock)
* Can't just remove ourselves from the list, but we need to iterate over
* all entries as somebody else could have unqueued us.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
{
- PGPROC *proc = dlist_container(PGPROC, lwWaitLink, iter.cur);
-
- if (proc == MyProc)
+ if (iter.cur == MyProc->pgprocno)
{
found = true;
- dlist_delete(&proc->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
break;
}
}
- if (dlist_is_empty(&lock->waiters) &&
+ if (proclist_is_empty(&lock->waiters) &&
(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) != 0)
{
pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);
@@ -1719,12 +1718,12 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
void
LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
{
- dlist_head wakeup;
- dlist_mutable_iter iter;
+ proclist_head wakeup;
+ proclist_mutable_iter iter;
PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);
- dlist_init(&wakeup);
+ proclist_init(&wakeup);
LWLockWaitListLock(lock);
@@ -1737,15 +1736,15 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
* See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
* up. They are always in the front of the queue.
*/
- dlist_foreach_modify(iter, &lock->waiters)
+ proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
break;
- dlist_delete(&waiter->lwWaitLink);
- dlist_push_tail(&wakeup, &waiter->lwWaitLink);
+ proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
+ proclist_push_tail(&wakeup, iter.cur, lwWaitLink);
}
/* We are done updating shared state of the lock itself. */
@@ -1754,11 +1753,11 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
/*
* Awaken any waiters I removed from the queue.
*/
- dlist_foreach_modify(iter, &wakeup)
+ proclist_foreach_modify(iter, &wakeup, lwWaitLink)
{
- PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
+ PGPROC *waiter = GetPGProcByNumber(iter.cur);
- dlist_delete(&waiter->lwWaitLink);
+ proclist_delete(&wakeup, iter.cur, lwWaitLink);
/* check comment in LWLockWakeup() about this barrier */
pg_write_barrier();
waiter->lwWaiting = false;
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 3db11e4..959f5f1 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -18,7 +18,7 @@
#error "lwlock.h may not be included from frontend code"
#endif
-#include "lib/ilist.h"
+#include "storage/proclist_types.h"
#include "storage/s_lock.h"
#include "port/atomics.h"
@@ -59,7 +59,7 @@ typedef struct LWLock
{
uint16 tranche; /* tranche ID */
pg_atomic_uint32 state; /* state of exclusive/nonexclusive lockers */
- dlist_head waiters; /* list of waiting PGPROCs */
+ proclist_head waiters; /* list of waiting PGPROCs */
#ifdef LOCK_DEBUG
pg_atomic_uint32 nwaiters; /* number of waiters */
struct PGPROC *owner; /* last exclusive owner of the lock */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 775c66a..f576f05 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -19,6 +19,7 @@
#include "storage/latch.h"
#include "storage/lock.h"
#include "storage/pg_sema.h"
+#include "storage/proclist_types.h"
/*
* Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
@@ -112,7 +113,7 @@ struct PGPROC
/* Info about LWLock the process is currently waiting for, if any. */
bool lwWaiting; /* true if waiting for an LW lock */
uint8 lwWaitMode; /* lwlock mode being waited for */
- dlist_node lwWaitLink; /* position in LW lock wait list */
+ proclist_node lwWaitLink; /* position in LW lock wait list */
/* Info about lock the process is currently waiting for, if any. */
/* waitLock and waitProcLock are NULL if not currently waiting. */
@@ -243,6 +244,9 @@ extern PROC_HDR *ProcGlobal;
extern PGPROC *PreparedXactProcs;
+/* Accessor for PGPROC given a pgprocno. */
+#define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
+
/*
* We set aside some extra PGPROC structures for auxiliary processes,
* ie things that aren't full-fledged backends but need shmem access.
diff --git a/src/include/storage/proclist.h b/src/include/storage/proclist.h
new file mode 100644
index 0000000..2013a40
--- /dev/null
+++ b/src/include/storage/proclist.h
@@ -0,0 +1,154 @@
+/*-------------------------------------------------------------------------
+ *
+ * proclist.h
+ * operations on doubly-linked lists of pgprocnos
+ *
+ * The interface is similar to dlist from ilist.h, but uses pgprocno instead
+ * of pointers. This allows proclist_head to be mapped at different addresses
+ * in different backends.
+ *
+ * See proclist_types.h for the structs that these functions operate on. They
+ * are separated to break a header dependency cycle with proc.h.
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/storage/proclist.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PROCLIST_H
+#define PROCLIST_H
+
+#include "storage/proc.h"
+#include "storage/proclist_types.h"
+
+/*
+ * Initialize a proclist.
+ */
+static inline void
+proclist_init(proclist_head *list)
+{
+ list->head = list->tail = INVALID_PGPROCNO;
+}
+
+/*
+ * Is the list empty?
+ */
+static inline bool
+proclist_is_empty(proclist_head *list)
+{
+ return list->head == INVALID_PGPROCNO;
+}
+
+/*
+ * Get a pointer to a proclist_node inside a given PGPROC, given a procno and
+ * an offset.
+ */
+static inline proclist_node *
+proclist_node_get(int procno, size_t node_offset)
+{
+ char *entry = (char *) GetPGProcByNumber(procno);
+
+ return (proclist_node *) (entry + node_offset);
+}
+
+/*
+ * Insert a node at the beginning of a list.
+ */
+static inline void
+proclist_push_head_offset(proclist_head *list, int procno, size_t node_offset)
+{
+ proclist_node *node = proclist_node_get(procno, node_offset);
+
+ if (list->head == INVALID_PGPROCNO)
+ {
+ Assert(list->tail == INVALID_PGPROCNO);
+ node->next = node->prev = INVALID_PGPROCNO;
+ list->head = list->tail = procno;
+ }
+ else
+ {
+ Assert(list->tail != INVALID_PGPROCNO);
+ node->next = list->head;
+ proclist_node_get(node->next, node_offset)->prev = procno;
+ node->prev = INVALID_PGPROCNO;
+ list->head = procno;
+ }
+}
+
+/*
+ * Insert a node at the end of a list.
+ */
+static inline void
+proclist_push_tail_offset(proclist_head *list, int procno, size_t node_offset)
+{
+ proclist_node *node = proclist_node_get(procno, node_offset);
+
+ if (list->tail == INVALID_PGPROCNO)
+ {
+ Assert(list->head == INVALID_PGPROCNO);
+ node->next = node->prev = INVALID_PGPROCNO;
+ list->head = list->tail = procno;
+ }
+ else
+ {
+ Assert(list->head != INVALID_PGPROCNO);
+ node->prev = list->tail;
+ proclist_node_get(node->prev, node_offset)->next = procno;
+ node->next = INVALID_PGPROCNO;
+ list->tail = procno;
+ }
+}
+
+/*
+ * Delete a node. The node must be in the list.
+ */
+static inline void
+proclist_delete_offset(proclist_head *list, int procno, size_t node_offset)
+{
+ proclist_node *node = proclist_node_get(procno, node_offset);
+
+ if (node->prev == INVALID_PGPROCNO)
+ list->head = node->next;
+ else
+ proclist_node_get(node->prev, node_offset)->next = node->next;
+
+ if (node->next == INVALID_PGPROCNO)
+ list->tail = node->prev;
+ else
+ proclist_node_get(node->next, node_offset)->prev = node->prev;
+}
+
+/*
+ * Helper macros to avoid repetition of offsetof(PGPROC, <member>).
+ * 'link_member' is the name of a proclist_node member in PGPROC.
+ */
+#define proclist_delete(list, procno, link_member) \
+ proclist_delete_offset((list), (procno), offsetof(PGPROC, link_member))
+#define proclist_push_head(list, procno, link_member) \
+ proclist_push_head_offset((list), (procno), offsetof(PGPROC, link_member))
+#define proclist_push_tail(list, procno, link_member) \
+ proclist_push_tail_offset((list), (procno), offsetof(PGPROC, link_member))
+
+/*
+ * Iterate through the list pointed at by 'lhead', storing the current
+ * position in 'iter'. 'link_member' is the name of a proclist_node member in
+ * PGPROC. Access the current position with iter.cur.
+ *
+ * The only list modification allowed while iterating is deleting the current
+ * node with proclist_delete(list, iter.cur, link_member).
+ */
+#define proclist_foreach_modify(iter, lhead, link_member) \
+ for (AssertVariableIsOfTypeMacro(iter, proclist_mutable_iter), \
+ AssertVariableIsOfTypeMacro(lhead, proclist_head *), \
+ (iter).cur = (lhead)->head, \
+ (iter).next = (iter).cur == INVALID_PGPROCNO ? INVALID_PGPROCNO : \
+ proclist_node_get((iter).cur, \
+ offsetof(PGPROC, link_member))->next; \
+ (iter).cur != INVALID_PGPROCNO; \
+ (iter).cur = (iter).next, \
+ (iter).next = (iter).cur == INVALID_PGPROCNO ? INVALID_PGPROCNO : \
+ proclist_node_get((iter).cur, \
+ offsetof(PGPROC, link_member))->next)
+
+#endif
diff --git a/src/include/storage/proclist_types.h b/src/include/storage/proclist_types.h
new file mode 100644
index 0000000..5f0adb9
--- /dev/null
+++ b/src/include/storage/proclist_types.h
@@ -0,0 +1,47 @@
+/*-------------------------------------------------------------------------
+ *
+ * proclist_types.h
+ * doubly-linked lists of pgprocnos
+ *
+ * See proclist.h for functions that operate on these types.
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/include/storage/proclist_types.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PROCLIST_TYPES_H
+#define PROCLIST_TYPES_H
+
+#include <stddef.h>
+
+/*
+ * A node in a list of processes.
+ */
+typedef struct proclist_node
+{
+ int next; /* pgprocno of the next PGPROC */
+ int prev; /* pgprocno of the prev PGPROC */
+} proclist_node;
+
+/*
+ * Head of a doubly-linked list of PGPROCs, identified by pgprocno.
+ */
+typedef struct proclist_head
+{
+ int head; /* pgprocno of the head PGPROC */
+ int tail; /* pgprocno of the tail PGPROC */
+} proclist_head;
+
+/*
+ * List iterator allowing some modifications while iterating.
+ */
+typedef struct proclist_mutable_iter
+{
+ int cur; /* pgprocno of the current PGPROC */
+ int next; /* pgprocno of the next PGPROC */
+} proclist_mutable_iter;
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61fc2..ae19379 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2660,6 +2660,9 @@ printTextRule
priv_map
process_file_callback_t
process_sublinks_context
+proclist_head
+proclist_mutable_iter
+proclist_node
promptStatus_t
pthread_attr_t
pthread_key_t
On Thu, Jul 28, 2016 at 6:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
That version didn't actually make LWLock any smaller, because of the
additional offset stored in proclist_head when initialising the list.
Here is a new version that requires callers to provide it when
accessing the list, bringing sizeof(LWLock) down to 16 on LP64/LLP64
systems. In the spirit of the dlist_container macro, I added macros
so you can name the link member rather than providing its offset:
proclist_push_tail(&list, procno, lwWaitLink).
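To illustrate, a rough caller-side sketch of how that reads (not part of
the patch; the function and the waking step are invented for illustration,
and it assumes the proclist_init() initializer plus the usual MyProc and
GetPGProcByNumber from proc.h):

#include "postgres.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "storage/proclist.h"

/* A wait list kept in shared memory, manipulated by pgprocno, not pointer. */
static void
wait_list_demo(proclist_head *waiters)
{
    proclist_mutable_iter iter;

    /* Enqueue the current backend, naming the PGPROC link member directly. */
    proclist_push_tail(waiters, MyProc->pgprocno, lwWaitLink);

    /*
     * Walk the list; deleting iter.cur is the only modification allowed
     * while iterating.
     */
    proclist_foreach_modify(iter, waiters, lwWaitLink)
    {
        PGPROC *proc = GetPGProcByNumber(iter.cur);

        proclist_delete(waiters, iter.cur, lwWaitLink);
        SetLatch(&proc->procLatch);     /* "wake" the waiter, illustratively */
    }
}

Since every link is a pgprocno index into the PGPROC array rather than a
pointer, the same list can be read from any backend.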
I have reviewed this patch. I like the design and overall I would say
that the patch looks quite elegant. The only problem I found is that
proclist_types.h doesn't seem to need to #include <stddef.h>. I tried
taking that out, and, at least on my machine, it still compiles just
fine.
Of course, performance is a concern, as with any change that touches
deep internals. Andres and Thomas seem to be in agreement that this
patch should cause a performance loss but that it should be
undetectable in practice, since only the slow path where we're
entering the kernel anyway is affected. I agree with that conclusion.
I spent a little bit of time trying to think of a case where this
would have a measurable impact, and I'm not coming up with anything.
We've done a lot of work over the last few releases to try to
eliminate LWLock contention, and the only way you could even
theoretically see an impact from this is if you can come up with a
workload with furious LWLock contention so that there is lots and lots
of linked-list manipulation going on inside lwlock.c. But I don't
really know of a workload that behaves that way, and even if I did, I
think the number of cycles we're talking about here is too small to be
measurable. Having to put processes to sleep is going to be multiple
orders of magnitude more costly than any possible details of how we
handle the linked list manipulation.
I also agree with the goal of the patch: the reason I developed the idea
of LWLock tranches was to make it possible to put LWLocks inside DSM
segments, and that worked fine up until Andres
changed lwlock.c to use dlists. This change will get us back to being
able to use LWLocks inside of DSM segments. Of course, we have no
code like that today; if we did, it would be broken. But Thomas will
be submitting code that does exactly that in the near future, and I
suspect other people will want to do it, too. As a fringe benefit, I
have another patch I'm working on that can make use of this proclist
infrastructure.
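To make that concrete, here is a rough sketch of the sort of thing this
re-enables. The struct and function names below are invented for
illustration, tranche registration is omitted because its details differ
between releases, and error handling is left out:

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/lwlock.h"

/* Hypothetical shared state that a parallel operation might place in DSM. */
typedef struct SharedCounterState
{
    LWLock  lock;       /* lives inside the DSM segment */
    int64   count;      /* protected by 'lock' */
} SharedCounterState;

/* Creator: set up the segment and initialize the in-segment lock. */
static SharedCounterState *
shared_counter_create(dsm_segment **segp, int tranche_id)
{
    dsm_segment *seg = dsm_create(sizeof(SharedCounterState), 0);
    SharedCounterState *state = (SharedCounterState *) dsm_segment_address(seg);

    LWLockInitialize(&state->lock, tranche_id);
    state->count = 0;

    *segp = seg;
    return state;
}

/*
 * Any backend that has attached the segment (and registered the tranche)
 * can take the lock, even though the segment is mapped at a different
 * address in each backend.
 */
static void
shared_counter_increment(SharedCounterState *state)
{
    LWLockAcquire(&state->lock, LW_EXCLUSIVE);
    state->count++;
    LWLockRelease(&state->lock);
}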
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-08-15 18:15:23 -0400, Robert Haas wrote:
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
I'd have objected, if I hadn't been on vacation. While I intuitively
*do* think that the increased wait-list overhead won't be relevant, I
also know that my intuition has frequently been wrong around the lwlock
code. This needs some benchmarks on a 4+ socket machine,
first. Something exercising the slow path obviously. E.g. a pgbench with
a small number of writers, and a large number of writers.
Regards,
Andres
On Tue, Aug 16, 2016 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-15 18:15:23 -0400, Robert Haas wrote:
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
I'd have objected, if I hadn't been on vacation. While I intuitively
*do* think that the increased wait-list overhead won't be relevant, I
also know that my intuition has frequently been wrong around the lwlock
code. This needs some benchmarks on a 4+ socket machine,
first. Something exercising the slow path obviously. E.g. a pgbench with
a small number of writers, and a large number of writers.
I have to admit that I totally blanked about you being on vacation.
Thanks for mentioning the workload you think might be adversely
affected, but to be honest, even if there's some workload where this
causes a small regression, I'm not really sure what you think we
should do instead. Should we have a separate copy of lwlock.c just
for parallel query and other stuff that uses DSM? Won't that slow
down every error-handling path in the system, if they all have to
release two kinds of lwlocks rather than one? And bloat the binary?
Or are you going to argue that parallel query doesn't really need
LWLocks? I'm sure that's not true. We got by without it for this
release, but that's because the only truly parallel operation as yet
is Parallel Seq Scan whose requirements are simple enough to be
handled with a spinlock.
Anyway, I guess we should wait for the benchmark results and then see,
but if we're not going to do this then we need some reasonable
alternative.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 16, 2016 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-15 18:15:23 -0400, Robert Haas wrote:
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
I'd have objected, if I hadn't been on vacation. While I intuitively
*do* think that the increased wait-list overhead won't be relevant, I
also know that my intuition has frequently been wrong around the lwlock
code. This needs some benchmarks on a 4+ socket machine,
first. Something exercising the slow path obviously. E.g. a pgbench with
a small number of writers, and a large number of writers.
Amit just pointed out to me that you wrote "a small number of writers,
and a large number of writers". I assume one of those is supposed to
say "readers"? Probably the second one?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-08-17 08:31:36 -0400, Robert Haas wrote:
On Tue, Aug 16, 2016 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-15 18:15:23 -0400, Robert Haas wrote:
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
I'd have objected, if I hadn't been on vacation. While I intuitively
*do* think that the increased wait-list overhead won't be relevant, I
also know that my intuition has frequently been wrong around the lwlock
code. This needs some benchmarks on a 4+ socket machine,
first. Something exercising the slow path obviously. E.g. a pgbench with
a small number of writers, and a large number of writers.
Amit just pointed out to me that you wrote "a small number of writers,
and a large number of writers". I assume one of those is supposed to
say "readers"? Probably the second one?
Yes. I want a long wait list, modified in bulk - which should be the
case with the above.
On 2016-08-16 23:09:07 -0400, Robert Haas wrote:
On Tue, Aug 16, 2016 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-15 18:15:23 -0400, Robert Haas wrote:
On Thu, Aug 11, 2016 at 2:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Therefore, I plan to commit this patch, removing the #include
<stddef.h> unless someone convinces me we need it, shortly after
development for v10 opens, unless there are objections before then.
Hearing no objections, done.
I'd have objected, if I hadn't been on vacation. While I intuitively
*do* think that the increased wait-list overhead won't be relevant, I
also know that my intuition has frequently been wrong around the lwlock
code. This needs some benchmarks on a 4+ socket machine,
first. Something exercising the slow path obviously. E.g. a pgbench with
a small number of writers, and a large number of writers.
I have to admit that I totally blanked about you being on vacation.
Thanks for mentioning the workload you think might be adversely
affected, but to be honest, even if there's some workload where this
causes a small regression, I'm not really sure what you think we
should do instead.
Well, you convincingly argued against that approach in a nearby thread
;). Joking aside: I do think that we should make such a change
knowingly. It might also be possible to address it somehow.
I really hope there's no slowdown.
Should we have a separate copy of lwlock.c just
for parallel query and other stuff that uses DSM?
No, that'd be horrible.
Or are you going to argue that parallel query doesn't really need
LWLocks?
Definitely not.
- Andres
On Wed, Aug 17, 2016 at 9:12 PM, Andres Freund <andres@anarazel.de> wrote:
Yes. I want a long wait list, modified in bulk - which should be the
case with the above.
I ran some pgbench. I do not see much difference in performance; the small
variance can be attributed to variance in the probability of drawing a
particular built-in script.
Server configuration:
./postgres -c shared_buffers=8GB -N 200 -c
min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &
pgbench configuration: initialized with scale_factor 300
./pgbench -c $threads -j $threads -T 1800 -M prepared -b simple-update@1
-b select-only@20 postgres
Simple-update : select-only = 5:95

clients               8               64              128
Current Code          38279.663784    258196.067342   283150.495921
After Patch revert    37316.09022     268285.506338   280077.913954
% diff                -2.5171944284   3.9076656356    -1.0851409449

Simple-update : select-only = 20:80

clients               8               64              128
Current Code          23169.021469    134791.996882   154057.101004
After Patch revert    23892.91539     135091.645551   150402.80543
% diff                3.1244043775    0.2223044958    -2.3720396854
And this was done on our 8-socket Intel machine.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Fri, Aug 19, 2016 at 4:29 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:
On Wed, Aug 17, 2016 at 9:12 PM, Andres Freund <andres@anarazel.de> wrote:
Yes. I want a long wait list, modified in bulk - which should be the
case with the above.
I ran some pgbench. I do not see much difference in performance; the
small variance can be attributed to variance in the probability of
drawing a particular built-in script.
Can you specify the machine details, as Andres wants the tests to be
conducted on a machine with a high socket count?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Aug 19, 2016 at 4:44 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
Can you specify the machine details, as Andres wants the tests to be
conducted on a machine with a high socket count?
As I specified in the last line of my mail, it is an 8-socket Intel
machine.
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 50308 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 44594 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 61814 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 55258 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 46705 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 40948 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 47468 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 17285 MB
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com