Speed up transaction completion faster after many relations are accessed in a transaction

Started by Tsunakawa, Takayuki · about 7 years ago · 125 messages · pgsql-hackers
#1Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com

Hello,

The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.

A transaction records each lock it acquires in a LOCALLOCK structure (a pair of lock object and lock mode). The LOCALLOCKs are stored in a hash table in the backend's local memory. The hash table automatically expands when the transaction acquires locks on many relations, and it never shrinks. When the transaction commits or aborts, it scans the whole hash table to find the LOCALLOCKs for the locks it must release.

The problem is that once some transaction accesses many relations, even subsequent transactions in the same session that only access a few relations take an unreasonably long time to complete, because they still have to scan the expanded hash table.

The attached patch links the LOCALLOCKs into a list anchored in PGPROC, so that releasing locks only needs to scan that list instead of the hash table. The hash table is kept because some functions need to find a particular LOCALLOCK quickly based on its hash value.
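As a rough sketch of the idea (using the existing lib/ilist.h API; the list name and the "node" member are illustrative, not the exact names in the patch), each LOCALLOCK gets a dlist_node that is linked into a backend-local list when the hash entry is created, and lock release walks that list instead of doing a hash_seq_search() over the whole table:

#include "lib/ilist.h"

/* one list per backend, kept next to LockMethodLocalHash */
static dlist_head session_locallocks = DLIST_STATIC_INIT(session_locallocks);

/* in LockAcquireExtended(), when a new LOCALLOCK hash entry is set up */
dlist_push_tail(&session_locallocks, &locallock->node);

/* in LockReleaseAll(), instead of hash_seq_init()/hash_seq_search() */
dlist_mutable_iter iter;

dlist_foreach_modify(iter, &session_locallocks)
{
    LOCALLOCK  *locallock = dlist_container(LOCALLOCK, node, iter.cur);

    /* ... release the underlying lock as today ... */

    dlist_delete(&locallock->node);   /* unlink from the backend-local list */
    RemoveLocalLock(locallock);       /* drop the hash table entry */
}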

This problem was uncovered while evaluating partitioning performance. When the application PREPAREs a statement once and then repeatedly EXECUTEs and COMMITs, the server creates a generic plan on the 6th EXECUTE. Unfortunately, creating the generic plan for UPDATE/DELETE currently accesses all partitions of the target table (this itself needs improvement), expanding the LOCALLOCK hash table. As a result, the 7th and later EXECUTEs get slower.

Imai-san confirmed performance improvement with this patch:

https://commitfest.postgresql.org/22/1993/

Regards
Takayuki Tsunakawa

Attachments:

faster-locallock-scan.patch (application/octet-stream, +31/-29 lines)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tsunakawa, Takayuki (#1)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.

Hm. Putting a list header for a purely-local data structure into shared
memory seems quite ugly. Isn't there a better place to keep that?

Do we really want a dlist here at all? I'm concerned that bloating
LOCALLOCK will cost us when there are many locks involved. This patch
increases the size of LOCALLOCK by 25% if I counted right, which does
not seem like a negligible penalty.

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value. Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.
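
(For concreteness, a minimal sketch of that idea with the existing dynahash calls; LOCALLOCK_RESET_THRESHOLD is just a placeholder for "$some-value", and where exactly this would run is not worked out:)

/* at transaction end, e.g. around LockReleaseAll() */
long    nheld = hash_get_num_entries(LockMethodLocalHash);

/* ... release all the locks as today ... */

if (nheld > LOCALLOCK_RESET_THRESHOLD)
{
    HASHCTL     info;

    hash_destroy(LockMethodLocalHash);

    info.keysize = sizeof(LOCALLOCKTAG);
    info.entrysize = sizeof(LOCALLOCK);
    LockMethodLocalHash = hash_create("LOCALLOCK hash",
                                      16,   /* back to a small initial size */
                                      &info,
                                      HASH_ELEM | HASH_BLOBS);
}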

regards, tom lane

#3David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#2)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

On Tue, 19 Feb 2019 at 12:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value. Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.

That seems like a good idea. Although, it would be good to know that
it didn't add too much overhead dropping and recreating the table when
every transaction happened to obtain more locks than $some-value. If
it did, then maybe we could track the average number of locks taken by recent
transactions and just ditch the table after the locks are released if
the locks held by the last transaction exceeded the average *
1.something. No need to go near shared memory to do that.
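
(A sketch of what such an adaptive check could look like; the function name, the 0.9/0.1 weights, and the 1.5 multiplier are placeholders for "the average * 1.something", not proposed constants:)

/* cheap exponential moving average of locks taken per transaction */
static double avg_locks_per_xact = 16.0;    /* arbitrary starting value */

static bool
LocalLockTableLooksBloated(long locks_this_xact)
{
    bool    bloated = locks_this_xact > avg_locks_per_xact * 1.5;

    /* weight recent transactions most heavily */
    avg_locks_per_xact = 0.9 * avg_locks_per_xact + 0.1 * locks_this_xact;

    return bloated;
}

If that returns true once the transaction's locks are released, the table could be destroyed and recreated as in the sketch above.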

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Andres Freund
andres@anarazel.de
In reply to: David Rowley (#3)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-19 12:52:08 +1300, David Rowley wrote:

On Tue, 19 Feb 2019 at 12:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value. Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.

That seems like a good idea. Although, it would be good to know that
it didn't add too much overhead dropping and recreating the table when
every transaction happened to obtain more locks than $some-value. If
it did, then maybe we could track the average number of locks taken by recent
transactions and just ditch the table after the locks are released if
the locks held by the last transaction exceeded the average *
1.something. No need to go near shared memory to do that.

Isn't a large portion of benefits in this patch going to be mooted by
the locking improvements discussed in the other threads? I.e. there's
hopefully not going to be a ton of cases with low overhead where we
acquire a lot of locks and release them very soon after. Sure, for DDL
etc we will, but I can't see this mattering from a performance POV?

I'm not against doing something like Tom proposes, but heuristics with
magic constants like this tend to age poorly / are hard to tune well
across systems.

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#3)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

David Rowley <david.rowley@2ndquadrant.com> writes:

On Tue, 19 Feb 2019 at 12:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value. Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.

That seems like a good idea. Although, it would be good to know that
it didn't add too much overhead dropping and recreating the table when
every transaction happened to obtain more locks than $some-value. If
it did, then maybe we could track the average number of locks taken by recent
transactions and just ditch the table after the locks are released if
the locks held by the last transaction exceeded the average *
1.something. No need to go near shared memory to do that.

Yeah, I'd deliberately avoided saying how we'd choose $some-value ;-).
Making it adaptive might not be a bad plan.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund <andres@anarazel.de> writes:

Isn't a large portion of benefits in this patch going to be mooted by
the locking improvements discussed in the other threads? I.e. there's
hopefully not going to be a ton of cases with low overhead where we
acquire a lot of locks and release them very soon after. Sure, for DDL
etc we will, but I can't see this mattering from a performance POV?

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries. There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

I'm not against doing something like Tom proposes, but heuristics with
magic constants like this tend to age poorly / are hard to tune well
across systems.

I didn't say it had to be a constant ...

regards, tom lane

#7David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#4)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

On Tue, 19 Feb 2019 at 12:56, Andres Freund <andres@anarazel.de> wrote:

Isn't a large portion of benefits in this patch going to be mooted by
the locking improvements discussed in the other threads? I.e. there's
hopefully not going to be a ton of cases with low overhead where we
acquire a lot of locks and release them very soon after. Sure, for DDL
etc we will, but I can't see this mattering from a performance POV?

I think this patch was born from Amit's partition planner improvement
patch. If not that one, which other threads did you have in mind?

A problem exists where, when using a PREPAREd statement to plan a query
against a partitioned table containing many partitions, a generic plan
will never be favoured over a custom plan, since the generic plan might
not be able to prune partitions the way the custom plan can. The actual
problem is that at some point we do need to generate a generic plan in
order to know it's more costly, and that requires locking possibly every
partition. When plan_cache_mode = auto, this is done on the 6th
execution of the statement. After Amit's partition planner changes [1],
the custom plan will only lock partitions that are not pruned, so the
6th execution of the statement bloats the local lock table.

[1]: https://commitfest.postgresql.org/22/1778/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-18 19:01:06 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Isn't a large portion of benefits in this patch going to be mooted by
the locking improvements discussed in the other threads? I.e. there's
hopefully not going to be a ton of cases with low overhead where we
acquire a lot of locks and release them very soon after. Sure, for DDL
etc we will, but I can't see this mattering from a performance POV?

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries. There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

Agreed - but it seems not unlikely that for those the rest of the
planner / executor overhead will entirely swamp any improvement we could
make here. If I understand correctly the benchmarks here were made with
"point" update and select queries, although the reference in the first
post in this thread is a bit vague.

I'm not against doing something like Tom proposes, but heuristics with
magic constants like this tend to age poorly / are hard to tune well
across systems.

I didn't say it had to be a constant ...

Do you have a good idea?

Greetings,

Andres Freund

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund <andres@anarazel.de> writes:

On 2019-02-18 19:01:06 -0500, Tom Lane wrote:

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries. There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

Agreed - but it seems not unlikely that for those the rest of the
planner / executor overhead will entirely swamp any improvement we could
make here. If I understand correctly the benchmarks here were made with
"point" update and select queries, although the reference in the first
post in this thread is a bit vague.

I think what Maumau-san is on about here is that not only does your
$giant-query take a long time, but it has a permanent negative effect
on all subsequent transactions in the session. That seems worth
doing something about.

I didn't say it had to be a constant ...

Do you have a good idea?

I think David's on the right track --- keep some kind of moving average of
the LOCALLOCK table size for each transaction, and nuke it if it exceeds
some multiple of the recent average. Not sure offhand about how to get
the data cheaply --- it might not be sufficient to look at transaction
end, if we release LOCALLOCK entries before that (but do we?)

regards, tom lane

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-18 18:42:32 -0500, Tom Lane wrote:

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

The attached patch speeds up transaction completion when any prior transaction accessed many relations in the same session.

Hm. Putting a list header for a purely-local data structure into shared
memory seems quite ugly. Isn't there a better place to keep that?

Yea, I think it'd be just as fine to store that in a static
variable (best defined directly beside LockMethodLocalHash).

(Btw, I'd be entirely unsurprised if moving away from a dynahash for
LockMethodLocalHash would be beneficial)

Do we really want a dlist here at all? I'm concerned that bloating
LOCALLOCK will cost us when there are many locks involved. This patch
increases the size of LOCALLOCK by 25% if I counted right, which does
not seem like a negligible penalty.

It's currently

struct LOCALLOCK {
LOCALLOCKTAG tag; /* 0 20 */

/* XXX 4 bytes hole, try to pack */

LOCK * lock; /* 24 8 */
PROCLOCK * proclock; /* 32 8 */
uint32 hashcode; /* 40 4 */

/* XXX 4 bytes hole, try to pack */

int64 nLocks; /* 48 8 */
_Bool holdsStrongLockCount; /* 56 1 */
_Bool lockCleared; /* 57 1 */

/* XXX 2 bytes hole, try to pack */

int numLockOwners; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
int maxLockOwners; /* 64 4 */

/* XXX 4 bytes hole, try to pack */

LOCALLOCKOWNER * lockOwners; /* 72 8 */

/* size: 80, cachelines: 2, members: 10 */
/* sum members: 66, holes: 4, sum holes: 14 */
/* last cacheline: 16 bytes */
};

seems we could trivially squeeze most of the bytes for a dlist node out
of padding.

My own thought about how to improve this situation was just to destroy
and recreate LockMethodLocalHash at transaction end (or start)
if its size exceeded $some-value. Leaving it permanently bloated seems
like possibly a bad idea, even if we get rid of all the hash_seq_searches
on it.

OTOH, that'll force constant incremental resizing of the hashtable, for
workloads that regularly need a lot of locks. And I'd assume in most
cases if one transaction needs a lot of locks it's quite likely that
future ones will need a lot of locks, too.

Greetings,

Andres Freund

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-18 19:13:31 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-02-18 19:01:06 -0500, Tom Lane wrote:

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries. There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

Agreed - but it seems not unlikely that for those the rest of the
planner / executor overhead will entirely swamp any improvement we could
make here. If I understand correctly the benchmarks here were made with
"point" update and select queries, although the reference in the first
post in this thread is a bit vague.

I think what Maumau-san is on about here is that not only does your
$giant-query take a long time, but it has a permanent negative effect
on all subsequent transactions in the session. That seems worth
doing something about.

Ah, yes, that makes sense. I'm inclined to think however that the
original approach presented in this thread is better than the
reset-the-whole-hashtable approach. Because:

I think David's on the right track --- keep some kind of moving average of
the LOCALLOCK table size for each transaction, and nuke it if it exceeds
some multiple of the recent average. Not sure offhand about how to get
the data cheaply --- it might not be sufficient to look at transaction
end, if we release LOCALLOCK entries before that (but do we?)

Seems too complicated for my taste. And it doesn't solve the issue of
having some transactions with few locks (say because the plan can be
nicely pruned) interspersed with transactions where a lot of locks are
acquired.

Greetings,

Andres Freund

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund <andres@anarazel.de> writes:

On 2019-02-18 18:42:32 -0500, Tom Lane wrote:

Do we really want a dlist here at all? I'm concerned that bloating
LOCALLOCK will cost us when there are many locks involved. This patch
increases the size of LOCALLOCK by 25% if I counted right, which does
not seem like a negligible penalty.

It's currently [ 80 bytes with several padding holes ]
seems we could trivially squeeze most of the bytes for a dlist node out
of padding.

Yeah, but if we want to rearrange the members into an illogical order
to save some space, we should do that independently of this patch ---
and then the overhead of this patch would be even worse than 25%.

regards, tom lane

#13Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#12)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-18 19:24:54 -0500, Tom Lane wrote:

Yeah, but if we want to rearrange the members into an illogical order
to save some space, we should do that independently of this patch ---

Sure, we should do that. I don't buy the "illogical" bit, just moving
hashcode up to after tag isn't more or less logical, and saves most of
the padding, and moving the booleans to the end isn't better/worse
either.

You always bring up that argument. While I agree that sometimes the most
optimal ordering can be less natural, I think most of the time it vastly
overstates how intelligent the original ordering was. Often new elements
were either just added iteratively without consideration for padding, or
the attention to padding was paid in 32bit times.

I don't find

struct LOCALLOCK {
LOCALLOCKTAG tag; /* 0 20 */
uint32 hashcode; /* 20 4 */
LOCK * lock; /* 24 8 */
PROCLOCK * proclock; /* 32 8 */
int64 nLocks; /* 40 8 */
int numLockOwners; /* 48 4 */
int maxLockOwners; /* 52 4 */
LOCALLOCKOWNER * lockOwners; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
_Bool holdsStrongLockCount; /* 64 1 */
_Bool lockCleared; /* 65 1 */

/* size: 72, cachelines: 2, members: 10 */
/* padding: 6 */
/* last cacheline: 8 bytes */
};

less clear than

struct LOCALLOCK {
LOCALLOCKTAG tag; /* 0 20 */

/* XXX 4 bytes hole, try to pack */

LOCK * lock; /* 24 8 */
PROCLOCK * proclock; /* 32 8 */
uint32 hashcode; /* 40 4 */

/* XXX 4 bytes hole, try to pack */

int64 nLocks; /* 48 8 */
_Bool holdsStrongLockCount; /* 56 1 */
_Bool lockCleared; /* 57 1 */

/* XXX 2 bytes hole, try to pack */

int numLockOwners; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
int maxLockOwners; /* 64 4 */

/* XXX 4 bytes hole, try to pack */

LOCALLOCKOWNER * lockOwners; /* 72 8 */

/* size: 80, cachelines: 2, members: 10 */
/* sum members: 66, holes: 4, sum holes: 14 */
/* last cacheline: 16 bytes */
};

but it's smaller (although there's plenty of trailing space).

and then the overhead of this patch would be even worse than 25%.

IDK, we, including you, very often make largely independent improvements
to make the cost of something else more palatable. Why's that not OK
here? Especially because we're not comparing to an alternative where no
cost is added, keeping track of e.g. a running average of the hashtable
size isn't free either; nor does it help in the intermittent cases.

- Andres

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#13)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Andres Freund <andres@anarazel.de> writes:

On 2019-02-18 19:24:54 -0500, Tom Lane wrote:

Yeah, but if we want to rearrange the members into an illogical order
to save some space, we should do that independently of this patch ---

Sure, we should do that. I don't buy the "illogical" bit, just moving
hashcode up to after tag isn't more or less logical, and saves most of
the padding, and moving the booleans to the end isn't better/worse
either.

I hadn't looked at the details closely, but if we can squeeze out the
padding space without any loss of intelligibility, sure let's do so.
I still say that's independent of whether to adopt this patch though.

but it's smaller (although there's plenty of trailing space).

I think you're supposing that these things are independently palloc'd, but
they aren't --- dynahash lays them out in arrays without palloc padding.

IDK, we, including you, very often make largely independent improvements
to make the cost of something else more palatable. Why's that not OK
here?

When we do that, we aren't normally talking about overheads as high as
25% (even more, if it's measured as I think it ought to be). What I'm
concerned about is that the patch is being advocated for cases where
there are lots of LOCALLOCK entries --- which is exactly where the
space overhead is going to hurt the most.

Especially because we're not comparing to an alternative where no
cost is added, keeping track of e.g. a running average of the hashtable
size isn't free either; nor does it help in the intermittent cases.

What I was hoping for --- though perhaps it's not achievable --- was
statistical overhead amounting to just a few more instructions per
transaction. Adding dlist linking would add more instructions per
hashtable entry/removal, which seems like it'd be a substantially
bigger time penalty. As for the intermittent-usage issue, that largely
depends on the details of the when-to-reset heuristic, which we don't
have a concrete design for yet. But I could certainly imagine it waiting
for a few transactions before deciding to chomp.

Anyway, I'm not trying to veto the patch in this form, just suggesting
that there are alternatives worth thinking about.

regards, tom lane

#15Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#14)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

Hi,

On 2019-02-18 20:29:29 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

but it's smaller (although there's plenty of trailing space).

I think you're supposing that these things are independently palloc'd, but
they aren't --- dynahash lays them out in arrays without palloc padding.

I don't think that matters, given that the trailing six bytes are
included in sizeof() (and have to be, to guarantee suitable alignment in
arrays etc).

Greetings,

Andres Freund

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#11)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

On Tue, 19 Feb 2019 at 00:20, Andres Freund <andres@anarazel.de> wrote:

On 2019-02-18 19:13:31 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-02-18 19:01:06 -0500, Tom Lane wrote:

Mmm ... AIUI, the patches currently proposed can only help for what
David called "point lookup" queries. There are still going to be
queries that scan a large proportion of a partition tree, so if you've
got tons of partitions, you'll be concerned about this sort of thing.

I think what Maumau-san is on about here is that not only does your
$giant-query take a long time, but it has a permanent negative effect
on all subsequent transactions in the session. That seems worth
doing something about.

Ah, yes, that makes sense. I'm inclined to think however that the
original approach presented in this thread is better than the
reset-the-whole-hashtable approach.

If it was just many-tables then blowing away the hash table would work fine.

The main issue seems to be with partitioning, not with the general case of
many-tables. For that case, it seems like resetting the hash table is too much.

Can we use our knowledge of the structure of locks, i.e. partition locks
are all children of the partitioned table, to do a better job?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#1)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

On 2/12/19 7:33 AM, Tsunakawa, Takayuki wrote:

...

This problem was uncovered while evaluating partitioning performance.
When the application PREPAREs a statement once and then
EXECUTE-COMMIT repeatedly, the server creates a generic plan on the
6th EXECUTE. Unfortunately, creation of the generic plan of
UPDATE/DELETE currently accesses all partitions of the target table
(this itself needs improvement), expanding the LOCALLOCK hash table.
As a result, 7th and later EXECUTEs get slower.

Imai-san confirmed performance improvement with this patch:

https://commitfest.postgresql.org/22/1993/

Can you quantify the effects? That is, how much slower/faster does it get?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tomas Vondra (#17)
RE: Speed up transaction completion faster after many relations are accessed in a transaction

From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]

On 2/12/19 7:33 AM, Tsunakawa, Takayuki wrote:

Imai-san confirmed performance improvement with this patch:

https://commitfest.postgresql.org/22/1993/

Can you quantify the effects? That is, how much slower/faster does it get?

Ugh, sorry, I wrote a wrong URL. The correct page is:

/messages/by-id/0F97FA9ABBDBE54F91744A9B37151A512787EC@g01jpexmbkw24

The quoted figures are:

[v20 + faster-locallock-scan.patch]
auto: 9,069 TPS
custom: 9,015 TPS

[v20]
auto: 8,037 TPS
custom: 8,933 TPS

In the original problematic case, plan_cache_mode = auto (the default), we can see about a 13% improvement.

Regards
Takayuki Tsunakawa

#19Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Simon Riggs (#16)
RE: Speed up transaction completion faster after many relations are accessed in a transaction

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Hm. Putting a list header for a purely-local data structure into shared
memory seems quite ugly. Isn't there a better place to keep that?

Agreed. I put it in a global variable.

Do we really want a dlist here at all? I'm concerned that bloating
LOCALLOCK will cost us when there are many locks involved. This patch
increases the size of LOCALLOCK by 25% if I counted right, which does
not seem like a negligible penalty.

To delete the LOCALLOCK in RemoveLocalLock(), we need a dlist; an slist would require the list iterator to be passed in from the callers.
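
(To spell that out with the lib/ilist.h API -- the "node" member and list name here are hypothetical: with a dlist the unlink needs only the LOCALLOCK pointer, which is all RemoveLocalLock() has.)

/* dlist: O(1) unlink given just the element */
dlist_delete(&locallock->node);

/* slist (if the list were an slist_head instead): either rescan the
 * list to find the predecessor ... */
slist_delete(&session_locallocks, &locallock->node);   /* O(n) */

/* ... or require every caller to iterate with slist_foreach_modify()
 * and pass the iterator down so slist_delete_current(&iter) can be used */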

From: Andres Freund [mailto:andres@anarazel.de]

Sure, we should do that. I don't buy the "illogical" bit, just moving
hashcode up to after tag isn't more or less logical, and saves most of
the padding, and moving the booleans to the end isn't better/worse
either.

I don't find

Thanks, I've done it.

From: Simon Riggs [mailto:simon@2ndquadrant.com]

Can we use our knowledge of the structure of locks, i.e. partition locks
are all children of the partitioned table, to do a better job?

I couldn't come up with an idea.

Regards
Takayuki Tsunakawa

Attachments:

faster-locallock-scan_v2.patch (application/octet-stream, +35/-29 lines)
#20Peter Eisentraut
peter_e@gmx.net
In reply to: Tsunakawa, Takayuki (#19)
Re: Speed up transaction completion faster after many relations are accessed in a transaction

On 2019-02-20 07:20, Tsunakawa, Takayuki wrote:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Hm. Putting a list header for a purely-local data structure into shared
memory seems quite ugly. Isn't there a better place to keep that?

Agreed. I put it in a global variable.

I think there is agreement on the principles of this patch. Perhaps it
could be polished a bit.

Your changes in LOCALLOCK still refer to PGPROC, from your first version
of the patch.

I think the reordering of struct members could be done as a separate
preliminary patch.

Some more documentation in the comment before dlist_head LocalLocks to
explain this whole mechanism would be nice.

You posted a link to some performance numbers, but I didn't see the test
setup explained there. I'd like to get some more information on the
impact of this. Is there an effect with 100 tables, or do you need 100000?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Peter Eisentraut (#20)
#22Imai, Yoshikazu
imai.yoshikazu@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#21)
#23Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#21)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Tsunakawa, Takayuki (#23)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#24)
#26David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#25)
#27Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#26)
#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tsunakawa, Takayuki (#27)
#29David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#27)
#30David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#29)
#31Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Langote (#28)
#32Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#30)
#33Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Peter Eisentraut (#25)
#34Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tsunakawa, Takayuki (#33)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Langote (#34)
#36Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Peter Eisentraut (#35)
#37Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Peter Eisentraut (#35)
#38Imai, Yoshikazu
imai.yoshikazu@jp.fujitsu.com
In reply to: Amit Langote (#37)
#39Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Langote (#37)
#40Imai, Yoshikazu
imai.yoshikazu@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#36)
#41Peter Eisentraut
peter_e@gmx.net
In reply to: Tsunakawa, Takayuki (#23)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#35)
#43Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#43)
#45Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#42)
#46David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#42)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#46)
#48David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#47)
#49David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#48)
#51David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#51)
#53David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#52)
#54Andres Freund
andres@anarazel.de
In reply to: David Rowley (#53)
#55David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#54)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#53)
#57David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#56)
#58Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#43)
#59Andres Freund
andres@anarazel.de
In reply to: Tsunakawa, Takayuki (#58)
#60Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#59)
#61David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#60)
#62Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#61)
#63Peter Eisentraut
peter_e@gmx.net
In reply to: Tsunakawa, Takayuki (#62)
#64David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#56)
#65Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#64)
#66David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#65)
#67Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#66)
#68David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#67)
#69David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#68)
#70Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#69)
#71David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#70)
#72Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#71)
#73David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#72)
#74Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#73)
#75David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#74)
#76David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#70)
#77Andres Freund
andres@anarazel.de
In reply to: David Rowley (#69)
#78Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Rowley (#76)
#79David Rowley
dgrowleyml@gmail.com
In reply to: Tsunakawa, Takayuki (#78)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#79)
#81David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#80)
#82David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#81)
#83David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#82)
#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#83)
#85Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#84)
#86David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#84)
#87Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#86)
#88Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#86)
#89Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Alvaro Herrera (#88)
#90Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#89)
#91Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Alvaro Herrera (#90)
#92Michael Paquier
michael@paquier.xyz
In reply to: Tsunakawa, Takayuki (#91)
#93Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Michael Paquier (#92)
#94David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#86)
#95Zhihong Yu
zyu@yugabyte.com
In reply to: David Rowley (#94)
#96David Rowley
dgrowleyml@gmail.com
In reply to: Zhihong Yu (#95)
#97David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#94)
#98David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#97)
#99Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#98)
#100Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: David Rowley (#98)
#101Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#99)
#102David Rowley
dgrowleyml@gmail.com
In reply to: Michael Paquier (#101)
#103Zhihong Yu
zyu@yugabyte.com
In reply to: David Rowley (#102)
#104David Rowley
dgrowleyml@gmail.com
In reply to: Zhihong Yu (#103)
#105Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: David Rowley (#104)
#106David Rowley
dgrowleyml@gmail.com
In reply to: Yura Sokolov (#105)
#107Jacob Champion
jacob.champion@enterprisedb.com
In reply to: David Rowley (#106)
#108David Rowley
dgrowleyml@gmail.com
In reply to: Jacob Champion (#107)
#109Ankit Kumar Pandey
itsankitkp@gmail.com
In reply to: David Rowley (#108)
#110David Rowley
dgrowleyml@gmail.com
In reply to: Ankit Kumar Pandey (#109)
#111vignesh C
vignesh21@gmail.com
In reply to: David Rowley (#108)
#112David Rowley
dgrowleyml@gmail.com
In reply to: vignesh C (#111)
#113Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#112)
#114Andres Freund
andres@anarazel.de
In reply to: David Rowley (#112)
#115David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#113)
#116Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: David Rowley (#115)
#117David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#116)
#118Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: David Rowley (#117)
#119David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#118)
#120Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: David Rowley (#119)
#121vignesh C
vignesh21@gmail.com
In reply to: Heikki Linnakangas (#120)
#122Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: vignesh C (#121)
#123Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#122)
#124Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#123)
#125Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#124)