Improving connection scalability (src/backend/storage/ipc/procarray.c)

ranier.vf@gmail.com

over 3 years ago

In reply to: Robert Haas (#2)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em ter., 24 de mai. de 2022 às 13:06, Robert Haas <robertmhaas@gmail.com>
escreveu:

On Tue, May 24, 2022 at 11:28 AM Ranier Vilela <ranier.vf@gmail.com>
wrote:

I think that I got something.

You might have something, but it's pretty hard to tell based on
looking at this patch. Whatever relevant changes it has are mixed with
a bunch of changes that are probably not relevant. For example, it's
hard to believe that moving "uint32 i" to an inner scope in
XidInMVCCSnapshot() is causing a performance gain, because an
optimizing compiler should figure that out anyway.

I believe that even these small changes are helpful and favorable.
Improves code readability and helps the compiler generate better code,
especially for older compilers.

An even bigger issue is that it's not sufficient to just demonstrate
that the patch improves performance. It's also necessary to make an
argument as to why it is safe and correct, and "I tried it out and
nothing seemed to break" does not qualify as an argument.

Ok, certainly the convincing work is not good.

I'd guess that most or maybe all of the performance gain that you've

observed
here is attributable to changing GetSnapshotData() to call
GetSnapshotDataReuse() without first acquiring ProcArrayLock.

It certainly helps, but I trust that's not the only reason, in all the
tests I did, there was an improvement in performance, even before using
this feature.
If you look closely at GetSnapShotData() you will see that
GetSnapshotDataReuse is called for all snapshots, even the new ones, which
is unnecessary.
Another example NormalTransactionIdPrecedes is more expensive than testing
statusFlags.

That

doesn't seem like a completely hopeless idea, because the comments for
GetSnapshotDataReuse() say this:

* This very likely can be evolved to not need ProcArrayLock held (at very
* least in the case we already hold a snapshot), but that's for another
day.

However, those comment seem to imply that it might not be safe in all
cases, and that changes might be needed someplace in order to make it
safe, but you haven't updated these comments, or changed the function
in any way, so it's not really clear how or whether whatever problems
Andres was worried about have been handled.

I think it's worth trying and testing to see if everything goes well,
so in the final patch apply whatever comments are needed.

regards,
Ranier Vilela

andres@anarazel.de

over 3 years ago

In reply to: Ranier Vilela (#1)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Hi,

On 2022-05-24 12:28:20 -0300, Ranier Vilela wrote:

Linux Ubuntu 64 bits (gcc 9.4)
./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

conns tps head tps patched
1 2918.004085 3190.810466
10 12262.415696 17199.862401
50 13656.724571 18278.194114
80 14338.202348 17955.336101
90 16597.510373 18269.660184
100 17706.775793 18349.650150
200 16877.067441 17881.250615
300 16942.260775 17181.441752
400 16794.514911 17124.533892
500 16598.502151 17181.244953
600 16717.935001 16961.130742
700 16651.204834 16959.172005
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

17-18k tps is pretty low for pgbench -S. For a shared_buffers resident run, I
can get 40k in a single connection in an optimized build. If you're testing a
workload >> shared_buffers, GetSnapshotData() isn't the bottleneck. And
testing an assert build isn't a meaningful exercise either, unless you have
way way higher gains (i.e. stuff like turning O(n^2) into O(n)).

What pgbench scale is this and are you using an optimized build?

Greetings,

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Ranier Vilela (#3)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Hi,

On 2022-05-24 13:23:43 -0300, Ranier Vilela wrote:

It certainly helps, but I trust that's not the only reason, in all the
tests I did, there was an improvement in performance, even before using
this feature.
If you look closely at GetSnapShotData() you will see that
GetSnapshotDataReuse is called for all snapshots, even the new ones, which
is unnecessary.

That only happens a handful of times as snapshots are persistently
allocated. Doing an extra GetSnapshotDataReuse() in those cases doesn't matter
for performance. If anything this increases the number of jumps for the common
case.

It'd be a huge win to avoid needing ProcArrayLock when reusing a snapshot, but
it's not at all easy to guarantee that it's correct / see how to make it
correct. I'm fairly sure it can be made correct, but ...

Another example NormalTransactionIdPrecedes is more expensive than testing
statusFlags.

That may be true when you count instructions, but isn't at all true when you
take into account that the cachelines containing status flags are hotly
contended.

Also, the likelihood of filtering out a proc due to
NormalTransactionIdPrecedes(xid, xmax) is *vastly* higher than the due to the
statusFlags check. There may be a lot of procs failing that test, but
typically there will be far fewer backends in vacuum or logical decoding.

Greetings,

Andres Freund

ranier.vf@gmail.com

over 3 years ago

In reply to: Andres Freund (#4)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em qua., 25 de mai. de 2022 às 00:46, Andres Freund <andres@anarazel.de>
escreveu:

Hi Andres, thank you for taking a look.

On 2022-05-24 12:28:20 -0300, Ranier Vilela wrote:

Linux Ubuntu 64 bits (gcc 9.4)
./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

conns tps head tps patched
1 2918.004085 3190.810466
10 12262.415696 17199.862401
50 13656.724571 18278.194114
80 14338.202348 17955.336101
90 16597.510373 18269.660184
100 17706.775793 18349.650150
200 16877.067441 17881.250615
300 16942.260775 17181.441752
400 16794.514911 17124.533892
500 16598.502151 17181.244953
600 16717.935001 16961.130742
700 16651.204834 16959.172005
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

17-18k tps is pretty low for pgbench -S. For a shared_buffers resident
run, I
can get 40k in a single connection in an optimized build. If you're
testing a
workload >> shared_buffers, GetSnapshotData() isn't the bottleneck. And
testing an assert build isn't a meaningful exercise either, unless you have
way way higher gains (i.e. stuff like turning O(n^2) into O(n)).

Thanks for sharing these hits.
Yes, their 17-18k tps are disappointing.

What pgbench scale is this and are you using an optimized build?

Yes this optimized build.
CFLAGS='-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2'
from config.log

pgbench was initialized with:
pgbench -i -p 5432 -d postgres

pgbench -M prepared -c 100 -j 100 -S -n -U postgres
pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100

The shared_buffers is default:
shared_buffers = 128MB

Intel® Core™ i5-8250U CPU Quad Core
RAM 8GB
SSD 256 GB

Can you share the pgbench configuration and shared buffers
this benchmark?
/messages/by-id/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de

regards,
Ranier Vilela

ranier.vf@gmail.com

over 3 years ago

In reply to: Andres Freund (#5)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em qua., 25 de mai. de 2022 às 00:56, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

On 2022-05-24 13:23:43 -0300, Ranier Vilela wrote:

It certainly helps, but I trust that's not the only reason, in all the
tests I did, there was an improvement in performance, even before using
this feature.
If you look closely at GetSnapShotData() you will see that
GetSnapshotDataReuse is called for all snapshots, even the new ones,

which

is unnecessary.

That only happens a handful of times as snapshots are persistently
allocated.

Yes, but now this does not happen with new snapshots.

Doing an extra GetSnapshotDataReuse() in those cases doesn't matter

for performance. If anything this increases the number of jumps for the
common
case.

IMHO with GetSnapShotData(), any gain makes a difference.

It'd be a huge win to avoid needing ProcArrayLock when reusing a snapshot,
but
it's not at all easy to guarantee that it's correct / see how to make it
correct. I'm fairly sure it can be made correct, but ...

I believe it's worth the effort to make sure everything goes well and use
this feature.

Another example NormalTransactionIdPrecedes is more expensive than

testing

statusFlags.

That may be true when you count instructions, but isn't at all true when
you
take into account that the cachelines containing status flags are hotly
contended.

Also, the likelihood of filtering out a proc due to
NormalTransactionIdPrecedes(xid, xmax) is *vastly* higher than the due to
the
statusFlags check. There may be a lot of procs failing that test, but
typically there will be far fewer backends in vacuum or logical decoding.

I believe that keeping the instructions in the cache together works better
than having the status flags test in the middle.
But I will test this to be sure.

regards,
Ranier Vilela

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#6)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/25/22 11:07, Ranier Vilela wrote:

Em qua., 25 de mai. de 2022 às 00:46, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi Andres, thank you for taking a look.

On 2022-05-24 12:28:20 -0300, Ranier Vilela wrote:

Linux Ubuntu 64 bits (gcc 9.4)
./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

conns tps head tps patched
1 2918.004085 3190.810466
10 12262.415696 17199.862401
50 13656.724571 18278.194114
80 14338.202348 17955.336101
90 16597.510373 18269.660184
100 17706.775793 18349.650150
200 16877.067441 17881.250615
300 16942.260775 17181.441752
400 16794.514911 17124.533892
500 16598.502151 17181.244953
600 16717.935001 16961.130742
700 16651.204834 16959.172005
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

17-18k tps is pretty low for pgbench -S. For a shared_buffers
resident run, I
can get 40k in a single connection in an optimized build. If you're
testing a
workload >> shared_buffers, GetSnapshotData() isn't the bottleneck. And
testing an assert build isn't a meaningful exercise either, unless
you have
way way higher gains (i.e. stuff like turning O(n^2) into O(n)).

Thanks for sharing these hits.
Yes, their 17-18k tps are disappointing.

What pgbench scale is this and are you using an optimized build?

Yes this optimized build.
CFLAGS='-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -O2'
from config.log

That can still be assert-enabled build. We need to see configure flags.

pgbench was initialized with:
pgbench -i -p 5432 -d postgres

pgbench -M prepared -c 100 -j 100 -S -n -U postgres

You're not specifying duration/number of transactions to execute. So
it's using just 10 transactions per client, which is bound to give you
bogus results due to not having anything in relcache etc. Use -T 60 or
something like that.

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100

The shared_buffers is default:
shared_buffers = 128MB

Intel® Core™ i5-8250U CPU Quad Core
RAM 8GB
SSD 256 GB

Well, quick results on my laptop (i7-9750H, so not that different from
what you have):

1 = 18908.080126
2 = 32943.953182
3 = 42316.079028
4 = 46700.087645

So something is likely wrong in your setup.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

ranier.vf@gmail.com

over 3 years ago

In reply to: Tomas Vondra (#8)

1 attachment(s)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em qua., 25 de mai. de 2022 às 07:13, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/25/22 11:07, Ranier Vilela wrote:

Em qua., 25 de mai. de 2022 às 00:46, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi Andres, thank you for taking a look.

On 2022-05-24 12:28:20 -0300, Ranier Vilela wrote:

Linux Ubuntu 64 bits (gcc 9.4)
./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

conns tps head tps patched
1 2918.004085 3190.810466
10 12262.415696 17199.862401
50 13656.724571 18278.194114
80 14338.202348 17955.336101
90 16597.510373 18269.660184
100 17706.775793 18349.650150
200 16877.067441 17881.250615
300 16942.260775 17181.441752
400 16794.514911 17124.533892
500 16598.502151 17181.244953
600 16717.935001 16961.130742
700 16651.204834 16959.172005
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

17-18k tps is pretty low for pgbench -S. For a shared_buffers
resident run, I
can get 40k in a single connection in an optimized build. If you're
testing a
workload >> shared_buffers, GetSnapshotData() isn't the bottleneck.

And

testing an assert build isn't a meaningful exercise either, unless
you have
way way higher gains (i.e. stuff like turning O(n^2) into O(n)).

Thanks for sharing these hits.
Yes, their 17-18k tps are disappointing.

What pgbench scale is this and are you using an optimized build?

Yes this optimized build.
CFLAGS='-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -O2'
from config.log

That can still be assert-enabled build. We need to see configure flags.

./configure
Attached the config.log (compressed)

pgbench was initialized with:
pgbench -i -p 5432 -d postgres

pgbench -M prepared -c 100 -j 100 -S -n -U postgres

You're not specifying duration/number of transactions to execute. So
it's using just 10 transactions per client, which is bound to give you
bogus results due to not having anything in relcache etc. Use -T 60 or
something like that.

Ok, I will try with -T 60.

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100

The shared_buffers is default:
shared_buffers = 128MB

Intel® Core™ i5-8250U CPU Quad Core
RAM 8GB
SSD 256 GB

Well, quick results on my laptop (i7-9750H, so not that different from
what you have):

1 = 18908.080126
2 = 32943.953182
3 = 42316.079028
4 = 46700.087645

So something is likely wrong in your setup.

select version();
version

----------------------------------------------------------------------------------------------------------
PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
9.4.0-1ubuntu1~20.04.1'
--with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2
--prefix=/usr --with-gcc-major-version-only --program-suffix=-9
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie
--with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

regards,
Ranier Vilela

#10

ranier.vf@gmail.com

over 3 years ago

In reply to: Ranier Vilela (#9)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em qua., 25 de mai. de 2022 às 08:26, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Em qua., 25 de mai. de 2022 às 07:13, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/25/22 11:07, Ranier Vilela wrote:

Em qua., 25 de mai. de 2022 às 00:46, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi Andres, thank you for taking a look.

On 2022-05-24 12:28:20 -0300, Ranier Vilela wrote:

Linux Ubuntu 64 bits (gcc 9.4)
./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

conns tps head tps patched
1 2918.004085 3190.810466
10 12262.415696 17199.862401
50 13656.724571 18278.194114
80 14338.202348 17955.336101
90 16597.510373 18269.660184
100 17706.775793 18349.650150
200 16877.067441 17881.250615
300 16942.260775 17181.441752
400 16794.514911 17124.533892
500 16598.502151 17181.244953
600 16717.935001 16961.130742
700 16651.204834 16959.172005
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

17-18k tps is pretty low for pgbench -S. For a shared_buffers
resident run, I
can get 40k in a single connection in an optimized build. If you're
testing a
workload >> shared_buffers, GetSnapshotData() isn't the bottleneck.

And

testing an assert build isn't a meaningful exercise either, unless
you have
way way higher gains (i.e. stuff like turning O(n^2) into O(n)).

Thanks for sharing these hits.
Yes, their 17-18k tps are disappointing.

What pgbench scale is this and are you using an optimized build?

Yes this optimized build.
CFLAGS='-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -O2'
from config.log

That can still be assert-enabled build. We need to see configure flags.

./configure
Attached the config.log (compressed)

pgbench was initialized with:
pgbench -i -p 5432 -d postgres

pgbench -M prepared -c 100 -j 100 -S -n -U postgres

You're not specifying duration/number of transactions to execute. So
it's using just 10 transactions per client, which is bound to give you
bogus results due to not having anything in relcache etc. Use -T 60 or
something like that.

Ok, I will try with -T 60.

Here the results with -T 60:
Linux Ubuntu 64 bits
shared_buffers = 128MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
number of transactions per client: 10

conns tps head tps patched

1 2918.004085 3211.303789
10 12262.415696 15540.015540
50 13656.724571 16701.182444
80 14338.202348 16628.559551
90 16597.510373 16835.016835
100 17706.775793 16607.433487
200 16877.067441 16426.969799
300 16942.260775 16319.780662
400 16794.514911 16155.023607
500 16598.502151 16051.106724
600 16717.935001 16007.171213
700 16651.204834 16004.353184
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17174.265804 17792.414234
10 82365.634750 82468.334836
50 74593.714180 74678.839428
80 69219.756038 73116.553986
90 67419.574189 68384.906949
100 66613.771701 66997.793777
200 61739.784830 62870.243385
300 62109.691298 62796.157548
400 61630.822446 63129.555135
500 61711.019964 62755.190389
600 60620.010181 61517.913967
700 60303.317736 61688.044232
800 60451.113573 61076.666572
900 60017.327157 61256.290037
1000 60088.823434 60986.799312

regards,
Ranier Vilela

#11

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#10)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of the
results, not just the raw data. After all, the change is being proposed
by you, so do you think this shows the change is beneficial?

Linux Ubuntu 64 bits
shared_buffers = 128MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

BTW it's generally a good idea to do multiple runs and then use the
average and/or median. Results from a single may be quite noisy.

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
number of transactions per client: 10

conns          tps head              tps patched

1 2918.004085    3211.303789
10     12262.415696 15540.015540
50     13656.724571 16701.182444
80     14338.202348 16628.559551
90     16597.510373 16835.016835
100 17706.775793 16607.433487
200 16877.067441 16426.969799
300 16942.260775 16319.780662
400 16794.514911 16155.023607
500 16598.502151 16051.106724
600 16717.935001 16007.171213
700 16651.204834 16004.353184
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000   16564.985265 16936.952195

I think we've agreed these results are useless.

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns       tps head tps patched

1 17174.265804 17792.414234
10 82365.634750 82468.334836
50 74593.714180 74678.839428
80 69219.756038 73116.553986
90 67419.574189 68384.906949
100     66613.771701 66997.793777
200 61739.784830 62870.243385
300 62109.691298 62796.157548
400 61630.822446 63129.555135
500 61711.019964 62755.190389
600 60620.010181 61517.913967
700 60303.317736 61688.044232
800 60451.113573 61076.666572
900 60017.327157 61256.290037
1000   60088.823434 60986.799312

I have no idea why shared buffers 2GB would be interesting. The proposed
change was related to procarray, not shared buffers. And scale 1 is
~15MB of data, so it fits into 128MB just fine.

Also, the first ~10 results for "patched" case match results for 128MB
shared buffers. That seems very unlikely to happen by chance, so this
seems rather suspicious.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12

ranier.vf@gmail.com

over 3 years ago

In reply to: Tomas Vondra (#11)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of the
results, not just the raw data. After all, the change is being proposed
by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.
GetSnapShowData() isn't a bottleneck?

Linux Ubuntu 64 bits
shared_buffers = 128MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

We agree that they are micro-optimizations.
However, I think they should be considered micro-optimizations in inner
loops,
because all in procarray.c is a hotpath.
The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.
But, IMHO, this is a good way, small gains turn into big gains in the end,
when applied to all code.

Consider GetSnapShotData()
1. Most of the time the snapshot is not null, so:
if (snaphost == NULL), will fail most of the time.

With the patch:
if (snapshot->xip != NULL)
{
if (GetSnapshotDataReuse(snapshot))
return snapshot;
}

Most of the time the test is true and GetSnapshotDataReuse is not called
for new
snapshots.

count, subcount and suboverflowed, will not be initialized, for all
snapshots.

2. If snapshot is taken during recoverys
The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.
Only if is not suboverflowed.
Skipping all InvalidTransactionId, mypgxactoff, backends doing logical
decoding,
and XID is >= xmax.

3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
There's an agreement that this would be fine, for now.

Consider ComputeXidHorizons()
1. ProcGlobal->statusFlags is touched before the lock.
2. allStatusFlags[index] is not touched for all numProcs.

All changes were made with the aim of avoiding or postponing unnecessary
work.

BTW it's generally a good idea to do multiple runs and then use the
average and/or median. Results from a single may be quite noisy.

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
number of transactions per client: 10

conns tps head tps patched

1 2918.004085 3211.303789
10 12262.415696 15540.015540
50 13656.724571 16701.182444
80 14338.202348 16628.559551
90 16597.510373 16835.016835
100 17706.775793 16607.433487
200 16877.067441 16426.969799
300 16942.260775 16319.780662
400 16794.514911 16155.023607
500 16598.502151 16051.106724
600 16717.935001 16007.171213
700 16651.204834 16004.353184
800 16467.546583 16834.591719
900 16588.241149 16693.902459
1000 16564.985265 16936.952195

I think we've agreed these results are useless.

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17174.265804 17792.414234
10 82365.634750 82468.334836
50 74593.714180 74678.839428
80 69219.756038 73116.553986
90 67419.574189 68384.906949
100 66613.771701 66997.793777
200 61739.784830 62870.243385
300 62109.691298 62796.157548
400 61630.822446 63129.555135
500 61711.019964 62755.190389
600 60620.010181 61517.913967
700 60303.317736 61688.044232
800 60451.113573 61076.666572
900 60017.327157 61256.290037
1000 60088.823434 60986.799312

I have no idea why shared buffers 2GB would be interesting. The proposed
change was related to procarray, not shared buffers. And scale 1 is
~15MB of data, so it fits into 128MB just fine.

I thought about doing this benchmark, in the most common usage situation
(25% of RAM).

Also, the first ~10 results for "patched" case match results for 128MB
shared buffers. That seems very unlikely to happen by chance, so this
seems rather suspicious.

Probably, copy and paste mistake.
I redid this test, for patched:

Linux Ubuntu 64 bits
shared_buffers = 2048MB

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17174.265804 17524.482668
10 82365.634750 81840.537713
50 74593.714180 74806.729434
80 69219.756038 73116.553986
90 67419.574189 69130.749209
100 66613.771701 67478.234595
200 61739.784830 63094.202413
300 62109.691298 62984.501251
400 61630.822446 63243.232816
500 61711.019964 62827.977636
600 60620.010181 62838.051693
700 60303.317736 61594.629618
800 60451.113573 61208.629058
900 60017.327157 61171.001256
1000 60088.823434 60558.067810

regards,
Ranier Vilela

#13

andres@anarazel.de

over 3 years ago

In reply to: Tomas Vondra (#11)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:

On 5/27/22 02:11, Ranier Vilela wrote:

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

They don't look all that sane to me - isn't that way lower than one would
expect? Restricting both client and server to the same four cores, a
thermically challenged older laptop I have around I get 150k tps at both 10
and 100 clients.

Either way, I'd not expect to see any GetSnapshotData() scalability effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just not enough
concurrency.

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come up with a
workload that shows it being a meaningful part of a profile. Unless it is,
performance differences are going to just be due to various forms of noise.

Greetings,

Andres Freund

#14

andres@anarazel.de

over 3 years ago

In reply to: Ranier Vilela (#12)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:

Em qui., 26 de mai. de 2022 ï¿½s 22:30, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of the
results, not just the raw data. After all, the change is being proposed
by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more connections
won't improve performance. It'll just add more and more context switching,
cache misses, and make the OS scheduler do more work.

GetSnapShowData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way. The snapshot reuse logic will always
work - because there are no writes - and thus the only work that needs to be
done is to acquire the ProcArrayLock briefly. And because there is only a
small number of cores, contention on the cacheline for that isn't a problem.

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

We agree that they are micro-optimizations. However, I think they should be
considered micro-optimizations in inner loops, because all in procarray.c is
a hotpath.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at all.

The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.

There are no gains.

But, IMHO, this is a good way, small gains turn into big gains in the end,
when applied to all code.

Consider GetSnapShotData()
1. Most of the time the snapshot is not null, so:
if (snaphost == NULL), will fail most of the time.

With the patch:
if (snapshot->xip != NULL)
{
if (GetSnapshotDataReuse(snapshot))
return snapshot;
}

Most of the time the test is true and GetSnapshotDataReuse is not called
for new
snapshots.
count, subcount and suboverflowed, will not be initialized, for all
snapshots.

But that's irrelevant. There's only a few "new" snapshots in the life of a
connection. You're optimizing something irrelevant.

2. If snapshot is taken during recoverys
The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.

That code isn't reached when in recovery?

3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
There's an agreement that this would be fine, for now.

There's no such agreement at all. It's not correct.

Consider ComputeXidHorizons()
1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.

2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.

Greetings,

Andres Freund

#15

ranier.vf@gmail.com

over 3 years ago

In reply to: Andres Freund (#13)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em sex., 27 de mai. de 2022 às 18:08, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:

On 5/27/22 02:11, Ranier Vilela wrote:

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

They don't look all that sane to me - isn't that way lower than one would
expect?

Yes, quite disappointing.

Restricting both client and server to the same four cores, a

thermically challenged older laptop I have around I get 150k tps at both 10
and 100 clients.

And you can share the benchmark details? Hardware, postgres and pgbench,
please?

Either way, I'd not expect to see any GetSnapshotData() scalability
effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just not
enough
concurrency.

This means that our customers will not see any connections scalability with
PG15, using the simplest hardware?

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come up with
a
workload that shows it being a meaningful part of a profile. Unless it is,
performance differences are going to just be due to various forms of noise.

Actually in the profiles I got with perf, GetSnapShotData() didn't show up.

regards,
Ranier Vilela

#16

ranier.vf@gmail.com

over 3 years ago

In reply to: Andres Freund (#14)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em sex., 27 de mai. de 2022 às 18:22, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:

Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of the
results, not just the raw data. After all, the change is being proposed
by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more connections
won't improve performance. It'll just add more and more context switching,
cache misses, and make the OS scheduler do more work.

conns tps head
10 82365.634750
50 74593.714180
80 69219.756038
90 67419.574189
100 66613.771701
Yes it is quite disappointing that with 100 connections, tps loses to 10
connections.

GetSnapShowData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way. The snapshot reuse logic will
always
work - because there are no writes - and thus the only work that needs to
be
done is to acquire the ProcArrayLock briefly. And because there is only a
small number of cores, contention on the cacheline for that isn't a
problem.

Thanks for sharing this.

These results look much saner, but IMHO it also does not show any clear
benefit of the patch. Or are you still claiming there is a benefit?

We agree that they are micro-optimizations. However, I think they

should be

considered micro-optimizations in inner loops, because all in

procarray.c is

a hotpath.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at all.

The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.

There are no gains.

IMHO, I must disagree.

But, IMHO, this is a good way, small gains turn into big gains in the

end,

when applied to all code.

Consider GetSnapShotData()
1. Most of the time the snapshot is not null, so:
if (snaphost == NULL), will fail most of the time.

With the patch:
if (snapshot->xip != NULL)
{
if (GetSnapshotDataReuse(snapshot))
return snapshot;
}

Most of the time the test is true and GetSnapshotDataReuse is not called
for new
snapshots.
count, subcount and suboverflowed, will not be initialized, for all
snapshots.

But that's irrelevant. There's only a few "new" snapshots in the life of a
connection. You're optimizing something irrelevant.

IMHO, when GetSnapShotData() is the bottleneck, all is relevant.

2. If snapshot is taken during recoverys
The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.

That code isn't reached when in recovery?

Currently it is reached *even* when not in recovery.
With the patch, *only* is reached when in recovery.

3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
There's an agreement that this would be fine, for now.

There's no such agreement at all. It's not correct.

Ok, but there is a chance it will work correctly.

Consider ComputeXidHorizons()
1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.

IMHO, anything you take out of the lock is a benefit.

2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.

Better be sure of that, no?

regards,
Ranier Vilela

#17

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#15)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/28/22 02:15, Ranier Vilela wrote:

Em sex., 27 de mai. de 2022 às 18:08, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:

On 5/27/22 02:11, Ranier Vilela wrote:

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any

clear

benefit of the patch. Or are you still claiming there is a benefit?

They don't look all that sane to me - isn't that way lower than one
would
expect?

Yes, quite disappointing.

Restricting both client and server to the same four cores, a
thermically challenged older laptop I have around I get 150k tps at
both 10
and 100 clients.

And you can share the benchmark details? Hardware, postgres and pgbench,
please?

Either way, I'd not expect to see any GetSnapshotData() scalability
effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just
not enough
concurrency.

This means that our customers will not see any connections scalability
with PG15, using the simplest hardware?

No. It means that on 4-core machine GetSnapshotData() is unlikely to be
a bottleneck, because you'll hit various other bottlenecks way earlier.

I personally doubt it even makes sense to worry about scaling to this
many connections on such tiny system too much.

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come
up with a
workload that shows it being a meaningful part of a profile. Unless
it is,
performance differences are going to just be due to various forms of
noise.

Actually in the profiles I got with perf, GetSnapShotData() didn't show up.

But that's exactly the point Andres is trying to make - if you don't see
GetSnapshotData() in the perf profile, why do you think optimizing it
will have any meaningful impact on throughput?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#16)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/28/22 02:36, Ranier Vilela wrote:

Em sex., 27 de mai. de 2022 às 18:22, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:

Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
tomas.vondra@enterprisedb.com

<mailto:tomas.vondra@enterprisedb.com>> escreveu:

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of the
results, not just the raw data. After all, the change is being

proposed

by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more
connections
won't improve performance. It'll just add more and more context
switching,
cache misses, and make the OS scheduler do more work.

conns tps head
10 82365.634750
50 74593.714180
80 69219.756038
90 67419.574189
100 66613.771701
Yes it is quite disappointing that with 100 connections, tps loses to 10
connections.

IMO that's entirely expected on a system with only 4 cores. Increasing
the number of connections inevitably means more overhead (you have to
track/manage more stuff). And at some point the backends start competing
for L2/L3 caches, context switches are not free either, etc. So once you
cross ~2-3x the number of cores, you should expect this.

This behavior is natural/inherent, it's unlikely to go away, and it's
one of the reasons why we recommend not to use too many connections. If
you try to maximize throughput, just don't do that. Or just use machine
with more cores.

GetSnapShowData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way. The snapshot reuse logic
will always
work - because there are no writes - and thus the only work that
needs to be
done is to acquire the ProcArrayLock briefly. And because there is
only a
small number of cores, contention on the cacheline for that isn't a
problem.

Thanks for sharing this.

These results look much saner, but IMHO it also does not show

any clear

benefit of the patch. Or are you still claiming there is a benefit?

We agree that they are micro-optimizations. However, I think they

should be

considered micro-optimizations in inner loops, because all in

procarray.c is

a hotpath.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at
all.

The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.

There are no gains.

IMHO, I must disagree.

You don't have to, really. What you should do is showing results
demonstrating the claimed gains, and so far you have not done that.

I don't want to be rude, but so far you've shown results from a
benchmark testing fork(), due to only running 10 transactions per
client, and then results from a single run for each client count (which
doesn't really show any gains either, and is so noisy).

As mentioned GetSnapshotData() is not even in perf profile, so why would
the patch even make a difference?

You've also claimed it helps generating better code on older compilers,
but you've never supported that with any evidence.

Maybe there is an improvement - show us. Do a benchmark with more runs,
to average-out the noise. Calculate VAR/STDEV to show how variable the
results are. Use that to compare results and decide if there is an
improvement. Also, keep in mind binary layout matters [1]https://www.youtube.com/watch?v=r-TLSBdHe1A.

[1]: https://www.youtube.com/watch?v=r-TLSBdHe1A

But, IMHO, this is a good way, small gains turn into big gains in

the end,

when applied to all code.

Consider GetSnapShotData()
1. Most of the time the snapshot is not null, so:
if (snaphost == NULL), will fail most of the time.

With the patch:
if (snapshot->xip != NULL)
{
if (GetSnapshotDataReuse(snapshot))
return snapshot;
}

Most of the time the test is true and GetSnapshotDataReuse is not

called

for new
snapshots.
count, subcount and suboverflowed, will not be initialized, for all
snapshots.

But that's irrelevant. There's only a few "new" snapshots in the
life of a
connection. You're optimizing something irrelevant.

IMHO, when GetSnapShotData() is the bottleneck, all is relevant.

Maybe. Show us the difference.

2. If snapshot is taken during recoverys
The pgprocnos and ProcGlobal->subxidStates are not touched

unnecessarily.

That code isn't reached when in recovery?

Currently it is reached *even* when not in recovery.
With the patch, *only* is reached when in recovery.

3. Calling GetSnapshotDataReuse() without first acquiring

ProcArrayLock.

There's an agreement that this would be fine, for now.

There's no such agreement at all. It's not correct.

Ok, but there is a chance it will work correctly.

Either it's correct or not. Chance of being correct does not count.

Consider ComputeXidHorizons()
1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.

IMHO, anything you take out of the lock is a benefit.

Maybe. Show us the difference.

2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.

Better be sure of that, no?

We rely on compilers doing this in about a million other places.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

[1]: https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462
https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462
[2]: /messages/by-id/5198715A.6070808@vmware.com
[3]: https://it-events.com/system/attachments/files/000/001/098/original/PostgreSQL_%D0%BC%D0%B0%D1%81%D1%88%D1%82%D0%B0%D0%B1%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5.pdf?1448975472
https://it-events.com/system/attachments/files/000/001/098/original/PostgreSQL_%D0%BC%D0%B0%D1%81%D1%88%D1%82%D0%B0%D0%B1%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5.pdf?1448975472

ranier.vf@gmail.com

over 3 years ago

In reply to: Tomas Vondra (#17)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em sáb., 28 de mai. de 2022 às 09:00, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/28/22 02:15, Ranier Vilela wrote:

Em sex., 27 de mai. de 2022 às 18:08, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:

On 5/27/22 02:11, Ranier Vilela wrote:

./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres

pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: prepared
number of clients: 100
number of threads: 100
maximum number of tries: 1
duration: 60 s

conns tps head tps patched

1 17126.326108 17792.414234
10 82068.123383 82468.334836
50 73808.731404 74678.839428
80 73290.191713 73116.553986
90 67558.483043 68384.906949
100 65960.982801 66997.793777
200 62216.011998 62870.243385
300 62924.225658 62796.157548
400 62278.099704 63129.555135
500 63257.930870 62188.825044
600 61479.890611 61517.913967
700 61139.354053 61327.898847
800 60833.663791 61517.913967
900 61305.129642 61248.336593
1000 60990.918719 61041.670996

These results look much saner, but IMHO it also does not show any

clear

benefit of the patch. Or are you still claiming there is a benefit?

They don't look all that sane to me - isn't that way lower than one
would
expect?

Yes, quite disappointing.

Restricting both client and server to the same four cores, a
thermically challenged older laptop I have around I get 150k tps at
both 10
and 100 clients.

And you can share the benchmark details? Hardware, postgres and pgbench,
please?

Either way, I'd not expect to see any GetSnapshotData() scalability
effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just
not enough
concurrency.

This means that our customers will not see any connections scalability
with PG15, using the simplest hardware?

No. It means that on 4-core machine GetSnapshotData() is unlikely to be
a bottleneck, because you'll hit various other bottlenecks way earlier.

I personally doubt it even makes sense to worry about scaling to this
many connections on such tiny system too much.

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come
up with a
workload that shows it being a meaningful part of a profile. Unless
it is,
performance differences are going to just be due to various forms of
noise.

Actually in the profiles I got with perf, GetSnapShotData() didn't show

up.

But that's exactly the point Andres is trying to make - if you don't see
GetSnapshotData() in the perf profile, why do you think optimizing it
will have any meaningful impact on throughput?

You see, I've seen in several places that GetSnapShotData() is the
bottleneck in scaling connections.
One of them, if I remember correctly, was at an IBM in Russia.
Another statement occurs in [1]https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462[2]/messages/by-id/5198715A.6070808@vmware.com[3]https://it-events.com/system/attachments/files/000/001/098/original/PostgreSQL_%D0%BC%D0%B0%D1%81%D1%88%D1%82%D0%B0%D0%B1%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5.pdf?1448975472
Just because I don't have enough hardware to force GetSnapShotData()
doesn't mean optimizing it won't make a difference.
And even on my modest hardware, we've seen gains, small but consistent.
So IMHO everyone will benefit, including the small servers.

regards,
Ranier Vilela

Show quoted text

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Ranier Vilela (#19)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 28 May 2022, at 16:12, Ranier Vilela <ranier.vf@gmail.com> wrote:

Just because I don't have enough hardware to force GetSnapShotData() doesn't mean optimizing it won't make a difference.

Quoting Andres from upthread:

"To improve something like GetSnapshotData() you first have to come up with
a workload that shows it being a meaningful part of a profile. Unless it
is, performance differences are going to just be due to various forms of
noise."

If you think this is a worthwhile improvement, you need to figure out a way to
reliably test it in order to prove it.

--
Daniel Gustafsson https://vmware.com/

#21

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#19)

5 attachment(s)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/28/22 16:12, Ranier Vilela wrote:

Em sáb., 28 de mai. de 2022 às 09:00, Tomas Vondra
<tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
escreveu:

On 5/28/22 02:15, Ranier Vilela wrote:

Em sex., 27 de mai. de 2022 às 18:08, Andres Freund

<andres@anarazel.de <mailto:andres@anarazel.de>

<mailto:andres@anarazel.de <mailto:andres@anarazel.de>>> escreveu:

Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> On 5/27/22 02:11, Ranier Vilela wrote:
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U

postgres

> >
> > pgbench (15beta1)
> > transaction type: <builtin: select only>
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > duration: 60 s
> >
> > conns tps head tps patched
> >
> > 1 17126.326108 17792.414234
> > 10 82068.123383 82468.334836
> > 50 73808.731404 74678.839428
> > 80 73290.191713 73116.553986
> > 90 67558.483043 68384.906949
> > 100 65960.982801 66997.793777
> > 200 62216.011998 62870.243385
> > 300 62924.225658 62796.157548
> > 400 62278.099704 63129.555135
> > 500 63257.930870 62188.825044
> > 600 61479.890611 61517.913967
> > 700 61139.354053 61327.898847
> > 800 60833.663791 61517.913967
> > 900 61305.129642 61248.336593
> > 1000 60990.918719 61041.670996
> >
>
> These results look much saner, but IMHO it also does not

show any

clear
> benefit of the patch. Or are you still claiming there is a

benefit?

They don't look all that sane to me - isn't that way lower

than one

would
expect?

Yes, quite disappointing.

Restricting both client and server to the same four cores, a
thermically challenged older laptop I have around I get 150k

tps at

both 10
and 100 clients.

And you can share the benchmark details? Hardware, postgres and

pgbench,

please?

Either way, I'd not expect to see any GetSnapshotData()

scalability

effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just
not enough
concurrency.

This means that our customers will not see any connections scalability
with PG15, using the simplest hardware?

No. It means that on 4-core machine GetSnapshotData() is unlikely to be
a bottleneck, because you'll hit various other bottlenecks way earlier.

I personally doubt it even makes sense to worry about scaling to this
many connections on such tiny system too much.

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come
up with a
workload that shows it being a meaningful part of a profile.

Unless

it is,
performance differences are going to just be due to various

forms of

noise.

Actually in the profiles I got with perf, GetSnapShotData() didn't

show up.

But that's exactly the point Andres is trying to make - if you don't see
GetSnapshotData() in the perf profile, why do you think optimizing it
will have any meaningful impact on throughput?

You see, I've seen in several places that GetSnapShotData() is the
bottleneck in scaling connections.
One of them, if I remember correctly, was at an IBM in Russia.
Another statement occurs in [1][2][3]

No one is claiming GetSnapshotData() can't be a bottleneck on systems
with many cores. That's certainly possible, which is why e.g. Andres
spent a lot of time optimizing for that case.

But that's what we're arguing about. You're trying to convince us that
your patch will improve things, and you're supporting that by numbers
from a machine that is unlikely to be hitting this bottleneck.

Just because I don't have enough hardware to force GetSnapShotData()
doesn't mean optimizing it won't make a difference.

Well, the question is if it actually optimizes things. Maybe it does,
which would be great, but people in this thread (including me) seem to
be fairly skeptical about that claim, because the results are frankly
entirely unconvincing.

I doubt we'll just accept changes in such sensitive places without
results from a relevant machine. Maybe if there was a clear agreement
it's a win, but that's not the case here.

And even on my modest hardware, we've seen gains, small but consistent.
So IMHO everyone will benefit, including the small servers.

No, we haven't seen any convincing gains. I've tried to explain multiple
times that the results you've shared are not showing any clear
improvement, due to only having one run for each client count (which
means there's a lot of noise), impact of binary layout in different
builds, etc. You've ignored all of that, so instead of repeating myself,
I did a simple benchmark on my two machines:

1) i5-2500k / 4 cores and 8GB RAM (so similar to what you have)

2) 2x e5-2620v3 / 16/32 cores, 64GB RAM (so somewhat bigger)

and I tested 1, 2, 5, 10, 50, 100, ...., 1000 clients using the same
benchmark as you (pgbench -S -M prepared ... ). I did 10 client counts
for each client count, to calculate median which evens out the noise.
And for fun I tried this with gcc 9.3, 10.3 and 11.2. The script and
results from both machines are attached.

The results from xeon and gcc 11.2 look like this:

clients master patched diff
---------------------------------------
1 46460 44936 97%
2 87486 84746 97%
5 199102 192169 97%
10 344458 339403 99%
20 515257 512513 99%
30 528675 525467 99%
40 592761 594384 100%
50 694635 706193 102%
100 643950 655238 102%
200 690133 696815 101%
300 670403 677818 101%
400 678573 681387 100%
500 665349 678722 102%
600 666028 670915 101%
700 662316 662511 100%
800 647922 654745 101%
900 650274 654698 101%
1000 644482 649332 101%

Please, explain to me how this shows consistent measurable improvement?

The standard deviation is roughly 1.5% on average, and the difference is
well within that range. Even if there was a tiny improvement for the
high client counts, no one sane will run with that many clients, because
the throughput peaks at ~50 clients. So even if you gain 1% with 500
clients, it's still less than with 50 clients. If anything, this shows
regression for lower client counts.

FWIW this entirely ignores the question is this benchmark even hits the
bottleneck this patch aims to improve. Also, there's the question of
correctness, and I'd bet Andres is right getting snapshot without
holding ProcArrayLock is busted.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22

ranier.vf@gmail.com

over 3 years ago

In reply to: Tomas Vondra (#18)

1 attachment(s)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em sáb., 28 de mai. de 2022 às 09:35, Tomas Vondra <
tomas.vondra@enterprisedb.com> escreveu:

On 5/28/22 02:36, Ranier Vilela wrote:

Em sex., 27 de mai. de 2022 às 18:22, Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> escreveu:

Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:

Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
tomas.vondra@enterprisedb.com

<mailto:tomas.vondra@enterprisedb.com>> escreveu:

On 5/27/22 02:11, Ranier Vilela wrote:

...

Here the results with -T 60:

Might be a good idea to share your analysis / interpretation of

the

results, not just the raw data. After all, the change is being

proposed

by you, so do you think this shows the change is beneficial?

I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more
connections
won't improve performance. It'll just add more and more context
switching,
cache misses, and make the OS scheduler do more work.

conns tps head
10 82365.634750
50 74593.714180
80 69219.756038
90 67419.574189
100 66613.771701
Yes it is quite disappointing that with 100 connections, tps loses to 10
connections.

IMO that's entirely expected on a system with only 4 cores. Increasing
the number of connections inevitably means more overhead (you have to
track/manage more stuff). And at some point the backends start competing
for L2/L3 caches, context switches are not free either, etc. So once you
cross ~2-3x the number of cores, you should expect this.

This behavior is natural/inherent, it's unlikely to go away, and it's
one of the reasons why we recommend not to use too many connections. If
you try to maximize throughput, just don't do that. Or just use machine
with more cores.

GetSnapShowData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with

that

workload in any sort of meaningful way. The snapshot reuse logic
will always
work - because there are no writes - and thus the only work that
needs to be
done is to acquire the ProcArrayLock briefly. And because there is
only a
small number of cores, contention on the cacheline for that isn't a
problem.

Thanks for sharing this.

These results look much saner, but IMHO it also does not show

any clear

benefit of the patch. Or are you still claiming there is a

benefit?

We agree that they are micro-optimizations. However, I think they

should be

considered micro-optimizations in inner loops, because all in

procarray.c is

a hotpath.

As explained earlier, I don't agree that they optimize anything -

you're

making some of the scalability behaviour *worse*, if it's changed at
all.

The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.

There are no gains.

IMHO, I must disagree.

You don't have to, really. What you should do is showing results
demonstrating the claimed gains, and so far you have not done that.

I don't want to be rude, but so far you've shown results from a
benchmark testing fork(), due to only running 10 transactions per
client, and then results from a single run for each client count (which
doesn't really show any gains either, and is so noisy).

As mentioned GetSnapshotData() is not even in perf profile, so why would
the patch even make a difference?

You've also claimed it helps generating better code on older compilers,
but you've never supported that with any evidence.

Maybe there is an improvement - show us. Do a benchmark with more runs,
to average-out the noise. Calculate VAR/STDEV to show how variable the
results are. Use that to compare results and decide if there is an
improvement. Also, keep in mind binary layout matters [1].

I redid the benchmark with a better machine:
Intel i7-10510U
RAM 8GB
SSD 512GB
Linux Ubuntu 64 bits

All files are attached, including the raw data of the results.
I did the calculations as requested.
But a quick average of the 10 benchmarks, done resulted in 10,000 tps more.
Not bad, for a simple patch, made entirely of micro-optimizations.

Results attached.

regards,
Ranier Vilela

Attachments:

procarray_bench.tar.xzapplication/x-xz; name=procarray_bench.tar.xzDownload

�7zXZ���F!t/�������]8�!�['"o�M�F��1�f�ja o
���+�W�HX��� ��Z�@x[O
d����J�h$!`>:�z�/�9��
Y��h��8D`����ahh���;�F
��#�#��`z_
"J|}�0���[Ce�I�	���
Sf�8�@�T�����-H9YH�b�fq%����e���0���,hLBr�iu�W���?&��8�Q�����5�������r��.y�?/�7�t ,D��/�iPa��������(���ep�pYY��9�*HZBy�!���@�,��8��������:���%���he,^��Q�$�4AD��l�g�����������&�vc��o�n="�[��M�a�nn�*�����\����`�H_�U�JSgT��mn��Z$X
=.��t&�,	������Ng����3��F4h�)�|T��y/����w;U�>�������Gv��>��x��B���#�H�,t;�<�����Z����km�qeM,�����AP��[a<�z�;I3EMs������.-.�Vbc�Wa7[��j�T�u�
_���F�Xf���iLC����j"R��G���L{�i%�r���r���2����C��O�������X>C�
:�z�Z��������I�!���@�{�jo��>R�l$��aS��LQ�e��e�M��u�j���3\j�
�It\�x�
��B}�!���Xub��O��Dy�6�T'S�����������b�A���t�+&R��C��9��O#2��%/��5PyK���"�u�Q�y�4��"(yv�UZzt&c�:��R��7_��o�$���S�*%C����8��6�'Sj��e�p������	��>�XA��1
f
��+YW�������
T�)`���z�u��n~��f@���C^��M����X��`�%��f\vj��t�����q��k�~�S�f}�>i��b����n���j%��h�^(s��F�����p�q�O�0=y���@
���I��&qE^>N[���A���q^�q����n�iZ��cEI��V;��hNA�%�T���{����3rQJ7��w�B��2WF@��|�nT�P~��H�V�H:��&t�#��W��X�6i'D��e>��1��Q�Kj(VBu��?���U�e�����H����w����\��*��%�t>�E��Y:_���H`�9�R�,��"�t��.8�uRY�,�)Lq��bc���[����x�_���%�=^#��LT1@���&�d4b3���Q|��O5�:��_�eN��2��\���Q���61J��i���c�8���������C%1{���V?�G�`�v�������6�m�dL��[��d�&+�c�\�uiW�!�&���sB��0����9]w�	�%v���!���bDe�����+o��w���Z���>�w3���'�_:AT��/R;�lm�eY2r��B�����=q�Z�\��$���8kGzz��wb^���R�A_����N.3�a�xu=�*b��(�Y��������r���e+��O�;*my�BCy|!�����
����2���
����W��X���� �X5��������
�:�g��6�B�

BJS#2A���B��t���N9960�ve�c����g�1��#5?���
�g����&�6D%���� �F�sR��c�m���[��[�5����o��4V��_y��Q��"d2H�1�/�fQ/��)5���MJ��8��������]k���4W�0� �`���+�k�7'�p�}��~<x�����d�y���w���!��`���2��a��K�V���HqG�Oa��yvk,R�J�%R�s]�"4���1�%:,��0)Q�0�tg
�a�}�n<�BZWk����<>r���G��G�a=z�W���9�^ul��t�sl�Z������4����_Q�(l��
s�x��j��F6�g�2����;��)��a�n���IcU������B��^���U����0�Y�V	@U�n��V��R8A��x���K���QJ�4�~�O��8'������5����-���fb�/�����mL������?lI��%-�#�u'�j,�<0���rs�t���Ur�	zR�dL����I�����K�i����(;�O*������zpz���&�$������]P�{���������i
B}����E�w�1.�z�I���g����,�c��x��k��bpz��LR~�����4��?��	����D�H�U>���������E�xu��L�+?U�=��>v�����&���������aD��x�V��}�
�����>���%A��,����0��c�JI���60^��JZ�h���2������]�-����6Y���+�@'����k�	�C������������*�q�E��2���v��������I��o*��0@��s���X�W���[��t���]���g�D�O)�VP���!�UFR�bX���QW&@���4MbQ�`	6b�&!��Z��X����������[����+*�����8�������B���P��>6���5�=��E!#�mn�����#��!��K������O����U������
Yw���~9�����BQ�~a��B����}��hi�Z���h���1+�!����V�d������;���=��c��7IUi"]���m�`�k��E���;����9�$n5r�a�<_[}�RY��v�J�������|�BCMVD5R+�S������������g����Vz�(sz`��v�=
n��&��
]����Eu��?���s��uW�z7nv
/�� ��/��}q[�+@�����%�i�k6�yF�xIe�W���X�Zb���>�z`=:t��T�P��?!������w����L�=���D�fGk~s<�p��x��$�?������'��+�t�6v1�*�+n��C^���
�9���+�n��O)��#���������P�1*7���TT$�=7_q�Ma
�$/*����hF�7�����sNDX6�t����g���#B��xs�^-Q
rSjp�0��>%�?����2f�b����>��������oH<��6V}_����I���=<�3�����NV7�%fS-��vQ�����������G��,��$$Fz���42��`$�'x��J(I<��w�Pbt���Cd��hz�o����J��F#����'7\��Z��dr���86_�4�F��s>�6��\p8�����l]PD�Q���������P����%*���d��k�l1c?cP��8����*��!\I%�#�[�z��]�����S�'���y�	p�
����Gji���Y�r�^�R�J�-q1��,�4�"�%�&t�����T����B{��.�I�&"�����q��z`Uh��Sc��yU33��P��"zb�/����{]�{�O�����%�eM�|E*���T����.gM���o�d��`���<{��>~���k�V<�N�>��r�F*���f�vj���V���l=��'��E='�A�jn8����6�7��z��>�������P�^S���]�#p'�C�k�,���]��]�e���Y�$�����-�p(����w7$<|�o!���W�����R�3��E�g�$<���
9�D���0���@����h�v��/�g������sE�:�kw��4�	�|:XM�_�M��N���O��+��TV�����DO�a� ���P~&�FL�_�G@�D�$�z��Db�(��iO��Y�$����+�If�r�:UL
���9�K+������O�|�Y:�K�T+���W��_)���~S�������RQK��i���OiT	yk��q�����*��H���r�������/�I��R��Mv�vvF��������=������Zx��1y�
�@�M�_��Jq��PI��C���>V�2'LC���^���+�S��5-���/O�G��"�sUJ�|\�Ky��z<��@�E��Q5N�VP���I{0�uS��W�Mp����1lZ0�+��
b��N�@91O���~��L4��c�Y�C\5�^BykB���(����{C��r��>RC��k.�Oc�=�N�{8.;��:Z�Dd�/���w���	��VJ����A)�idi��~	���o�����,T���Hl�6j��CZn����R[(�p:�u@����U�<��]{������Z���4��U�]���1���h�����l��7:���^������dG�%����{����V�����nY����*�|����n>*�pxZrDR����<�^}r��p���������#�����������X��6�"�@�B1K��66�k��>J�S�����	��f�-���,�;�3�S�[�	eg�]lHRU���O��,�[����Ny��M�=	�L���_W���qG)5�Y��b(�:�,�eXW�6��I�,�d-��
���#�v	���/V���>eE�@%���8��<���0|�)�����)m	f���POm-���S�BRX��������\W_>�����D�"�(2=:����6Jb���7��IN����{����p����z0���pdQ�
�����t�Q������d=���I��F =���8�jb��F��g\���1�����#Sy\
�*�����-4D@y�C9�����>6L��6!/�R��#����������7��~]HF��wz���z�����xi�����������K�����Qvz��\P04�_��^�4�����M�Qi2������4������d`\�:�D�I�q�9��MA�x(O����������7|T�XU�,2�<5C������G�:�T^�Qk�������5���[R�����l\�]������b���Q�.F�������:.e��v��^�0���!&�b���k�m ���|N!�f�=�O���������FE������n�S{�mj��'�&h���TO��B���x��p!J��1m?�C��)��X�kAV]���<����W��x�z���a���*��	P�0������M>����VK�[��8xx
I�|Y*�����.���4�
��BFLQ4�um��i�x�q�7�&���N�&���"�87��� en$��*XE�"�:������v��
Q7������"�Cr"��b�6�da[�U������_�{���6�T���%��v�x���&�����N��1���a�qh���S�cWW�0�&Tm�z�K������7]��b?�u|�NB�4���R1U���M�i��-q������������Y;(�r�z��u�K�{���
uF638r�Re��S	m�R<������[h�����b2��9$������+�c�N7R
%���G��K����o���h�]!#��cg��~MY�f�,v�F���)30�a�8���q\~[3��_����=Y�B22e.&���Z1_+r��S�e}�D�'���i[+����63_�:���?Uu;aL�!�TX���>|~�x,)�l���S�)1�Q�P'AR�V�l��M�z}��S��^?#��r@�f�[x�!��\)#%I�<f��0��4��������}tD�]bR�A*���l�8��c�o�eq��k���-�~).uc�o�vD_��XA���<X��3���]�/(��/�k��;L|-�W�g[
g��9!��q@��gI0���'�^��es��CF�	�9�\o�����$I0�~i��i>\��z)�#����c�7�8��X3�b��������g����h0I����3
^c�Z�_���up���4(��Y��z�x=���^���u_zli!y�������u�
}
��8"�[zI�U_����<�+��iW��_��	2�?>�S���UZ[ZK���4�� l�����90F{6w9�����*�y��Z��y���j�j�`�����t���f5c����L�4��ce)*R��VT������p���k� $^<p}����!K�Jn��?Bg��6�����
��<�4E6X�I��WsJnl$�:M�#~����yy�?R��aN���M���iIE�'6��?��gLg|������>����9zT���R�y����'������w�c��Me����_�J���j�v���8���S:pPq�#;�7���'��S�E��l�q����L��+��z���Eg
i"�x�!#���$`��;KU�I�����d~*R�".>�>1����S<134m��D0��'�*1��~Zf��N������+�0?Kq���v����yh1mK�+l��t?����L��V~����J����(2�*�����d,�<Dr���NH�$��+��\���y� ���)*���4��9I���5<��*�IFH����XYb�0��������l���v){�"��G�k�b�t�B�TBSI�-�?������(�{�P6!WA4�~R���uizN��:�307,�>HF!^d
{2�)�~>������m�5<�u�{ni��k�����i�B���^}Q�V�9�����-Z�'��VT
k�mr�5?�5���L�<��k���U����7n�U����������!��~��[�o����`�ej����-D���B�!�puS@&4�i�v.����}��~�Z�mIQ�|�=��dD#D�|��6�]3
��FB�S����]E��=�3T��dc"��N���M;�g!�^
����`pgT�C�K�[������@@�T�p]zd��a��$��F��B��f�W.DZ� �w�L'�=�m�h���Z}���T)����.\
{2d���g�74��y�+$_Y>�C�A��*<Q |����^�����5#��8E"/7A�����P/QI�������Y�Y/4����P�����
�~��-��9�i{Z(c����A��m��?�N>(
x�m;�sM����0{1��� LxF�2�\�!}Y������N���l�j%Y�E��y�j?sl)lc8G/�O7�=�����F��c��T���\������.����rw��o6��`u�Rz�IAC�k	�|
���<=��R�
��QR`���.����(
x�=��F�����$6�I������Z��Tf�e�f���%m�=ze��n�����?wo����t���r���f��}3��*$������'e<1F�#OB�J~�#=W�N�?�n�0:tLlz��73�GY�����e�k"�
,c�Fr*�y��Ts�
����x3�/��T��=<�*l@�G�f-�Q[��0�H
����~��G����X�C��j�9������`w��70��������H��)���~>������J��A&GyDs��c�����I������8�������HB�@���������fg��O���<�>>�?�`�"���s�^���:X
�h
>,�(�:D!������j���@6�P�u��b����OP��w�Z1��M������`����c��w�?��zf�UA�E�	d-���A�D)��~1T��CC����������NL��K�qNl��������.t#>�J�����y��uRH�8d?&n1a����Q���>'�vr`�{'�+�=%��D:a�����8�~���N��d1�����i���7�rm��u��������\�\N=>.��kpj��N������|E�"s�\4������?)����8�k`��ZX�	x���$;�J`a���2���[26�w:������:��:�8��p�z� �������=t�,���o?�I#�~����E+/�>R�cs�%%�=j�}��
y
�d�6�M��)��hW-xi�����r���^��7�;)�����u��m���N�l�K�c����s}�9��T6!��,y����E��%(<�j�Pj���nV�^�S�������"?��o^����������S��`e�h������V��~Hr��)g��Q5P�F?}������~��[���z�Qo7��g�
\uFD�a����^���C��c����X@p�&\�Q@�9�g%�Jc���5�W_��fc��������I?��iA�����8�\��9j�e�hQ��1�N����_�$�F���v����:	"�<�89����d��}���@�lS2 8e���0��AX�����xs�#$������Il��uM;]���j�K%�W�_`��zp���l�I������� DS�H����O��f��}JY'w���HM������L{��v�9��J�}��
�pY�%��A�[d�B���I�����6�-0Br8+�Y��[-~Sl{_����)��:I�x���;�
4Q�7����.��	3�+�IwA��;�/��_�R3k����gV��]��k�����K����s*: ��QQR����vP`G�,�2�Q���C�������9� �/�{�\Bgh��L�z]j��-��%_v!mo!�~�l�&e��/��E^�������i�������n��C]o�������6�����n�k��r��\fX�Kg�d;�f_@+�}.�������(��[����R�+��B`�gc��t��(7pp6{P���Z��%t��6���t�Dg����w�{����d��X-Nl�j��Fq�r	�o���9��M4��6�	���y
�f��@�L����0�M5=s�((��CjL]�s_�
���'�
�>MG�8����J�������%�����B�����
	/t�)P0ltpu�v�������<�����%�(��z���Y�x=�M,[����BVQQ�E���.����TfHe�o��*��_��~�����'b�������2����9�e��=��(V?�U�PxL��*��R�����O�K^/
���������{5,�6��	�F��=��:��pkp�c���"�B������o���#����
v�@$a5�av]x6�
U��y~�k�"l�~����Cl���f�~�g����^���]�=���u'#g�<�KWA�^VQ��,f-��G�X���5
�ITs�S�k3�E�a�������on���%�����z�'W��0@�R:N^��"#U�'#��^_��=-U�O����v�,t�l��Z��lf^@
9�"2�2��B������N��P�GL�`�%���������Kj��������1H���VU"�����Y�����k�1����6U���q=�!��=
��Jg��U��������Q������(#����0��������a��~�k�������nD������c\/�-��V���0�II��-��wZ�r�������6?�`�c�	�SE��/K�-+�����(�/`b��y�yO�����cZ��QsFqb�����L)��T��]���]����b��~���8�e�M�a=��e`��R��11���5�|,Q
�<\1���o�A`4���-�a������5��$�[/s
�������w=��[��:�+��c��)��+���������x��Xl��YJe�P�;
:w/�����uW��R���9l,���]���4$Rwu;
EDK�5~'�����a�|1��"den����?���2�>cW�s�������������%e]��tXY�d���3�+=�P�L�����l�N5ZX!��_#�u�|"���~3�a9���b�x�#{=�?H[h��s�y��n4-w�#���
��,J�.�@��f��Ls��M�61�l8�1����1��=��@�P��]�B���~�Q��l�����dlX�����o���>r��M_���jF�\��+b����$��~T2���Mn;#��O��]9�UC
���=�Z���������M� ��b#}&y����*��,[�+Nd:���,i�$#�E�z��?�'7Y���A�R$��;b�w)l���)��������f/����zA��o��c	E�����Zu/���KtZ���?�O��r�L��I�������q;[�Q	P�����Y�w���L���D�c�Tq�2���~���+{(jWjr��p#jB���QW	��k�$-OR���g�_�=��F�.]����YJ��p�������,��a�F*���l3e����q����gHrb��G*�z/���,GYW�_���q��,�p�J�/�.���x�g��X������oN@7����(�-�T��''����3N�C��NUf�[�������#j�w������0�br��%�\$���\�R'[��?�^���:��{��?��lu�&�
RAp�H/�I�����kz���	
����e�y�������O�DY~�On������PR �6'�s��R�k�q���C������������;���h���G�Z���������w�%
��av;!�d7����[kAc����vE�kXP�fz�bsm������p����D/T���x�V����z4��F[��s�~�Z���e����z�`a���V$�
���UKiEN��?f���~p;y����K��6�!g�,�%o��7����t��\95�}���?�7=��8�Q^BU	Au�n���v=����Le��f�����K �^�37b���*�f�U��[�n��u`�	�T����	g���)N,�O�Z���cB���{��P���b���7�[�E2���N���b�M�������-����KM|a� m4P�8�C�����N�|M
�Dy����_m���c)���������r*���N�V��M:$X���U�d�(%��3�0�
?��q��%�j�����aJM~�"�`@�wE��V���Y������
^�
�[��&SD��Sf{0f@:��5�9��*�$��7�?�7��mYbV�������Q������7Y�A�#xs��,���=$k����'��t���N���,^?.y}6�sA�&���|��y�4�	t�:�)���OWt���iz�*���W�M���������pFSX�y1J�o�i����E�z��P
V���$i���-u�$�J�b���?(��\g���r=M�>���g�
eyr��F=�y����ch�M�(
�����m�w�3��Sd�1)��t�z}7����J�(k�:��s"��K�������s3���D�3��R�T�3�������}����B�A�T��%�bP��	���3�Il���mOU��8V����v�\����,��e�3�Q���7�C�I#�%���<j������@=����T2	���rF����zv/��{=�_{���T�(�X�P�5vH,���)V���+���3]#���4��n'�����r<�5��e*�6�>h���v&�|�X]��>����1���������r���/E�;Tq������o�^������.J��/�����[��44��*}$�	&E��ih��S��B�����(jn]'|���g��]�!���Xu(����ONX���b�+���%�h��\*b}i=��I
���\�s0��N�v��k���;F�2����sU ����j�6R5��?��c�G����M��gm~�?/����I�� �u��`�����x�>%���O��f���W�z
^�.PS/w��o��{��'��d<��'�%n��"��H������2�����o�����s�>4OU�-�0]�Cq;yD#���x��/��R�������$��5U�|�*��`�a�>�I
�F�������y�jl5nU}m�.����(��{�����r����7��/5�����{]��t�h����R#������ @%9I	�&[u�R�~}y? �q���$"�����Fh����2',�T)���ao���A��@H�v�e ����
]	�Y�Cq�2e)d������}������@Hy��~r��s��,v�������Kd�6�q��dyKV�Zwf2�{��i?��F1W&�1����y���t�z��k��F�"�9npL=�P�p�g�dC����@�b'���S
]�[x\����?�}y����J[^��!q� ���P"�^rJ���Y������lM���������|f��0)J��)�r����H������^�z����'����}�������O������m��9ufVHC:� E�}����v�y����XYs�����U�����~���������S�=�b�JudQ�u����id���;���8���-�r�������K���8Ge�\�\m�K|.���\�4x�g=��Ds�`������@�vq��^�Z|[��P���pI19�8l��;�kGm�S�G�H�j�|�X��^|���3�N�X���=`�ts�m���xi)oHg}/�[�R2�Zp��~�zK�b�����������[C��fFm��s��@��m�)"hb�+���x�Sb��FEJy	_�F����K�����&Vr���27 �OX^�=~��Q��4�8AsL�}�j<�*�|0�����uY��Z��@��A%�T��x�'�c��E	[:�'�=��n��\�������rI���1��A�ElW@�y~#�%z�)Q��
	����xp�j�p����p��n$�7M�F�-�=m�'������Vac�$�b�G����O� �'I����&r�FU����m�^�$b�����H&;��Y���,4���������}G3}�F*0<�B����VtI��g9����R��M�����Z�c��������0��Nj& �O����&V1�������\�-�L%������M���!�P&�	@�N�.���P|_b-�!]�O���'zo��������2�R_�@@�#�h��� ��m7^O
��
v�g�m��r�h��+"�~�UCj�.aae�
��%�Y���	�n��R��+���n�o�������=��)U}C�o�mK��N�C~��n��J^<Z����M���k��L�14zYCIj;�A��"&�L�r���Jb\��s�y% ���|���o��y	.����;%�O#��dr*:y��0��J�Z�D)��9E�� /��-�t�����b�6�]��n��w1�����'� �B'&��A�<��M�����_b�-@��G��I��]��yV���	�2����RE�_�0!�B���8���%3���(�*��5�R��z�k����<5����������L)���RW�x�����������fZ�&��;�IV��Z���q����o��b�u�c��%�c���~����o2E(��!}FV5�X�Y��ctuo(Z�)�:��3�����2���x>���,#������y����%�+�&2�}N�;�C�$i���R1'A?�S��$(��:�HP}�(E��W�g;��o�"� �Z����:U�������(T&�]I����UKA�G��Q� �e1��$)"a�(�k�\;}�
Z�%upH$��z���*�%�$�d>aIk��l�4������[m��=g"�����UE����Q5�1��V�!��1�S���(�������_�.���X�O#S�u������������.hd�u]%�:��:GK�x��\�/��
q��d��+���3�i����/,��1L1����^$f!�N.PKs�I�!3�T&���G������}��!��5��j�c����L��|���&�������������	��a�����$����5_rQ�����|��Z<�Db ��X�GA�%p�}$)����v��|�`:�eS	d�z+h���6�[EFc
~�,~W�S�w>[����_�$@�B%�{b��HRJg�Zj���[��a�;Z-���4N����4C��[���_���[`����9��Y�OV���(d����'sQfQ��n:bV��Bau�QL����ao_�F9Q�$�:�*��(C
��J���-���:u"kq���3t��pX��I���G�+���V2�
��\�������m������-�����8y������&m@���4�a>������n����A��k�K���qB�|����
���p��"/=�E���;ihF�=)^[���[�,��r^#��U�������*��[P����!9�AU��^�y�[����N�7hb�e�N��c����6��"�	�XN\0,@��~�n$D�������K��tYd��w�=E�����CwsY���|���9d;6��#�xp���_`�0�"c���
���d

��tX�#�!Ns��P���QC�
G�[�Q-�
9��V�o-�NHl}���p!j
���;��\���7�N������y���K�	=�����r�pf�^(�fT���r��]��ki��C���J &0p��r�8�A8W���*hI�������WjWW��"�����sX����
O
�p�iWb�
���add����� ��<[��X�|�
�u�l*��u�������8����u���hQ���_�e�����X�_��t�����l0�����j�*��]F��^���M��c_��%o�8�*���
�G�Z����5KZ���ZE�����D�o$M���nu�0�����*�i�jN�����
���
'�etE�Hr^tpJ���K+�!� "�`p-��(}'��O��*X�����Za^�1sbZx'�A�nh��+� B�03�+�#��*����Y�G+�����~7D�E�D���k%����������������}\������M��~�F4'�BFk4G�*U�������*������l���i����C�!cu��p�&`id���e���s���M��p�f������}�5�T?y?PH6����D���C*�h+��I|�����X�f����FP����6�W��[������}�\��!�.��/e%AF���ST/��0Al/�8�'� EGhN�Htb��F��E
6n�\��	g2D5�[��>T��A��J]p���������R���4����:���j]w�|�Gw�d�{h
�����;�3/���lm���^4�LrM�������Xs!������w�4�V�I��zDnN�z�,s5k(
t�	R�Gc,\�1�����|�d�rzt�^�n��_wa��#z���Mi�j��{��[��4}�������
�[uAM����`\d�L}��(��z5��6Yo�u���kp��E���-�Vd��Re�S�zC��<:�����=?I��b�$j,X�F(����K�/}��I�E �C*�a��V����N	�c��K��"���Qv��d��t�A ��=�\CF���Di_��=��.�>������RY�2{��Z�K
��-EJ��)��H����<F�$�q��i7�V��M~�K���U�������%�3��@�G�qw��}"�<~��C} �o��~V%�s���Pm�r	4������Fb�q}QFd��V��\
<���z��iMq�wSTN�[��~���-L��o(���P������5�2\���
z���q�U�����3e��GND����%��W�
F��U��u^c�:��X^����N<ng�_�1!ehq %�z��v�������
�m����Z�q���Y�����XD�|��4��PO�I��Y��PU����o����p���w���N�u�|��J^�F|�MV�]Ng�A��3w��%�0;DoaQ>Y$0���w(�#�<N�C���p����:��b�e2i0����29����ty���\��0�����>���O���@B�����h�m��H������U�<�H������/��s��Qw���x���nw�>3���o��'4t�V��QRvR�6r�PP����
�V,�\���C�����XKd@��^M�)�%���g8�-�J��5�|E���1��.W��ou����i(��O)?%���1J�������
��p1)3~� ��|I���,��"�&��Ku�CX<Q'��ZQ��"(�iY}z2�9l���wG�3�
����."z�d+
�����I�HV����1��c����^I�R��gD�@K����&"�w��L�sQb�$�3]��2k;�b��
))�aW���QZ�%�F�uE�pO|�WT+�|����J{Kh6&�2�q3�2�t�W���~@1�*���]�e)��T�2�Sf=�����)�dfq�I�/�1����~��b\�f��G�C:��z�a:r1m����W�<B�����0m"v�o���4t]�S�Y��D
�y7?�y��&�������t����:�c��g��m����c�5��#�;���@�Q*� )��1��K��Y�#6a�T|'\�R���0���4����M��Jyp��sjv��/P�5��.}w��A����us_:@��_9*���#�
���������f�m-p6����&�i�cK�v:��0uW�o&���t��?��p�������e��(����K����=��XV��m����+K���A��L��xE�Y/t�3��^>���N�+<*�,��m���J��=���T�X"Z�o�\k��AO������V��)I(Ay�|�_JH~G�.Wmv���Q����6�����=�����+>}i�����	I��S�
K#���GS������������C����ZU��

h?�\�Bi:�1	2�7gnA{TI����W�q�U(co��EL],:0��d��/N-X�?{�w���e�M��	��>�X�M������t���V\S���9�����$�W{�_�K��]�l"����
��T=�_���p`�k�J�wBJL�L���z�3kC���S���W���=O:�d=I3<6�@I
�W������<6q��,�#�-tB������rPA
_jI�'�d�r����<�����0=�DO���7z��`z��l<1M"Q��Rl���@J��e�ZP��`��K�U`"�r��L���{����ZI�����~,�nY9�F��k������S3�6��>�*�����1� �X��b!��-�B�G��:�
s��s�pO\Q"RR(VP���V�]{Y��[`�CE���9y+^j���{JJ��$�?����I������|�i]��������z�N`0�
M@�\k����|����3a��S�����q'������[^6��\�
y�1T;�rb�����T�A~`/�E6�����X�q���,��������S�M}X��$���S=����6����&o_����*(ic��������O/&3��S�,����p��YIH�
�_�W?�J��������r+�8��������5JG��H$g���x�_��X7��:��%1�������������J���Rj4S����#.H�o��@_����n�~p
�7�QM���%�G��o-/�?;� ���\ y�yN��Cv5�7E����*�?�D��1�am�A�W!5�e�/������Z�(X,�����c����J.�v09Y�/��A���aK\^��_:��1���Z�g�"e�So�su����_�b_L����3i�L��"���N"����t87�x�l��
k�>��5���;������,���w������}�Z3�����i��������$��-�_��:��q��|�^
����	�\V��}�����5=*m�q\�����+6W�0�k��y�� �5K���H�z���.c�d�.��83@����"�$��Po!��`YVV1�7`�'��@t�G�����r@�5S�a���+"��LW�/��?��|8��X����R��L��tm���6���C
���`���8{)����+�-��L���u�{�H��S{�Y(���NT����_���;}�J?��'��n����N ���`����kx�obW�:��Ei��
�#"��v��v}��Y8�Ud���?�`�{a���-�Z��)����y���)������u�9����{'?��K~���dh�E�QF%�(��Bo9'.&R���I��RM>oZl�Cy�W���&��g�5�{	�����3�P��� I��A�5D �����T���u��N]c]�y+`�/�*(+b#X��}���D�G�v�I�TNh%��W���B���O����=���Rf��GS`����M��xc��n�!W�k����z�aX���]j��nG��p��  e~u������=���H���Db�X��6���+������afC����{�O���xR��r��lm���,��y�+HQZW��3��t5U[f�$�- do����g����]^r�@'�Tz��i?N>�Q2���f���F	�a���X�j�O���$DO�}a/U*���C�l-�_Z3/�a�U�p���G�����O��-�y�Q/sb���pPL��*c�`|��0�>b���j����Z���:��j��?d��4���&;��Q��I���U�2�8.�����o�d���|]F,�zcW7n����@�ElZ�L��:�����_�P��~i�W.�}��A��F�b���4#���;5�x�O��2�"8ur�+�3-����`~}���^��3�M��[���/�p�gk�,Q���'���D��	��xb��)_I�����y]"��������w��NK1��5~���c
b%�E��CV\	�{�d�%�l�>t�@n�N!��������$���LL����LK���`��{������9R~|���&L��m�7[	o=:6��D��������@��Ov�}G���'���+V�hcZ3���F[���I�#���IR�Ppa��o�)$�-89�Q!I�����{Q�f��SN������h����X��
���f�E
�V7%�
	�����ca��cL%��f.A��g[�����+�K��.5��������](�����L�����-S!���y�*���z�ju�0(���-�E>�	_��\�	��@��/�K,������b���o�t^��>�8���6��
H���Tz1�
�N�����A�����]�gX%���p>���QE%�7��Xq.����K�(����
�\C�$FU�����n�yA7F�� Y`�.��mB9C��&BrO�+�H��Hf�102���U�*�����Wf���QCR��xTE��:�?d�/g9�����h���6MIuZ�$!RdH
�E��[�G=��'�&u��|R�!�_���'y85�h��w��4��W�����	�81��E9K���$%<��
��{��9�����r�w��YDIH���D�+�1d�{��s%����\]�MC�QM��-�y�)����c���G�0�QN��I��^&�R�����PI���o�c�(��3����AS��>�P�f�	��e��T��R����n����%�F�g%).b����c�)�<��m����^
��z�9����br�7a`W�l����6�}.3�NB���������������}����m�^����yl�����K��9|{�����T�h$H�b��"?�h����C]0�/�@w����K�!=�������v
��$��x�Bd��(��#?�l���.�9��9L4a�H���U��Cs�	*5�����('�����O����gp���j��}{[muxP��!}�a�����rq.`J��
%L��R=
I�`Gz��
��}���}���lF��bV�V|����|�B�0b���g��_�������$�JK�5D�5�����W*_�������4�|����W�z���N`_}*���j��Ch]�#:�c�����#������_G1^�6�������Z���I/��:��Ucpp���?�R��p��y�RWT��b0HNK. ��`K���YAhdb���x(����}��[�/����qZn��F�W����������f�rr�J��a�n[���l�H��7�����d~Dc�*9-<��6����9��[���&�q]&��*�nS5Qz��-��W�Lz�,u�������[������!eh'�xE�S����(����<��+S+���:�+R�~��~ZE����0�H�W���}6`��X�/�E]��XtK�(yW$�`xK��j�l��"�qkS>Ea�MO��l��\��h�,5�?	��s��"N����J=B�}/Kc�N�JVQy��[$gx&���*��n������z�t���fK�pu�>
���n[���~�~�4��
z�;!m5�3�
�'��m��U(���LI����2�C�qxy-����T��VN-�)�p�Gj�1���G�����R��c�{D�u���x�6��	�:���]oFe����kj/�!��D����J����m�������C����?yC9
X�o������t�r"G� W�����@�Uz�����������&i���{��f@�6~�7�-NR��h	(�^��GrUH��!(�\$���GB��l\�����WJ0(\��
l���.�������[�+d(�s���XD��'����q3x'*��������iTP>F"���n`YX�~���b>;]��s���7wG��"K�D8C�.�Y�$���)>L��*��C1�x�����
c���v�����(!d��^�����C�������e��o���<g�^;w���VX*7�����2�4�R���\&��tu>��{�Kn�:~�����kxe��wk���]$^���-�����P75����K������@	�m�����P�>`�3���L�W�~����|��@�`�.�w�BYE����iI��$�������}��t��z�.��3�;�l�9-c�T��6uj��
	����*�J73eD)'��Pe��Gr�'���87
O������WP�3R�������mW\P�%;�Z����*��H�Do�,���+b�4��Wjb���@o�~�&���N{�(���v*i����U��r��������%oq����;������D�y��O��)������-�=K{S���T�����$����	W�����8�C9����]�n�;�tu~%�6MB��Q��-�����]%<�Cu����nD�o�D1�p����*]�:�]�W�����65��2�0���6\�k�
l�����@d�'�-�H]������~��}�w�B���Zs[����A�����V��� 9E\���uK%+}�3[� �N�mg�]:wu�7�Z��D�X�qO�����' "$�P���`2��X	lE�����tF�^������X�������������eW�7{��B�&��]b���-O�QQyq��/�G;�����%�_x����@h�H���%v����Q!�oa%������{�3dRD�S�2}.��-�B������w��$+�
�S�q�Kq�0����-aTPJ�(�N��K��9��~_���5�����*�?��s|��0��t���;��\N���r������1SP;$��E�\�~G��er�F[=��[I�Yb��������*i�-�'*,��$9Ut�o�v�B����e:Oc��~��-^$��1o'v�8p��!�qI}�@���'�$���<&q����u��r�vYvJ�et�3�8"\�z���P��@�����d��)����mJC=�F�Q����7�Ps��D�Z�=�b*�E]�d�w�\�$����O��4�;l�>�6���~�{-��=��O
����X���fB�X�$�[��]�%'�*�PVm~�!j�{�T���W�A;��^ly	z�"��`;gE�e;�m�`�s�y�������2]l�SB�3���p�G��������!*�Y`�lO.7���*�����A��tp��;��PTQ`��|�Qa�N��B����!LRL�o}� �O���i�������r�
�G����FI)��$7.���#����F�;|[�$u��]b�!;FU0��_����C�tu8A�q�=_Jj� �h� +J�>>����Ka�"%����K�K�����4Pn� �@G�����Rt�6��s��bY
Zn��������E�S2��t�xaF50���H��)Q���������,juO1rN<� �H���L��+�����W5���^W�EP������5HK���G*,��NC%5�H8->S���� Om��hK�]�cA�B�N�z���m6b��b&�\�n x����u�/y`�{,�&zh�y���Q��	�����>�7���r=� ����5�����R<9�����L�R:�%$���Yu��d!T7��+�����WG4���\�o6^_[�L����9�����C1�����3�!�b���mc(��N��%���|�A��������K�_��/q)W���=�������	�b����z@v>������x���}�Z�W��~Lqm%�1g�=��4���kY�O� 2���&��O9X
H�z��c�����Q�D�.O��3�Yae�A���n@����'�>_��72�����V2S���IF�7��_�!`�k�C7F<r��3����?K��cD��>��b��b�����RtU(�<�}ya��{��B��S�<T��
�8�
eL�`>�;�8�"�Z}B��aglyg�})7��%��"��|�BW\}�n58w�R�XC&s9y��TIRDl���f�����6�IQ$U,�G�l��3S�������|M�'Z�t	��.%1�,}�BH7�puB0�rqY�t13�
(�$�A�{�p���+5���>�<B������G�q�QP�S�@��U��g9��:�H���o�|n����i��MO��u����KpW���8-�*S�_����Uv��������QYz�vx�c����@dDVy��oO�.	4����T��qG�VD��=����������z�����1pb��z�0�]�(W�~�����X���������O�"��,��"is�t9
�@�KP���2@F����;�Uh����Wi����[��>��k�<��AN��%���t���d2�ntJ����,�'�WVJS2"���O���#� �^[��O+�����}%T�v��@������J�����uP��6�H@cM�����_.e0�g�����
���Q������t�w��o�*�x�ozrye��?��*���m�l��(2�p���qcW24`�
D�B�5S=�y�� :��+�!�\Tw��	Q�UH�����("N0�u�!��:�Z����ev>�c�D��E]4�0��a��yU_�|,� �i���6�O,&�v�#����|qL��;L=Q�ZU>�.
P�#�!������T7�Z5s���k�{K8.I>�S�}��)TVc}��������Q'�u������!Zf���/��a'P������<��W�0���8^{=�o�N�W!��]t
>����4��a�!r
�����[��:K����=
e��Y-�d���=�V�|=�5PI�G��e��]�u���zt�&t�(m�DB�,��Lh6��I	� �H6�K�;F
�����Q�>��aHB���89�`Ixo�v��F�-��[w~��O"�KmX0qJ��������{!:�e���f���C������\��Z�):��d����t����/�3���ebZ,Qq�t#����Q�����-���T��*n���|���*r�s�$�����s>�g��Q������������~���/^!�G�?������B<u8�
n�S�i����g('!FJ�Id�hW\���ZTr�������������`�F(�I"�4$Px����r�_���MX�"*U'�����t]H���K�������$�7g�,�	��s���^	^��'�#��G���a��E���bN:���,�@�'D�^����
�^"�Q�����fiI�vHd`�����<�Zw����J���%o��rrm��ss|]D�[��O�`1q�9B�~�����#k�����^Po1�-��=>mINQ�t��XEU2 �<�������c)�����)��2��i����n���I���{��dd�eGv��X6c�q0sd����W@'.���vY]J��6�vt�j�3�{7�,n-N0� ���W����\�EQ�QO�Mi5p������?�K���T��>'����W1���hz [[��4�jc���F�T-�2�$�����}������
qu���B��P����s_|�������u*`��nFK-Q��'�`I�_���q��f�>g���?2n������"6b8�������im������v����c��_u�i��9���N�����=��XC�A������K2���_�@���D���p`?���P��(:9�:M���a?3c�!P�t��DiM��u������#����f�B�.� �t]��(Vt5(	bR���d ���bb%%���q6�}t�s3
���9�������������T����F��9��,��I�{��\Y��P��aM�����]$�(�n�^V��>������l���k>�R=��K�IH}�#�;H\�
^9���q���,� qB8�*�O���A%r�E�y�C�z�Z�t�>��2�#��N6���`l��a��F�B?KV��Mu+I5v�}��������Nk�>�����ym��"�b������X���#��`T�m���]��1��G����h�n�������]�8�Y��=��a7�E��� ��� 7��Ib]����F�50X��B�C��.[u���b�'��YAq],��,���f�������`��b�mWR{��A�!}�L�5�1���
������\ �"'�$�Cc�|3gs�����^^{Ea�bD�����W�����������_C��q��
���l�B�R%*�����v�A���!��ON�����>���e�q�zL'��<�ompRu�T���7��:J0�t�=	�iT��m�5w0�|M�p|������u��3��q����r	Z0��(���$���f�=���H��^
��u��&�m������jV�w����A(��e��?Q�	���H��?���!r#B9{P�������//J�%��
%M}�c�
��Ot��g�ZU��x��m:?~!0��h\sQ���h�G�iL�?��:k�,8V��	��n�
��?���
�?A�<�x^.�Y��S��%��X�m,O���z�3	�r��W��N`���n�w�;���#{=<BYQRsb��n2)a����XX� M��[��b�/�8!w�������H\����2�?�������M�#��?(�K>!���v�����}Z�#��e�LfLO��e��HG�����j���� �0��QO���
L�\2F�y�Jg�N;�����3����&�6�!��q%W'F�
�LB�c��M���%�-���x�"�^Doa��1��b-����V���A��������9�EyC�4,4����O�K_^����b�����rM���������G���u�n�q5������>��.E8�#��@H5�]v�g�|�+�~���l3��u�`d2M�Q_1�R��`}�K�5�cy������(�k�&�$�[a����}�(w
l�����O9�@����W&��O��fC>
R�������rl��� ���B*/�~������K*�h�}�
�'�G�)�+P�L�5O	08���*
���-WN�ptc�-Ur�oo�m�/�<�aL�5G1���/��X#�������Z/S��^�%���e<��*��ld��g�hk��SG��T��������b��p�m�G���Z)wq���Q������������dQ-�_�9�`>SkZ���N8a������b�D�
�~����w)eGi�sF7�V_�a3o_�
]�*8^��e*��!O���!���q�]>(.�����z{�[��q9�����`	���������TGvv�+H�0P�5��,7����Q�WV�a �a�E��V�!R68H��X��_GxO�~1Q��{���*0��C�E�Er��Kl�S�k��<?���bP^�+:�T���#�u��t`�yso���W�O����~�P�}����������_5��k)'�3^d�O����i�#iFlj��G4�s��3��mz���/oi$Uo��$��+?��&	�X��T�Sn����>~�$�7�������.���J�Ro}���e�uZ@�}���7R�����T"{��Ow�,h�
���O�AF�;2�lI���S���$RA�##k�	�
��-�M�j��{�Pap�/8'X������y�b�
3�������"��j����W,�?#n�	��.�	y��C�o�������5J��� �9-��#r�V_�l��������F@�c=�1_u���#Gi�l]�������,�
�vz�Lrd�.��(�F�%�	~cZ�S��z,�d�?����X�dP��Rk���E�}S�?� <�?��^��
�f`��hq����{..��WB�*���Q���arV�'	:_qB���T]�f�T�#�{���+a�h����!�U�c|((�8�^�g���w[����+�&7l��FR�Y3l`�>\�c�O����D��L�<���S7��Hk.iw�!�]�X����>�y��Y���q��kr�`�4u����4W��wEYH5�e���'��-����e�������|��
�	H/n)�5�9�n��l0�l���$$uE?���HC~�7������fPfC�3xJ����9B��.
��+~�a
l��Zx.h"��~�C��/�'=}P#2"/Zb���qO�����m�N�l���$����R]j3_4�7 {kz��/&P�!����c���zV]� �bGx��	Q3�z�S����
u��?F�M��Y�.j0T2/uX���r�){�d/�bq��g����5s��"�r�;c<xG�9� ���V
��rKW�����T�T�k�
@f�0�mR�iM�.�b��g�������H�Z`��s6b��^��c��[�������wZ����1�M��t�����e��P��4UKF����@p��3�9�g���z��)8�[b�_����b�)u5z�'
J�!F��=���|���n���N���U�����OH�nM�b\vD�W |W0K�4�2� ���|�����,{�f)<�vZ�^���mm�����]/GQ��7�$�|v��(���p��S
�'A����-P���,R��#����1C���`����|���,��y��eH��1����o�6��a���%�g5������r��-g����x#��E��rS�o�sD�e�I����=$v������O���Ug������H,��e5T���Sgy�k�h]q�}���W�l�=|��Sj^C��w7��j��FZ�*$�A��IS��=?{�O��1)n	��|�{G0��S��D&f��o�J��!L1�	d��(�YHP=���9,U�>{
��f�*��=�����1�����c����d��.0�
��M��W��dt��&]66;��aNq�t)1'O�����?b�����Z��u��-�3��s���N����#E|�KSQ=,U�����Hq����
_X��a��O����Lk�B6�f
�\/���G���ZY+�����:G#��21����'����vRb�%y���{Iq>��N<�i�d��=�c���0[�������-� y����o��K�Y�b��m�b+��E_J�=�^�3uN�� �����i��,�y=�]����}��42��@^�f���,��3�}�k�����K7��aB�	,���:��j�4�MqGa\]�<��F�|l`+KpQ(��,�}���{��|��)�-C*�����/	f�p7[E�oZ�ci�����~2�]A	��&�v���
!m1���a�\�#+���}������/t�9W�l,�T�s��^���4�)}!�.2��w``�)33�����k����.:DMgwc���yj������;���gF���3_.Q6��D�w�*��K����]��#J��].8��@E��2�+�=Ux
5���G�����vff��'"�>t�w���v�������x��C���-W�4'2F6P�n�.aT���� �U�m�N3�������c
1A�a�-	���D#�UJ�0�I��$�_����m�("��c�h��xa��P���i$�]���Gi�3Y�`U�j_�0�t!�#<c���}����;����s����TJC���-�o����I��)�����(+��&�w(*g2����eb_�`<�XsE��,4h�}�-�1�9,-�K�*�v�!&j�8�����
��Z��(v��j�P�]#�����%�V��h���7{)�
\�R]W3��� Mh`��I�-Z'�
Is���n3���m!&�Es$�Q������,�=���#RA����S-��J��P�}�\�������`}"�G��-H�����=����>*�{H�6}�vl���i3���i��g����oJ�'�'���W���J��J/�������;5S	��2j���KO�^C�����.
'�JH�m0C����tc��0��cT�V\F��K2{[�t��� ����EC5qI6��W-sCr��@7�<���������)�!�y:�S��x>��=���?4��o�!=��g���Y���2',��r��|�m�6��e�X�������
���Lh�.6�{9?���ia��7�������4������s�Cxi)*����;�<�]F���C���?�j��oO
w���`H������|�}��D�qU�TQA��5~T��j�o0���Aj����g]����<I�1G�x����'{E�7���d�M�[�)g)�=_>|�B������-8e��s,���f��C����v��c�L1������
f��ty<SM����\-���|�����9�,[f�di1������
-�4����9_���x����e��n���I��-��5fF�#�dsz������3,�����v���z�g!�M�\�T��DV1O�p�P�>$��_����
���e�s������GIH%2��������a%��jL$�b��F��G
93��S�s-�e|��b��}U�rU��{F��S��qW!}���I�
CO�������OI)��Y��-[�xX�v[��7n���aeB�ib �&����*��8OD���Z�F��~�sy����oRd��\6�S��Y)��5=�.�{������oe�����:�sr�z�5 Gz��7��5��X�cA����4�68�fP�_�R����*o����m:�3� ���^�AtdS\���D	����-��i2������������K�������|���m���
gD~��#�U��^�� 6�$�x��
�$��>�I�R
O������]]�P�
p����,���R6�G6B�d�&����w���7����7?B����6�t�4��-{������+�}4c|�]�������1J�p9�����S}|���U�^U��u��i�25p���5mN�Z!��6�9p��h�;����z�K�2�k^��z��s�y{����^�S�f�������hL��1=�U��.��(�$t�?����-a�Wp�hF�6F�~3�h��dB�����W�=\��-S@�|�����W1��.Jf����c��^"ag�����Y��s�����&]���f���/�\#?����U��X�.d��@�,��.3)�����H
�V��c{�7�e�n������UG��/T�Z�`H��]O���;�Y��	t����{��l���.	]�\1X+�ak���������\��%�e����f��ggU�z_�q�/����/����7x���J&����V����5�w	{�N	j�c���|���WT�r���7��X>�����;|1�J�A_8�$��"�E�����P�E6�|�'�`$�9�)7=\���;a�,�K�V��1��DBYaj/(���%���l2|�'�5����iT0�\����v���G���SLi�TKO�u�x��q���<� ��]����X��YP��11�"=E�������(xxq�-�P0����������m����+8����?>S�Tr*�v	A��u����xWb�G>�{=��&�&��$	\��`{\�(�G(/+�o%���AJ�������g� �V>��6Z�u^�WVl�*Qao�����p�fmVq���VQ�q?_a��s�����	�1�Z8D\���i���@���~O��`;o��?
^���*4��X����,��)$g��<	��a������))�u��������;��C�RA��\s(����C��V&����W�B�P�e���2yp��'<��|B�1\�D�y���������b�A��5/�i�f:*�U��t<�Qxp����z����C/��
S>�_�:��3>�m'!AF03� Svt?��\[G�B�5�CDN_�G�R�*�8���-�H�v�]�ih�]7����I�p���-qNN��[
9����T��!e���;������M4|�V��e��~�i)7%�\�����Z�Zi$m����?}j�~�j����A�OM#@��cY�L?0(�&�����}�m�����y�*�AzB����v�G��e�)�P��_���nFS(�0�Od��6%�t��2��q���tF��W��`���,��W�a3U�r�Y�=5��Z4,Po0\4�t|�-T�����aA�{�`fm]��N:������Q���h�!�)��q�C�x	�CK�Q�b ���r�!�A�����A+�d��9�P:����q(��+���"�p�>d��d��r�h<�q�k�k�'������j��wFe&����tTwTR���u���a�����������"���X�K�����n�������A��}�j�pj�5�5W[�����������sFz������[���������!U(w��=���"��[ZMg��LP�:���:$��]��i��?D�U(��~M��^q�7�����G>��P�h7v���p�u�R�m�d6����E���g|CPY�{	�))7�C�H4��A�uzp*�	Ibds�z4M�+D����6i�|����8�r��'w�yP��6��ET�Kb�9��#��SHG2����'�Q�N��� �)� ����Y�$,�3"���s�ZP�^�����yW+pUC��=s���K�j>&����*aC���A��Ip��P���s%�G%������%w�������R��fI�3�?�*k���^���2��5�������v���@%���bx����(��iAo'Od����c�����M3{H�f�)b0�����lCo�b�M��EN�5vC�v-��Q��(?�����qvg���
[���a������Z*Fx�
�������G�	�uE3>�����nA?*�\!4K�-����r���C-U~�+����d�?���3��3����S����v7���[�hh�Rt���?�������"��E��0��*l�gP~��������������k0���Z`�wjF����Q��+�]Q����n}�p[�����T�B��hLy�A2Y!������G\\jQ�`"�����@_�n��O!!	�^0�H�z��������S�{I����z�!��NkPc
B�h�O@U��O�S$����K���p���!�9<���M~2|�.F.4��c,b�=�+:^��w�y����@za��
��/�cF��2��>�253O�@'��	���"�N�������,�����q	�.��6���E�~�>�B��3�;g���i�,�c|&BL����*j\��/�>]#>PL��O��\��9��Nk�����S��iWJ)$��p��:��p��MTXij[D�;�b���WdUyl�qK��#���>r<�����v4�j(�`���WS0���B�A��-q]����U�%�:.�H�xO�g�9w{h`%��y����Y]�h��K�p��
s���mTh9R�9�a?%]�_�FB�QG�7T}Lp�t��I���S��d��K�Q���a�r�c��fI^��b\�;^Y;P^�$�k����5���h���s9M�V���41V��H����b�~�%������z���+���.)�/a���1�~�����2�x��#�T���k���7�5��, ���0x��{��Z�,�9�'GI
dX��(*��sO�-zO|�����/?su�M�Lz���$���OJ0��[a�@�W�#�@��PTr�j���H���#w���5�����!����������~#���bw�-=#*%�����{�)�*�������1���� ��/v����b�)���-��w� �4��f? *YO�o�i���������N7�F�./����%Dv��`l��FO)o��N����������t����36)/��n����<�L�'I�</7����oRo����3��"0�mN����b
G8�D	����f���A�1�
;�
�����D���@r���;k'�
�-?�V�(f�#i����s�u���9D�������a2�^��6n%sJn�����h�p�{��$���|H ���E�g�j��)a� b���ix�5�t`����U��s�H���������d@8:Lxo^ -,�{p��|�~�#]�j6��Or�.;1����a*F��W��`�V������xG�ECr��lE�0�[Jv�K�p����-�hv0��&A��T��C�;wF$��5=j&(	�������3�a���5Y�K�s$�-��"8�)�'�cI>���>(��Ef����]��v�:g�(DLU�_K'����r�|*�Mw����9�\|+}����DY�GU�/���$v��jTLe������������La����"&����}��kv��1o>6@21�^j:����G��e�mwx��o�!�%5aO���4�����^�������Vi�X)��L��m?��$�f�xh���:��B�%~S�B�	��^���HA��)-�
������W��>��?����R�hO��z��������u�4z���-B
m�DI?��}�b�V�A&���Y1�9B�h�������N����� �-r���]j-@#K�!�� P�U��9j��{>�K���n�i� )�#�`g!k�D�Dtx�#�X7��.�9)we	��Y$J�>|�r��o��#��x������)g:���[E���Pq�c���L������M�����y�4�����C&��r�8��
D*M�.7�bY���k��u
��bW���z#���?&�{
�B�.�5��5�x�g���uy�b���;���5�N�as��c����R���B����B�:��p�1��2�'-1Z<��K����������������j�>�)�g�uZ���H��M�����p9I
b�����{QN���#�1���7U:���;����k����=��Pm�����(�T��A1�����-�E�n�_CA�O��_t��(�.Rx�P��1|S{j�f���o��d.!�Jv/����:��j�����Z�9��I�El ��4P?4�����L"3�r��
!(E�V?����%p�2-|eO.���M-Z[�
��L��M�9��}�I�J�=0��l�|�r��u1���[�Z����c�-5n��h�:�����
�*u>���-��?(Fi^F�I�HM'���.��C�	��Rg�����s���d�p��1�������F�n��(���m;]��MAqO�g�&�Am\-7�'q6\4W��
�{X�p<x3��}��O�$��3�9�������	,��=����>"�0W�����l��.t��[��>���G[�q��-��S����d4C�I��o�~#���W:��uh?������
�xa��� �F'&�
����������tLc�d��R�0��m����Y���i����Li�#H	�>�!d8�����kB'6���c�tS���p��L�3�!��t
��:���<�����c� ��Nl'�W&�oA�_�5�9�v�e�
n<�S�X�
���!�o7�!Y6�}c~$����3L�����������I,��M�+�{�W�>���8%�l���
�����y>����T�#��yS����}��#Z���49��
'�8L��"h���c����5O�OM`��5"rdx>6���1x��Y��tc�T"d4���qrp<��[���y�a���e�A�v`�8�<��i�.[�	�����a�d�4�0��e^dbo,@�D"�D�3��������@DV�A;jTkv01Zr�����LCE��c8�XZ{d�zk�
�M9�m$����q�[3�3��\@R�f����x��>�!�w��9��j2���SWd%wD0�7!p�������z�H4^�������M�7������i��-�A3��C8E��.����r37�\����~1SON���M��Xn[Q$X�����K����_S������d'���8~jdj��4�/�>�*�7�G�j���VB�I�����m��7�E�qL�n\�.���A��_�2�lt	w�;�������t��X��n�'����~cf����UY��M�����i��Wh���:\l3��>3OE&3�A,�<�Zz?�����|/�NaX�,1�?���C��$�|�5qYG��i��U<P
����&�J9�.*y���x��������]0�Z��A^V-J�X���3��Y�\����;�z��Uq�9���6����;P3��J�	�O�yYR$�4m���.�����|T�����.���J-���.����+`T�&���3�z���Tf�AkA�����S�[ls1k�4j�F��78Ci_UD��wI���<���):���j<��)mj|T�X����%�5��q6��� �g�n-� |�����C�U�{>�������){-N7��N*V0.�7�*��HIlf7%��)��,���h�eU����i3�j�-)�JV��+t��u5����<_���"U.ZzK��D���)���f\������-����0c�"��\��kQ��1�6�Rl�u5K��_#�r�|�Fo�����+�Y����n��f������)
����2��zW!���cr�	�dbIa����e����R����:���T�����Q�U�^g��o����i�p��5�>r+&`t�L2�X+�
���&vD�N�q��(�h�����������H��u�,y��U��vd����rZ�8���L{� �J����S]"�L���N��-k�E�1U�T�4�z�l`tN����������y����MXc�C*�!g�%h�� W��{��d��l�?�x�K��6�K�	�3%}(|(4et��z�,G��o3�tb�, �5��4(
KQmR�u��id,���V��
9��7�*�"��}�B������H���_���r����#a��}�j��.�)*�--K�i��0���4�����*�]OD~v�:�fa��@i��rAu:���4D��K�h>X���o������h����o���M�t�8����KZ�
c���ew��M���C�����<���!��OU�7��H��w�SLq��B�����&�wW�j�`}���=�,��B[��4�.�C���bES1X�>c
c�J����O��hN<�K?TXs����q����8�j5f�4������l���?��^�%s�%���$�4V�R��A�9a�O�������d��7n(��f��t���A�	A'�������V`j�<�F��^�	V(��6���z���'��(����l��$�
c1	����i�Zm���!�tF�`U`�8.�1�M�5�<k�����%�������/��ZZqz��AU~�Zn-�o�gSq�Z�U��5"y�d���R���z�QM@����2��A���������%�"q�(�B4�
���%�,a�>gB�e�`6	�t�5����\���+�q�1_lA�jh��Mb�_�I�%\VB(��@���2��!�|�Z��8O����E�<R����������3X��`Tu���b��P�s&"c��;�f��Wh��KG����ql�v=��)�`�$64����z��0�v=o�RO��5���&��NV��������y���ZcE����0�����1���{�f�����_9F��qhU��0�=�������7�D����r3;�<���%����������-�f��_��I�'%�D6�Z1����+�����L���{�&cb��\�,�qN�S�����K�N�K�qO��%5@��cl+�Z���Aj�_Y���WYJ��Lz�����\��T��n���7�*�.'�\�E���	&��V�6�H����_	��3�����W6N3]��
[���F�,��wP�P�.>^K1�8Jd�(<�$�!������C�9�Nl�_�iQ:se�~@����8���%=��%��!N�`��B*y���5���������^��5j=?$�����������7�	C�L�)6x��7�L+�P�S��pp�pl���.,��L'�����f��J��K�%���oe�����Y�aT�A�*x�_G�_Mz���OU�w��\����o�|%�!��SPE��$V���H����<�PW��J;��o���6�z��q���t���q
m �"�_��,t@�P�����w�Y�di8i�'}h��������� �������q��A*5�^U�S��`^Q��96��=�<L���q�'\A����&��w�(��21K	2��I#UYj	DG%�U7nM�7X��1nS{�=yt���9�1������?||����P�O<g��Y������?���?��L�� ��W@���g���e��8�v��x�yD�3��B�a������>n���Jn��?,�/�X+t�0������(��_�+C`A�fz0���7�}h}�Bw�V�?���I�9G�	�g�B��
]�K���	ON.��%,)1rJ�����������$��u6��������Y�+���;K���@�t�y��V���	32X2!���@[�@O}e?��P+9I�������Ax���;T-�����K+�c���^=&��*%N�<�����zJ"�gd���	���Sp5�����x��s�S��x�Drg/������[5S��l���������������[�@K���PY�������pE�n��&hd"2aw��`�	���R�����F��!�?��1[��������[T(	-X�p��kyM�g�`��[�������)h1
�F������'�3�;�vjf(���0e�@����I��������W����T5�$�x0�������	�P���r�o�n!�	����k���l�)�=����-j��C�a��?���k_q����o�!���vx�1>;�����*je`X��h������|������o�rs��������!��xa�b��2Rq��/J�w*�X<b6�������cN��ERnu�q�_W��.)������'�j�hN2���������&;+/^?����C&�dW�B��P������co.^�%�V4���VX�_����������1�A�����:J/�6��n,���{��������4o+���2_�Z�dK�VO��������teZ���z"�����	�C@��
���)�������g�lO
����o_�w���,��J�G�@����,O����1�^M�Df�v����M��J�-]��i��]R��/�E%��U�Eb�������!62?r���PP����:M>EI9�,���Z�&����Q��auhsH��y; r����x\�%[��]�i��Y�p�e	��^����!���#}�Q���=v�o{��YI:��bA��������;�w	��'X|�_M6��{>�]UcnK����q�59�^��PM%��l�����"�sZ��m���f�g��$���^YZ�g�����3�f7�/��������R�3�i����
��(���1�^����"����\���TCM�M2����7<;�=���A�~E�	���z_jx���s��)�}��������/�1�^!
D�������d�<��&�Z<n���73j,Z�G������)��JxM���a���B��A�f8^30V��#�4��U��+r�
w�q�2�9��#O����X��e�uv�)������i3�U���2g����`�������U&�� j�@�Z�TZ��"j����#��*9$�AZb��W�~�����Fl���3��s����t���,(P��iy����>[N�|���'�J�)c�m�A�I���
K�D[�DO�e��-��FS������B�������/\�+���n0�������2�B���Q��t��F��2N�~o��5/=B7�21���c[�|�7r$�z$��L�"��=���%�E�F���Ml���;z.H����B�2#�y�R(�[�[����vT��%�O~�E�
�`sW���Qm�nDO	�u������%��Q�yeCc7h����9:,�����,;�����o�C���G���7�������9�w�O�����z���u����"|�����A��R��~y�Op�;?�dl�T	���������!{�E��~]������;{���:�IDY�������8��$��Q4�-����po
���^������(B&�|���\X����y9�kn4D,���B$dS���RA
� ������
����Au`�^��cm��1I����|4zF��0��8���
K��m���2�7L���k[<!A3� ������;ylN<0�}��M�d3���(U��������<�!������@|�;�uQ8�s�SA�5*	h��{�tm��k�����:���P8mY���oI�B��H�a�X�9���7E��1%�V���\Po��.������]��������L��~2�����Rb�s��6���in����^D ��_�u�b#>���bn/�y�fE���&4�6�)�����b��V�?_�1�>*�����������sz��@9&�����q2��,����� N�MNf'T���.�DW��D�m6[��3�_����S�,�|��{�9�a��N�����z�J���1���p�)l`��6j�z����X�hM���[�i�6EA�+@e�{b50�F�lc�E��Cr��������%[+������
P��z+��j�����k��n	�n?#��1�r��M�HEL��3H���^r�������+���S@�<6��/���Nerv)+����(�����Y�9wD����H=n�R"���jf��{����G���BZN���z�����!�{����L��uno~���=kM,'$rz��v{�?y�br�[���+����)@ O�����K��h���(=�GC+)L�����;����y,�4���!U����$m�64,�G
S=�v�}!�~
�W+>���m�_�v��;0)�A����8	3
��Q�JKB��bn�u���#�N�[e�f���1�{��
���+�����Q5&�Y�\-�Rgg�
��I�r����Y������~�,�
��c���(�G(	� t�z�,�M���(�C��]U���D�����35X�@|�@�r���tV���DX��|r����:y��1�,b����M���WJ���!Z�*n�!���:�f\k��JCO�m�QM���Z�GH�=h��M�d���B/�����Z�u��i�n��e���V9�Q=����t�>TxY�j�]�7��������Kx�|��p��5���&��_Y'�p�/v����C�ge8���)��a��(0��8��Y�KH�I��?��XgnX�����~4�Le���1���:Qz.�7����	�H��9��%�r�L�*t8�EeM	�����y���Yt��.�+�K��q�J�(��4&��d���{{���X��5@��o�=�q��\��rf�l��^�f���_h�W�}�L=_�xqb|�*s��C���oW�$�?4��W��'�8�=E������n��cs�1��_��75����\JI�5����2��T�y�N�?��(p|.x�����,�J9��fa�N���$�����A��QQ����X
��������2-vK/��/t�l]L����+�q^o�Q��V
wa��U��B)4F�g ���C��B>WL<.�����7�
�!���l�`==���>�|����q.s����h��^����iy���;^�^�5u�S��&�����)�S�[�k[��It{a���I5��u6*�������,����}�%4��]6�f���2i�k�z�6��d�
��!M��m�a�!7��?�������QC���y��"Y��&������N������xv���V�7G��,�O�C���Z"�;C��u=B�y
G��G�2q;EI�z�=�bv+L�I4�^��w�L��j0��g�?J{^����}}U�`������������_��\'ws!� S���n���j�G��8����*��8}��"��m����sBN1C��p�)WC�;�;���x;9�-����l7���fK,K:�������p�'��>��um{r<"��GD�Ji6���0���;h"����,A��,��M����+��$��$gq�+�t1�2�D��Bb���<�t����F����
	�k��>
����
�����m_��X7"��������>�2�������h�I�f�.1i��VrR�$�[�&,�"�a����c^��qS"0�k�D,=XdW:���~gzD ���_a�<����8 �54��3�8�o�"������L����b������7v��^y�	�M�����>C�Ua�����\�j]S|�BE���cAe��?�n�KJ"�*�qRF����
�pB�W��"����#%y
P}�\���M@p/
[�L���e/�`�q?�����d���W�d�@��f����M��������A�~08�V��7`:������d��k������I_������:>�o����@ ��p��;��|
����)�����YS�H�a�Y���peo-��������Pu�&sBC��@�����Q~��vRG_��b%�����g/4gK�������_,�b������}�B��anc�3k
���o�Ek���[��`v���X�� �)���EG�)-4Fi���i�de|i��L���M�����jl����P��w����d%������f��(0�kI�\���:��a�\%�5R�]��c�����n���PKm�+*T3�22w���	�an$�����j��h�������F(�(M���Nci��,M�,6Z���V5�Hd�����a[��A8K���������9-Q^@.������1�3v��I��� coOfU��k�Ya�$4R6�T����0�E.�/����X�E(�N�a�������]�(����J�D��G��A\���?^�"�������>�t�"�����I����u�4x���R�����)���1��]ofh���W4�h�>����J)����f��eG�����>�������)^@Oz�k���#>b>�G�n�J��6�I?���84�JFO������PAlA�ZO�
��!������tt����t�l�[���n`�t��GE#�m����L�I�������lu�����'r����l�f�6Fvp���z������}��^������sL�?���8������;�H�A�s�#��J�t��d��$�-������8�@q9:$��"=X:��}��B"��#�e��pL+8�T�C
C�������#���O�qz*��H�������E^�?z7�f���a����6��=����F���~<���=0����m�u7�W����m�"x���<jKMt&���us>��K��MBh ��r������YEp�ad
���H��o��/���UV���`�4y���,w�z�\!��q���F����A�I,�3�5������k'=W/%�v'��;����iD���C�S��v"t5���9�}Z��1�K�Qys���qV9����<�����BJ7��l�_��'8*������;��rXB������(2w��w���
�v7R�{��I����9���c�g0$�N`�dOs�Q8��^+U�U6��#�=#(�.-���?~�v������?�tf��[U��y������?�:}[�Od�
n�i��Z����;�_�2�#��C��9T�[H;&�=i_�hP($���
\�x�6��4�3a]|)(���1�����h��:�P`�N�b3(����}�����Q��Ohl����N�8�W��p�b������`����fJy�F8��7}��K7��
Cv�I�1#G��8��G>�F���Ip�ifB���^m����������v"%��N��T��RB*y>D|������s���]L R#d���3:^fZ9m�A�3;�������\��I~������S�	���S��k�Ue���4�X�K��Z�#�)��4�u�"�����4�xQp�P_���%�X�1��K)�l��T�7�	���*�����$�#��<�����������~x%�]�5��W��nj(s0�e��
��m��=��Z��(�G�z�>�-�
j�%y�7B�_�8���8��C@kVNA?��)q��u%:�k�Y�c���Y���|\�?{�������`F���t�,83��c^0�P5��@v�M�B$�������W�����*���>�����Np}�����h�fs�J-�yT�Z�d�g�:�K��;�e�~����hfQL��X��p0]����'<r*��%�"���6�8s����pf$dceH�� ��{|�&N7���:��i&���G��-^H5��lR�����?j��d��{8��{���[B�S��
~�}t��������
�������
3�m:��j���c�mF�1��>$U�N'��������Ak$.��2s��YC_s��s�^�9_�c9�-����]s�>Md4��K;WQ��W��<���c7I��l@W���v
���:N	���?N�O��Q�iD�O���*[�#�����LG9o���%#�(����#H�����Cf9N��WRGg��u��#DG�������� ��]���H�c9�K=������iV[��w��U��������;�[��$��W�4�Ic��PI���{��d�5��>cvJ|���2��1�@~�b,��Q��C���E�%���n��$��%G�����_a�O��&��e�e �������M��|�Uo�F��Q���66��K~(���1�
�T�9�w�A�_���' t�����D���l�7;�J
$�uon��Xw�d�>�*+j�V�Zr^�7��4�Jj0_P2I��4�47�Ui#`Y8�����[�Q��"�j�Mv}u�N���,����D����������]> �9�SF]<�m�=�����-���M��S?����pJ���(#�;8T� s���?�-q�D�@=8�#��B�^"n9�g���[�r0�yM�:��o#���so�8��VU���w�V��
�{�}�m��S&���K�������y��e���������b����Vh���
c�����B38W=z|]@b����"��F��_BdW9w`nl���	�A$��:yB�*����^���I�^~�R�����p�-������9���K�)��
�4,0n7��y���r��������|\�F��&������l*�����<e��|Q�nMO;��u�Y�(�;&1
���Rdx=����+�t������b�\�f����G�uo_|�	��Ht2O#�-6�C7A�k,@����t�#6��j�>@��r���9@�aU��~�?����/�:�j�)f��G��2;2v���r!,,���On��S-�e���������TU�2�S>ma�z	;���&V���=��N� ��]��{��/ND��z
Q�QN�L���<q<9�������
p����y&��5�)�0�<|1cy$��� ���U�1>�p�}��
, ?k���e��{Z��`����Kk&�=���M��F�r�r��&��F6�|X�������%2��kf�cj�l���$dg���=������S��$7�y5��V��?w�N���������[�"�yO$�w��6�y��G�����RN�mvV&/j����V1�������e]�yqP�%���6��_�}8�G��	���=D�p�����e �858�8?E{�����r �X���x*[l��v�ba���>�8��6s*���M$��4{�� =�~����O2f1����C��>w������/*�1��]>�W(n�:��4�
� _0|���B�1���IV�2��p?���l�
��= ���jO��#�2�?�:�]E�x1hvk�k�CC�������<
�����r����W)*oR���[(�2�er��!��L�cQ��4�6����������������B�F�Aw[��:5���fD>�!\�6
T��?������Ibo����QUK��(�U�!�>a�9)��-p��Sjk����$�����I�U&X��K�i���6�0��j��:���dW)�t�������/�Ic�Zj���=mN�yf_U�[;$:�C��DV�KPXs���������j�_��6P`O�e��S���$?��;�)�5��q6;
qD���S���V���*�8� ���g4*~�4W��'�6S��[��	��K�<wv\�g.rHP�������
� S�U[BF{*��>[��>QH�U��a[HMRP����;����UnH�l<��S�ON���s���V-��	���G}����&�Tg6��_��}�8Hp������Uug`d���=�#c����t�"x��3�7�SY�TO��LUB���$~��go�@.#���^���G%����iU�6��s��V��^���%�e��(�t1�����EuA���G#u_�_��m��D@����e�W��]�	����g��p�f����4
J�L��&H���3q�e�����Tu������bk)�����/�A��T����x�pL��.��H������������,O#8���e5�&�`�����T�x��l%Jg������qj�M�<>2pQX�VC�Q;�����*u�����r��M�!����������� �lMw����{���������O���xV��A�6�w;_���-\���9������mw��j�*1�!�ly���<�y��6V%WLN���������e������c37@�&�����UH�U��<r��R����%P[W����5!��a�u&��]/�H-gj������H���!)L�=��������VO�|��[@������4��;�� DOj�
���p�O;-o���?Rv����	`d`�5�OM��BX2�<������AI�j���w�K�j�2&ps���G��y��������
��T�t����}4���
^#r#6u�xJ�m{T��s@����}��k5�vT�����)���_U���@X����^��U�%��v�5��	�Z�V�i��?��L.�
�F���)D�l�������&5wu���s`���a�4������+��6B�J�m�Q��?" �$_����5��x����2,������W���#}����'�tG+}��g�/U����vBB����}_,X���O= ������S�	'M.����S���>yf�x �%���i����w����wyO��Z���$���j�]����faS�Mg�T�D��d�\U���)Q�tt�d�Q��kdb�^��0��y,m�G������+7Z%��-h���8��CL4�/X}6 i�Rv��=.N�#X�	��X�\E-�#2���9��V�2�}i���u�!?n������v�R��������/����G5���0�4���\����}���#��l�qu��jd��W%C�K��=��!�w[|���?�51��,�~���1����6Sb�����$xIC��Z����\�R
~��E�F�������7�mC6A�!�m
��t�%N�.EM�@���6�gc�;�Z�J/��gp%{���,����3Xl2R!,y%�)���
�!�\O�F���U��[�����:�g+=��B��}��~��$��-���������Yp�}���~���x�%�s2��\W��������
?R{�U�����T�8�&���k9���i��OU��1����&�4�AG�?Z��d���C�!G9�B~3�(�5���U�:�r�D.8���9q��I�H�
���+��Y&�#������D��^��r��G�e������~�����
Hln��'1i��+�Bq���Q�;��#&����+@��L����K��ZE�b�
��j�)@�>SD�*��Ic��A��������v�� r��8�m��6I���S�����u"i�,
�s6w����lSXQ'��]��~�x�`rN�F6�	7s%�OP��h��&L�Fs�zOl������I{���|56�O���^NB�n�����bi�z �~&�V��E�
�L��%>�=��S�~��h��	�7"�Dg�a���>�����P��:��oFFj����y0Y�5'������ �h���)e��`��d���m���
�AA���UU`
X�fkgz���x��P[��F�|%����y�F
z,�C�ku��AL��Xr��o����FU�b��u��<�nd�5�{,��]��xh����H����U�S�e�dTJ
�FZ�#�Fr1��u�j�u�d �����1��Nwx
#�g��ab�%9����J�b���g��I^����������-mc�,-Q�)0�G�V�B�t^�rS��i�T����25����o�7w�M�����`5;����fE��T[�H_���Um�)��
�N�#��h��������
��S���'ha7���L��[���I	I�������xZ�}��#�V�����
����*!�'K���0�����#�\9��5���&�j���M|��c��IA}
"�~�"�8��.�C��H�o���cd?b����i�>�C�� ����t��#���Pr~�0��L���
	�nL���~�T��W0��8��[�zr�0�;�����F��H ���������D/Pjt���&�� )�J�	��X�I�������$,�fz����|�)��l��M~�Q��?��H�|m�m�k5�����'v���M��0P$L�I3"W� ���H�`�_a���b�v}����~z�a���1@Ja/�&���7?+�W+��+p����z|���@f5C��e��!�����=�9���������*���>���{	��m_J9�g���{�l,Ed*0#��k>��a��>�?\	`+->���H���:�d�&!�b_B�����Ogl�[3���i�	(��V$u�G��Wr����o`���-)��4F�Lf�I�p+�������[
�����$�����d� �Z���2�1�8��%�P�N��+`o=�a����`�H��N��x�s���#�)��� ��g
���������{��C"�>t������}��O�F��F�Q��:�M��h5X�p���eX;=�+;x+�����=�j�X�Z
�(�����l���j.�b��������b��?���_JSg$�=wEkf�7�X
q�
2������N����^;������SpB{-7����D�]u�p����d��YG<w�mnT�/f}o:�~@��VFX�I��#�,�	��������^�|[�.��hf^�����,������3�g�����V���
Y3�C	:@�cd�;���-- cM��vh�_U�����������r����w��n��W����+�q��G!��L��v-W���`|�������F �z�V��ao���dy�e�d?p*���3��o�1�tP�J�x��6L������
����2q��JV���q�H��~i����+���&��l'����c�iu�m'���n�%
�vy�����qMv�Ar�	iy�O�[�1+��70��r%��?t
�\��p�4���%�Z���K�����
��2�������`@G��94(�t�'����z��Yj�g����"�
5��EN��B�IN�����������8�1tn���S��l��OAFU����B�����s�L�$�����;sW�O�9�B�:���O�!�E���K"1�t/���m�EH|����8�<�1�h�����k;=��4v�pa
TXk����7�p>�?�����X[��o���6�u����N���V��=?������~e��O@N`F|��d���@Y?����-Qf��,�o�"���C~��dX�y��N�������W_b"���5�Y�}���(��D���7P6RU�TSOc�g��!�lcQ���(.h�����p����{�(�-��9'�*D���L}��
pg������)c����)�E��}�k��9�)�P�@�'E��Y���m���0�}����.���?e���.����VB��W�grJ\����n1�|R�(��<���V�>�G��,�^�o�aP_�1�'P��m���������w��
j���m�3�����a+�;5�^W��=���I<�F�S��<hF��@��e\��*]/������z��9��q��n�hCD�������#N�
S�/�u`0H���o�[���	�s����A��7?�ir+J�N�i��^�vT,c��]K�B�6�^��957�����DQ�k:?F�g4�$R����6�5
I-E�\��P�t��'>r������J�mca�h���Q�h+#B�}1[H	G�T�/7�\(����Jd����LQs�����(G�kJ���D�K�J����}���+�SP������B�RK����r�H/
LmF���z�����[Q�<�����Yen�v}['�M�k�p���Z02>h�<�fUZ1@gU�}B�D����-%��q!E����M�Rh�#mh3s�n��(:������z]X�<�����p�����C!^�b���S(��S���PRPQ�s@j1����n�u�;q^��2vNRxd������<u.>�R�^�v3#��0W���(��E�]3�R��fj�<�����X	aEX���1N����y��R���r-~R�IZ%F��N���-G���K��u�{����9����>���dl�=�����
	�j���f��#��"���I?��k1����\+�T��cU��3��<�7U#�o��?x>�L�����b4���<3�����i%Gz/,�j$N�I�eu�_�`{d)@69~�3j������rN��O�?3�	
��e�2;5<��V�\0�����x`T�eZ�����a]�����p�����-��?�����������e\��|�.m�����
��,���g�YZ

#23

andres@anarazel.de

over 3 years ago

In reply to: Tomas Vondra (#21)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 2022-05-29 18:00:14 +0200, Tomas Vondra wrote:

Also, there's the question of correctness, and I'd bet Andres is right
getting snapshot without holding ProcArrayLock is busted.

Unless there's some actual analysis of this by Rainier, I'm just going to
ignore this thread going forward. It's pointless to invest time when
everything we say is just ignored.

#24

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#22)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/29/22 19:26, Ranier Vilela wrote:

...
I redid the benchmark with a better machine:
Intel i7-10510U
RAM 8GB
SSD 512GB
Linux Ubuntu 64 bits

All files are attached, including the raw data of the results.
I did the calculations as requested.
But a quick average of the 10 benchmarks, done resulted in 10,000 tps more.
Not bad, for a simple patch, made entirely of micro-optimizations.

I am a bit puzzled by the calculations.

It seems you somehow sum the differences for each run, and then average
that over all the runs. So, something like

SELECT avg(delta_tps) FROM (
SELECT run, SUM(patched_tps - master_tps) AS delta_tps
FROM results GROUP BY run
) foo;

That's certainly "unorthodox" way to evaluate the results, because it
mixes results for different client counts. That's certainly not what I
suggested, and it's a pretty useless view on the data, as it obfuscates
how throughput depends on the client count.

And no, the resulting 10k does not mean you've "gained" 10k tps anywhere
- none of the "diff" values is anywhere close to that value. If you
tested more client counts, you'd probably get bigger difference.
Compared to the "sum(tps)" for each run, it's like 0.8% difference. But
even that is entirely useless, due to mixing different client counts.

I'm sorry, but this is so silly it's hard to even explain why ...

What I meant is calculating median for each client count, so for example
for the master branch you get 10 values for 1 client

38820 39245 39773 39597 39301 39442 39379 39622 38909 38454

and if you calculate median, you'll get 39340 (and stdev 411). And same
for the other client counts, etc. If you do that, you'll get this:

clients master patched diff
------------------------------------
1 39340 40173 2.12%
10 132462 134274 1.37%
50 115669 116575 0.78%
100 97931 98816 0.90%
200 88912 89660 0.84%
300 87879 88636 0.86%
400 87721 88219 0.57%
500 87267 88078 0.93%
600 87317 87781 0.53%
700 86907 87603 0.80%
800 86852 87364 0.59%
900 86578 87173 0.69%
1000 86481 86969 0.56%

How exactly this improves scalability is completely unclear to me.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25

ranier.vf@gmail.com

over 3 years ago

In reply to: Andres Freund (#23)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em dom., 29 de mai. de 2022 às 15:21, Andres Freund <andres@anarazel.de>
escreveu:

On 2022-05-29 18:00:14 +0200, Tomas Vondra wrote:

Also, there's the question of correctness, and I'd bet Andres is right
getting snapshot without holding ProcArrayLock is busted.

Unless there's some actual analysis of this by Rainier, I'm just going to
ignore this thread going forward. It's pointless to invest time when
everything we say is just ignored.

Sorry, just not my intention to ignore this important point.
Of course, any performance gain is good, but robustness comes first.

As soon as I have some time.

regards,
Ranier Vilela

#26

ranier.vf@gmail.com

over 3 years ago

In reply to: Ranier Vilela (#25)

2 attachment(s)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

Em dom., 29 de mai. de 2022 às 17:10, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Em dom., 29 de mai. de 2022 às 15:21, Andres Freund <andres@anarazel.de>
escreveu:

On 2022-05-29 18:00:14 +0200, Tomas Vondra wrote:

Also, there's the question of correctness, and I'd bet Andres is right
getting snapshot without holding ProcArrayLock is busted.

Unless there's some actual analysis of this by Rainier, I'm just going to
ignore this thread going forward. It's pointless to invest time when
everything we say is just ignored.

Sorry, just not my intention to ignore this important point.
Of course, any performance gain is good, but robustness comes first.

As soon as I have some time.

I redid the benchmarks, with getting a snapshot with holding ProcArrayLock.

Average Results

Connections:
tps head tps patch diff
1 39196,3088985 39858,0207936 661,711895100008 101,69%
2 65050,8643819 65245,9852367 195,1208548 100,30%
5 91486,0298359 91862,9026528 376,872816899995 100,41%
10 131318,0774955 131547,1404573 229,062961799995 100,17%
50 116531,2334144 116687,0325522 155,799137800001 100,13%
100 98969,4650449 98808,6778717 -160,787173199991 99,84%
200 89514,5238649 89463,6196075 -50,904257400005 99,94%
300 88426,3612183 88457,2695151 30,9082968000002 100,03%
400 88078,1686912 88338,2859163 260,117225099995 100,30%
500 87791,1620039 88074,3418504 283,179846500003 100,32%
600 87552,3343394 87930,8645184 378,530178999994 100,43%
1000 86538,3772895 86771,1946099 232,817320400005 100,27%
avg 89204,4088731917 89420,444631825 1981,0816042 100,24%
For clients with 1 connections, the results are good.
But for clients with 100 and 200 connections, the results are not good.
I can't say why these two tests were so bad.
Because, 100 and 200 results, I'm not sure if this should go ahead, if it's
worth the effort.

Attached the results files and calc plan.

regards,
Ranier Vilela

Attachments:

v2-001-improve-scability-procarray.patchtext/x-patch; charset=US-ASCII; name=v2-001-improve-scability-procarray.patchDownload

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cd58c5faf0..101a4e1c63 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -418,8 +418,8 @@ CreateSharedProcArray(void)
 		/*
 		 * We're the first - initialize.
 		 */
-		procArray->numProcs = 0;
 		procArray->maxProcs = PROCARRAY_MAXPROCS;
+		procArray->numProcs = 0;
 		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
 		procArray->numKnownAssignedXids = 0;
 		procArray->tailKnownAssignedXids = 0;
@@ -1039,6 +1039,7 @@ void
 ProcArrayApplyRecoveryInfo(RunningTransactions running)
 {
 	TransactionId *xids;
+	int			cnts;
 	int			nxids;
 	int			i;
 
@@ -1136,13 +1137,14 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	 * Allocate a temporary array to avoid modifying the array passed as
 	 * argument.
 	 */
-	xids = palloc(sizeof(TransactionId) * (running->xcnt + running->subxcnt));
+	cnts = (running->xcnt + running->subxcnt);
+	xids = palloc(sizeof(TransactionId) * cnts);
 
 	/*
 	 * Add to the temp array any xids which have not already completed.
 	 */
 	nxids = 0;
-	for (i = 0; i < running->xcnt + running->subxcnt; i++)
+	for (i = 0; i < cnts; i++)
 	{
 		TransactionId xid = running->xids[i];
 
@@ -1371,13 +1373,14 @@ TransactionIdIsInProgress(TransactionId xid)
 	static TransactionId *xids = NULL;
 	static TransactionId *other_xids;
 	XidCacheStatus *other_subxidstates;
-	int			nxids = 0;
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId topxid;
 	TransactionId latestCompletedXid;
+	int			nxids;
 	int			mypgxactoff;
 	int			numProcs;
-	int			j;
+	int			i;
+	bool		in_recovery;
 
 	/*
 	 * Don't bother checking a transaction older than RecentXmin; it could not
@@ -1416,6 +1419,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 * If first time through, get workspace to remember main XIDs in. We
 	 * malloc it permanently to avoid repeated palloc/pfree overhead.
 	 */
+	in_recovery = RecoveryInProgress();
 	if (xids == NULL)
 	{
 		/*
@@ -1423,7 +1427,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		 * known-assigned list. If we later finish recovery, we no longer need
 		 * the bigger array, but we don't bother to shrink it.
 		 */
-		int			maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
+		int			maxxids = in_recovery ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
 
 		xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
 		if (xids == NULL)
@@ -1451,13 +1455,14 @@ TransactionIdIsInProgress(TransactionId xid)
 	}
 
 	/* No shortcuts, gotta grovel through the array */
+	nxids = 0;	
 	mypgxactoff = MyProc->pgxactoff;
 	numProcs = arrayP->numProcs;
 	for (int pgxactoff = 0; pgxactoff < numProcs; pgxactoff++)
 	{
-		int			pgprocno;
 		PGPROC	   *proc;
 		TransactionId pxid;
+		int			pgprocno;
 		int			pxids;
 
 		/* Ignore ourselves --- dealt with it above */
@@ -1494,10 +1499,10 @@ TransactionIdIsInProgress(TransactionId xid)
 		pg_read_barrier();		/* pairs with barrier in GetNewTransactionId() */
 		pgprocno = arrayP->pgprocnos[pgxactoff];
 		proc = &allProcs[pgprocno];
-		for (j = pxids - 1; j >= 0; j--)
+		for (i = pxids - 1; i >= 0; i--)
 		{
 			/* Fetch xid just once - see GetNewTransactionId */
-			TransactionId cxid = UINT32_ACCESS_ONCE(proc->subxids.xids[j]);
+			TransactionId cxid = UINT32_ACCESS_ONCE(proc->subxids.xids[i]);
 
 			if (TransactionIdEquals(cxid, xid))
 			{
@@ -1522,7 +1527,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 * Step 3: in hot standby mode, check the known-assigned-xids list.  XIDs
 	 * in the list must be treated as running.
 	 */
-	if (RecoveryInProgress())
+	if (in_recovery)
 	{
 		/* none of the PGPROC entries should have XIDs in hot standby mode */
 		Assert(nxids == 0);
@@ -1579,7 +1584,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	Assert(TransactionIdIsValid(topxid));
 	if (!TransactionIdEquals(topxid, xid))
 	{
-		for (int i = 0; i < nxids; i++)
+		for (i = 0; i < nxids; i++)
 		{
 			if (TransactionIdEquals(xids[i], topxid))
 				return true;
@@ -1600,10 +1605,11 @@ TransactionIdIsInProgress(TransactionId xid)
 bool
 TransactionIdIsActive(TransactionId xid)
 {
-	bool		result = false;
-	ProcArrayStruct *arrayP = procArray;
-	TransactionId *other_xids = ProcGlobal->xids;
+	ProcArrayStruct *arrayP;
+	const TransactionId *other_xids;
+	int			numProcs;
 	int			i;
+	bool		result;
 
 	/*
 	 * Don't bother checking a transaction older than RecentXmin; it could not
@@ -1612,13 +1618,18 @@ TransactionIdIsActive(TransactionId xid)
 	if (TransactionIdPrecedes(xid, RecentXmin))
 		return false;
 
+	arrayP = procArray;
+	other_xids = ProcGlobal->xids;
+	result = false;
+	
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (i = 0; i < arrayP->numProcs; i++)
+	numProcs = arrayP->numProcs;
+	for (i = 0; i < numProcs; i++)
 	{
-		int			pgprocno = arrayP->pgprocnos[i];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		const PGPROC  *proc;
 		TransactionId pxid;
+		int			pgprocno;
 
 		/* Fetch xid just once - see GetNewTransactionId */
 		pxid = UINT32_ACCESS_ONCE(other_xids[i]);
@@ -1626,6 +1637,8 @@ TransactionIdIsActive(TransactionId xid)
 		if (!TransactionIdIsValid(pxid))
 			continue;
 
+		pgprocno = arrayP->pgprocnos[i];
+		proc = &allProcs[pgprocno];
 		if (proc->pid == 0)
 			continue;			/* ignore prepared transactions */
 
@@ -1711,9 +1724,11 @@ static void
 ComputeXidHorizons(ComputeXidHorizonsResult *h)
 {
 	ProcArrayStruct *arrayP = procArray;
+	const TransactionId * const other_xids = ProcGlobal->xids;
+	const uint8 * const allStatusFlags = ProcGlobal->statusFlags;
 	TransactionId kaxmin;
+	int			numProcs;
 	bool		in_recovery = RecoveryInProgress();
-	TransactionId *other_xids = ProcGlobal->xids;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
@@ -1762,14 +1777,14 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	 */
 	h->slot_xmin = procArray->replication_slot_xmin;
 	h->slot_catalog_xmin = procArray->replication_slot_catalog_xmin;
-
-	for (int index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (int index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
-		int8		statusFlags = ProcGlobal->statusFlags[index];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC	*proc = &allProcs[pgprocno];
 		TransactionId xid;
 		TransactionId xmin;
+		int8		statusFlags;
 
 		/* Fetch xid just once - see GetNewTransactionId */
 		xid = UINT32_ACCESS_ONCE(other_xids[index]);
@@ -1802,6 +1817,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * removed, as long as pg_subtrans is not truncated) or doing logical
 		 * decoding (which manages xmin separately, check below).
 		 */
+		statusFlags = allStatusFlags[index];
 		if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
 			continue;
 
@@ -1828,10 +1844,10 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * horizon, as all xids are managed via the KnownAssignedXids
 		 * machinery.
 		 */
-		if (proc->databaseId == MyDatabaseId ||
+		if (in_recovery || 
 			MyDatabaseId == InvalidOid ||
-			(statusFlags & PROC_AFFECTS_ALL_HORIZONS) ||
-			in_recovery)
+			proc->databaseId == MyDatabaseId ||
+			(statusFlags & PROC_AFFECTS_ALL_HORIZONS))
 		{
 			/*
 			 * We can ignore this backend if it's running CREATE INDEX
@@ -2110,8 +2126,8 @@ GetSnapshotDataInitOldSnapshot(Snapshot snapshot)
 		 * If not using "snapshot too old" feature, fill related fields with
 		 * dummy values that don't require any locking.
 		 */
-		snapshot->lsn = InvalidXLogRecPtr;
 		snapshot->whenTaken = 0;
+		snapshot->lsn = InvalidXLogRecPtr;
 	}
 	else
 	{
@@ -2120,8 +2136,8 @@ GetSnapshotDataInitOldSnapshot(Snapshot snapshot)
 		 * snapshot becomes old enough to need to fall back on the special
 		 * "old snapshot" logic.
 		 */
-		snapshot->lsn = GetXLogInsertRecPtr();
 		snapshot->whenTaken = GetSnapshotCurrentTimestamp();
+		snapshot->lsn = GetXLogInsertRecPtr();
 		MaintainOldSnapshotTimeMapping(snapshot->whenTaken, snapshot->xmin);
 	}
 }
@@ -2138,15 +2154,12 @@ GetSnapshotDataInitOldSnapshot(Snapshot snapshot)
 static bool
 GetSnapshotDataReuse(Snapshot snapshot)
 {
-	uint64		curXactCompletionCount;
-
 	Assert(LWLockHeldByMe(ProcArrayLock));
 
 	if (unlikely(snapshot->snapXactCompletionCount == 0))
 		return false;
 
-	curXactCompletionCount = ShmemVariableCache->xactCompletionCount;
-	if (curXactCompletionCount != snapshot->snapXactCompletionCount)
+	if (ShmemVariableCache->xactCompletionCount != snapshot->snapXactCompletionCount)
 		return false;
 
 	/*
@@ -2175,10 +2188,10 @@ GetSnapshotDataReuse(Snapshot snapshot)
 	RecentXmin = snapshot->xmin;
 	Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));
 
+	snapshot->copied = false;
 	snapshot->curcid = GetCurrentCommandId(false);
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
-	snapshot->copied = false;
 
 	GetSnapshotDataInitOldSnapshot(snapshot);
 
@@ -2221,21 +2234,19 @@ GetSnapshotDataReuse(Snapshot snapshot)
 Snapshot
 GetSnapshotData(Snapshot snapshot)
 {
-	ProcArrayStruct *arrayP = procArray;
-	TransactionId *other_xids = ProcGlobal->xids;
+	const TransactionId *other_xids;
 	TransactionId xmin;
 	TransactionId xmax;
-	int			count = 0;
-	int			subcount = 0;
-	bool		suboverflowed = false;
-	FullTransactionId latest_completed;
 	TransactionId oldestxid;
-	int			mypgxactoff;
 	TransactionId myxid;
+	TransactionId replication_slot_xmin;
+	TransactionId replication_slot_catalog_xmin;
+	FullTransactionId latest_completed;
 	uint64		curXactCompletionCount;
-
-	TransactionId replication_slot_xmin = InvalidTransactionId;
-	TransactionId replication_slot_catalog_xmin = InvalidTransactionId;
+	int			count;
+	int			subcount;
+	int			mypgxactoff;
+	bool		suboverflowed;
 
 	Assert(snapshot != NULL);
 
@@ -2250,7 +2261,21 @@ GetSnapshotData(Snapshot snapshot)
 	 * xip arrays if any.  (This relies on the fact that all callers pass
 	 * static SnapshotData structs.)
 	 */
-	if (snapshot->xip == NULL)
+	if (snapshot->xip != NULL)
+	{
+		/*
+		 * It is sufficient to get shared lock on ProcArrayLock, even if we are
+		 * going to set MyProc->xmin.
+		 */
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
+
+		if (GetSnapshotDataReuse(snapshot))
+		{
+			LWLockRelease(ProcArrayLock);
+			return snapshot;
+		}
+	}
+	else
 	{
 		/*
 		 * First call for this snapshot. Snapshot is same size whether or not
@@ -2263,31 +2288,31 @@ GetSnapshotData(Snapshot snapshot)
 					(errcode(ERRCODE_OUT_OF_MEMORY),
 					 errmsg("out of memory")));
 		Assert(snapshot->subxip == NULL);
+
 		snapshot->subxip = (TransactionId *)
 			malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
 		if (snapshot->subxip == NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_OUT_OF_MEMORY),
 					 errmsg("out of memory")));
-	}
 
-	/*
-	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
-	 * going to set MyProc->xmin.
-	 */
-	LWLockAcquire(ProcArrayLock, LW_SHARED);
-
-	if (GetSnapshotDataReuse(snapshot))
-	{
-		LWLockRelease(ProcArrayLock);
-		return snapshot;
+		/*
+		 * It is sufficient to get shared lock on ProcArrayLock, even if we are
+		 * going to set MyProc->xmin.
+		 */
+		LWLockAcquire(ProcArrayLock, LW_SHARED);
 	}
 
-	latest_completed = ShmemVariableCache->latestCompletedXid;
+	count = 0;
+	subcount = 0;
+	suboverflowed = false;
+
 	mypgxactoff = MyProc->pgxactoff;
+	other_xids = ProcGlobal->xids;
 	myxid = other_xids[mypgxactoff];
 	Assert(myxid == MyProc->xid);
 
+	latest_completed = ShmemVariableCache->latestCompletedXid;
 	oldestxid = ShmemVariableCache->oldestXid;
 	curXactCompletionCount = ShmemVariableCache->xactCompletionCount;
 
@@ -2296,22 +2321,22 @@ GetSnapshotData(Snapshot snapshot)
 	TransactionIdAdvance(xmax);
 	Assert(TransactionIdIsNormal(xmax));
 
-	/* initialize xmin calculation with xmax */
-	xmin = xmax;
-
 	/* take own xid into account, saves a check inside the loop */
-	if (TransactionIdIsNormal(myxid) && NormalTransactionIdPrecedes(myxid, xmin))
+	if (TransactionIdIsNormal(myxid) && NormalTransactionIdPrecedes(myxid, xmax))
 		xmin = myxid;
+	else
+		/* initialize xmin calculation with xmax */
+		xmin = xmax;
 
 	snapshot->takenDuringRecovery = RecoveryInProgress();
 
 	if (!snapshot->takenDuringRecovery)
 	{
+		TransactionId * xip = snapshot->xip;
+		const uint8	   * const allStatusFlags = ProcGlobal->statusFlags;
+		const XidCacheStatus * const subxidStates = ProcGlobal->subxidStates;
+		const ProcArrayStruct * const arrayP = procArray;
 		int			numProcs = arrayP->numProcs;
-		TransactionId *xip = snapshot->xip;
-		int		   *pgprocnos = arrayP->pgprocnos;
-		XidCacheStatus *subxidStates = ProcGlobal->subxidStates;
-		uint8	   *allStatusFlags = ProcGlobal->statusFlags;
 
 		/*
 		 * First collect set of pgxactoff/xids that need to be included in the
@@ -2387,7 +2412,6 @@ GetSnapshotData(Snapshot snapshot)
 			 */
 			if (!suboverflowed)
 			{
-
 				if (subxidStates[pgxactoff].overflowed)
 					suboverflowed = true;
 				else
@@ -2396,8 +2420,8 @@ GetSnapshotData(Snapshot snapshot)
 
 					if (nsubxids > 0)
 					{
-						int			pgprocno = pgprocnos[pgxactoff];
-						PGPROC	   *proc = &allProcs[pgprocno];
+						int				pgprocno = arrayP->pgprocnos[pgxactoff];
+						const PGPROC	*proc = &allProcs[pgprocno];
 
 						pg_read_barrier();	/* pairs with GetNewTransactionId */
 
@@ -2549,22 +2573,20 @@ GetSnapshotData(Snapshot snapshot)
 	RecentXmin = xmin;
 	Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin));
 
+	/*
+	 * This is a new snapshot, so set both refcounts are zero, and mark it as
+	 * not copied in persistent memory.
+	 */
 	snapshot->xmin = xmin;
 	snapshot->xmax = xmax;
 	snapshot->xcnt = count;
 	snapshot->subxcnt = subcount;
 	snapshot->suboverflowed = suboverflowed;
-	snapshot->snapXactCompletionCount = curXactCompletionCount;
-
+	snapshot->copied = false;
 	snapshot->curcid = GetCurrentCommandId(false);
-
-	/*
-	 * This is a new snapshot, so set both refcounts are zero, and mark it as
-	 * not copied in persistent memory.
-	 */
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
-	snapshot->copied = false;
+	snapshot->snapXactCompletionCount = curXactCompletionCount;
 
 	GetSnapshotDataInitOldSnapshot(snapshot);
 
@@ -2585,29 +2607,38 @@ bool
 ProcArrayInstallImportedXmin(TransactionId xmin,
 							 VirtualTransactionId *sourcevxid)
 {
-	bool		result = false;
-	ProcArrayStruct *arrayP = procArray;
+	ProcArrayStruct *arrayP;
+	const uint8 *allStatusFlags;
+	int			numProcs;
 	int			index;
+	bool		result;
 
 	Assert(TransactionIdIsNormal(xmin));
 	if (!sourcevxid)
 		return false;
 
+	arrayP = procArray;
+	allStatusFlags = ProcGlobal->statusFlags;
+	result = false;
+
 	/* Get lock so source xact can't end while we're doing this */
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
-		int			statusFlags = ProcGlobal->statusFlags[index];
+		const PGPROC *proc;
 		TransactionId xid;
+		int			pgprocno;
+		int			statusFlags = allStatusFlags[index];
 
 		/* Ignore procs running LAZY VACUUM */
 		if (statusFlags & PROC_IN_VACUUM)
 			continue;
 
 		/* We are only interested in the specific virtual transaction. */
+		pgprocno = arrayP->pgprocnos[index];
+		proc = &allProcs[pgprocno];
 		if (proc->backendId != sourcevxid->backendId)
 			continue;
 		if (proc->lxid != sourcevxid->localTransactionId)
@@ -2664,7 +2695,6 @@ bool
 ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
 {
 	bool		result = false;
-	TransactionId xid;
 
 	Assert(TransactionIdIsNormal(xmin));
 	Assert(proc != NULL);
@@ -2680,21 +2710,25 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
 	 * can't go backwards.  Also, make sure it's running in the same database,
 	 * so that the per-database xmin cannot go backwards.
 	 */
-	xid = UINT32_ACCESS_ONCE(proc->xmin);
-	if (proc->databaseId == MyDatabaseId &&
-		TransactionIdIsNormal(xid) &&
-		TransactionIdPrecedesOrEquals(xid, xmin))
+	if (proc->databaseId == MyDatabaseId)
 	{
-		/*
-		 * Install xmin and propagate the statusFlags that affect how the
-		 * value is interpreted by vacuum.
-		 */
-		MyProc->xmin = TransactionXmin = xmin;
-		MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) |
-			(proc->statusFlags & PROC_XMIN_FLAGS);
-		ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+		TransactionId xid;
 
-		result = true;
+		xid = UINT32_ACCESS_ONCE(proc->xmin);
+		if (TransactionIdIsNormal(xid) &&
+			TransactionIdPrecedesOrEquals(xid, xmin))
+		{
+			/*
+			 * Install xmin and propagate the statusFlags that affect how the
+			 * value is interpreted by vacuum.
+			 */
+			MyProc->xmin = TransactionXmin = xmin;
+			MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) |
+				(proc->statusFlags & PROC_XMIN_FLAGS);
+			ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+
+			result = true;
+		}
 	}
 
 	LWLockRelease(ProcArrayLock);
@@ -2740,11 +2774,13 @@ GetRunningTransactionData(void)
 	static RunningTransactionsData CurrentRunningXactsData;
 
 	ProcArrayStruct *arrayP = procArray;
-	TransactionId *other_xids = ProcGlobal->xids;
+	const TransactionId *other_xids = ProcGlobal->xids;
+	const XidCacheStatus *other_subxidstates = ProcGlobal->subxidStates;
 	RunningTransactions CurrentRunningXacts = &CurrentRunningXactsData;
 	TransactionId latestCompletedXid;
 	TransactionId oldestRunningXid;
 	TransactionId *xids;
+	int			numProcs;
 	int			index;
 	int			count;
 	int			subcount;
@@ -2794,7 +2830,8 @@ GetRunningTransactionData(void)
 	/*
 	 * Spin over procArray collecting all xids
 	 */
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
 		TransactionId xid;
 
@@ -2816,7 +2853,7 @@ GetRunningTransactionData(void)
 		if (TransactionIdPrecedes(xid, oldestRunningXid))
 			oldestRunningXid = xid;
 
-		if (ProcGlobal->subxidStates[index].overflowed)
+		if (!suboverflowed && other_subxidstates[index].overflowed)
 			suboverflowed = true;
 
 		/*
@@ -2836,12 +2873,8 @@ GetRunningTransactionData(void)
 	 */
 	if (!suboverflowed)
 	{
-		XidCacheStatus *other_subxidstates = ProcGlobal->subxidStates;
-
-		for (index = 0; index < arrayP->numProcs; index++)
+		for (index = 0; index < numProcs; index++)
 		{
-			int			pgprocno = arrayP->pgprocnos[index];
-			PGPROC	   *proc = &allProcs[pgprocno];
 			int			nsubxids;
 
 			/*
@@ -2851,6 +2884,9 @@ GetRunningTransactionData(void)
 			nsubxids = other_subxidstates[index].count;
 			if (nsubxids > 0)
 			{
+				int			pgprocno = arrayP->pgprocnos[index];
+				PGPROC	   *proc = &allProcs[pgprocno];
+
 				/* barrier not really required, as XidGenLock is held, but ... */
 				pg_read_barrier();	/* pairs with GetNewTransactionId */
 
@@ -2912,8 +2948,9 @@ TransactionId
 GetOldestActiveTransactionId(void)
 {
 	ProcArrayStruct *arrayP = procArray;
-	TransactionId *other_xids = ProcGlobal->xids;
+	const TransactionId *other_xids = ProcGlobal->xids;
 	TransactionId oldestRunningXid;
+	int			numProcs;
 	int			index;
 
 	Assert(!RecoveryInProgress());
@@ -2933,7 +2970,8 @@ GetOldestActiveTransactionId(void)
 	 * Spin over procArray collecting all xids and subxids.
 	 */
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
 		TransactionId xid;
 
@@ -2978,7 +3016,6 @@ GetOldestSafeDecodingTransactionId(bool catalogOnly)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId oldestSafeXid;
-	int			index;
 	bool		recovery_in_progress = RecoveryInProgress();
 
 	Assert(LWLockHeldByMe(ProcArrayLock));
@@ -3001,16 +3038,16 @@ GetOldestSafeDecodingTransactionId(bool catalogOnly)
 	 * slot's general xmin horizon, but the catalog horizon is only usable
 	 * when only catalog data is going to be looked at.
 	 */
-	if (TransactionIdIsValid(procArray->replication_slot_xmin) &&
-		TransactionIdPrecedes(procArray->replication_slot_xmin,
+	if (TransactionIdIsValid(arrayP->replication_slot_xmin) &&
+		TransactionIdPrecedes(arrayP->replication_slot_xmin,
 							  oldestSafeXid))
-		oldestSafeXid = procArray->replication_slot_xmin;
+		oldestSafeXid = arrayP->replication_slot_xmin;
 
 	if (catalogOnly &&
-		TransactionIdIsValid(procArray->replication_slot_catalog_xmin) &&
-		TransactionIdPrecedes(procArray->replication_slot_catalog_xmin,
+		TransactionIdIsValid(arrayP->replication_slot_catalog_xmin) &&
+		TransactionIdPrecedes(arrayP->replication_slot_catalog_xmin,
 							  oldestSafeXid))
-		oldestSafeXid = procArray->replication_slot_catalog_xmin;
+		oldestSafeXid = arrayP->replication_slot_catalog_xmin;
 
 	/*
 	 * If we're not in recovery, we walk over the procarray and collect the
@@ -3026,12 +3063,15 @@ GetOldestSafeDecodingTransactionId(bool catalogOnly)
 	 */
 	if (!recovery_in_progress)
 	{
-		TransactionId *other_xids = ProcGlobal->xids;
+		const TransactionId *other_xids = ProcGlobal->xids;
+		int			numProcs;
+		int			index;
 
 		/*
 		 * Spin over procArray collecting min(ProcGlobal->xids[i])
 		 */
-		for (index = 0; index < arrayP->numProcs; index++)
+		numProcs = arrayP->numProcs;
+		for (index = 0; index < numProcs; index++)
 		{
 			TransactionId xid;
 
@@ -3076,6 +3116,7 @@ GetVirtualXIDsDelayingChkpt(int *nvxids, int type)
 {
 	VirtualTransactionId *vxids;
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			count = 0;
 	int			index;
 
@@ -3087,10 +3128,11 @@ GetVirtualXIDsDelayingChkpt(int *nvxids, int type)
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC   *proc = &allProcs[pgprocno];
 
 		if ((proc->delayChkptFlags & type) != 0)
 		{
@@ -3120,18 +3162,19 @@ GetVirtualXIDsDelayingChkpt(int *nvxids, int type)
 bool
 HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type)
 {
-	bool		result = false;
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			index;
 
 	Assert(type != 0);
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC   *proc = &allProcs[pgprocno];
 		VirtualTransactionId vxid;
 
 		GET_VXID_FROM_PGPROC(vxid, *proc);
@@ -3145,18 +3188,16 @@ HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type)
 			{
 				if (VirtualTransactionIdEquals(vxid, vxids[i]))
 				{
-					result = true;
-					break;
+					LWLockRelease(ProcArrayLock);
+					return true;
 				}
 			}
-			if (result)
-				break;
 		}
 	}
 
 	LWLockRelease(ProcArrayLock);
 
-	return result;
+	return false;
 }
 
 /*
@@ -3192,25 +3233,25 @@ BackendPidGetProc(int pid)
 PGPROC *
 BackendPidGetProcWithLock(int pid)
 {
-	PGPROC	   *result = NULL;
-	ProcArrayStruct *arrayP = procArray;
+	ProcArrayStruct *arrayP;
+	int			numProcs;
 	int			index;
 
 	if (pid == 0)				/* never match dummy PGPROCs */
 		return NULL;
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	arrayP = procArray;
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		PGPROC	   *proc = &allProcs[arrayP->pgprocnos[index]];
+		int		pgprocno = arrayP->pgprocnos[index];
+		PGPROC *proc = &allProcs[pgprocno];
 
 		if (proc->pid == pid)
-		{
-			result = proc;
-			break;
-		}
+			return proc;
 	}
 
-	return result;
+	return NULL;
 }
 
 /*
@@ -3229,23 +3270,29 @@ BackendPidGetProcWithLock(int pid)
 int
 BackendXidGetPid(TransactionId xid)
 {
-	int			result = 0;
-	ProcArrayStruct *arrayP = procArray;
-	TransactionId *other_xids = ProcGlobal->xids;
+	ProcArrayStruct *arrayP;
+	const TransactionId *other_xids;
+	int			numProcs;
 	int			index;
+	int			result;
 
 	if (xid == InvalidTransactionId)	/* never match invalid xid */
 		return 0;
 
+	arrayP = procArray;
+ 	other_xids = ProcGlobal->xids;
+ 	result = 0;
+
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
-
 		if (other_xids[index] == xid)
 		{
+			int				pgprocno = arrayP->pgprocnos[index];
+			const PGPROC	*proc = &allProcs[pgprocno];
+
 			result = proc->pid;
 			break;
 		}
@@ -3301,6 +3348,8 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 {
 	VirtualTransactionId *vxids;
 	ProcArrayStruct *arrayP = procArray;
+	const uint8	* const allStatusFlags = ProcGlobal->statusFlags;
+	int			numProcs;
 	int			count = 0;
 	int			index;
 
@@ -3310,15 +3359,17 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
-		uint8		statusFlags = ProcGlobal->statusFlags[index];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC	*proc = &allProcs[pgprocno];
+		uint8			statusFlags;
 
 		if (proc == MyProc)
 			continue;
 
+		statusFlags = allStatusFlags[index];
 		if (excludeVacuum & statusFlags)
 			continue;
 
@@ -3387,6 +3438,7 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 {
 	static VirtualTransactionId *vxids;
 	ProcArrayStruct *arrayP = procArray;
+	int 		numProcs;
 	int			count = 0;
 	int			index;
 
@@ -3407,10 +3459,11 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid dbOid)
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC	*proc = &allProcs[pgprocno];
 
 		/* Exclude prepared transactions */
 		if (proc->pid == 0)
@@ -3467,15 +3520,17 @@ SignalVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode,
 						 bool conflictPending)
 {
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			index;
 	pid_t		pid = 0;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int		pgprocno = arrayP->pgprocnos[index];
+		PGPROC	*proc = &allProcs[pgprocno];
 		VirtualTransactionId procvxid;
 
 		GET_VXID_FROM_PGPROC(procvxid, *proc);
@@ -3514,8 +3569,9 @@ SignalVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode,
 bool
 MinimumActiveBackends(int min)
 {
-	ProcArrayStruct *arrayP = procArray;
-	int			count = 0;
+	ProcArrayStruct *arrayP;
+	int			numProcs;
+	int			count;
 	int			index;
 
 	/* Quick short-circuit if no minimum is specified */
@@ -3527,10 +3583,13 @@ MinimumActiveBackends(int min)
 	 * bogus, but since we are only testing fields for zero or nonzero, it
 	 * should be OK.  The result is only used for heuristic purposes anyway...
 	 */
-	for (index = 0; index < arrayP->numProcs; index++)
+	count = 0; 
+	arrayP = procArray;
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC   *proc = &allProcs[pgprocno];
 
 		/*
 		 * Since we're not holding a lock, need to be prepared to deal with
@@ -3555,7 +3614,7 @@ MinimumActiveBackends(int min)
 			continue;			/* do not count if blocked on a lock */
 		count++;
 		if (count >= min)
-			break;
+			return true;
 	}
 
 	return count >= min;
@@ -3568,15 +3627,17 @@ int
 CountDBBackends(Oid databaseid)
 {
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			count = 0;
 	int			index;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC	*proc = &allProcs[pgprocno];
 
 		if (proc->pid == 0)
 			continue;			/* do not count prepared xacts */
@@ -3598,15 +3659,17 @@ int
 CountDBConnections(Oid databaseid)
 {
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			count = 0;
 	int			index;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC	*proc = &allProcs[pgprocno];
 
 		if (proc->pid == 0)
 			continue;			/* do not count prepared xacts */
@@ -3629,12 +3692,14 @@ void
 CancelDBBackends(Oid databaseid, ProcSignalReason sigmode, bool conflictPending)
 {
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			index;
 
 	/* tell all backends to die */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		PGPROC	   *proc = &allProcs[pgprocno];
@@ -3669,15 +3734,17 @@ int
 CountUserBackends(Oid roleid)
 {
 	ProcArrayStruct *arrayP = procArray;
+	int			numProcs;
 	int			count = 0;
 	int			index;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (index = 0; index < arrayP->numProcs; index++)
+	numProcs = arrayP->numProcs;
+	for (index = 0; index < numProcs; index++)
 	{
-		int			pgprocno = arrayP->pgprocnos[index];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[index];
+		const PGPROC   *proc = &allProcs[pgprocno];
 
 		if (proc->pid == 0)
 			continue;			/* do not count prepared xacts */
@@ -3719,6 +3786,7 @@ bool
 CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 {
 	ProcArrayStruct *arrayP = procArray;
+	const uint8	*allStatusFlags = ProcGlobal->statusFlags;
 
 #define MAXAUTOVACPIDS	10		/* max autovacs to SIGTERM per iteration */
 	int			autovac_pids[MAXAUTOVACPIDS];
@@ -3728,8 +3796,9 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 	for (tries = 0; tries < 50; tries++)
 	{
 		int			nautovacs = 0;
-		bool		found = false;
+		int			numProcs;
 		int			index;
+		bool		found = false;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -3737,11 +3806,11 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 
 		LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-		for (index = 0; index < arrayP->numProcs; index++)
+		numProcs = arrayP->numProcs;
+		for (index = 0; index < numProcs; index++)
 		{
-			int			pgprocno = arrayP->pgprocnos[index];
-			PGPROC	   *proc = &allProcs[pgprocno];
-			uint8		statusFlags = ProcGlobal->statusFlags[index];
+			int				pgprocno = arrayP->pgprocnos[index];
+			const PGPROC	*proc = &allProcs[pgprocno];
 
 			if (proc->databaseId != databaseId)
 				continue;
@@ -3754,6 +3823,8 @@ CountOtherDBBackends(Oid databaseId, int *nbackends, int *nprepared)
 				(*nprepared)++;
 			else
 			{
+				uint8		statusFlags = allStatusFlags[index];
+
 				(*nbackends)++;
 				if ((statusFlags & PROC_IS_AUTOVACUUM) &&
 					nautovacs < MAXAUTOVACPIDS)
@@ -3798,15 +3869,17 @@ TerminateOtherDBBackends(Oid databaseId)
 {
 	ProcArrayStruct *arrayP = procArray;
 	List	   *pids = NIL;
+	int			numProcs;
 	int			nprepared = 0;
 	int			i;
 
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
-	for (i = 0; i < procArray->numProcs; i++)
+	numProcs = arrayP->numProcs;
+	for (i = 0; i < numProcs; i++)
 	{
-		int			pgprocno = arrayP->pgprocnos[i];
-		PGPROC	   *proc = &allProcs[pgprocno];
+		int				pgprocno = arrayP->pgprocnos[i];
+		const PGPROC	*proc = &allProcs[pgprocno];
 
 		if (proc->databaseId != databaseId)
 			continue;
@@ -3847,8 +3920,8 @@ TerminateOtherDBBackends(Oid databaseId)
 		 */
 		foreach(lc, pids)
 		{
-			int			pid = lfirst_int(lc);
-			PGPROC	   *proc = BackendPidGetProc(pid);
+			int				pid = lfirst_int(lc);
+			const PGPROC	*proc = BackendPidGetProc(pid);
 
 			if (proc != NULL)
 			{
@@ -3875,8 +3948,8 @@ TerminateOtherDBBackends(Oid databaseId)
 		 */
 		foreach(lc, pids)
 		{
-			int			pid = lfirst_int(lc);
-			PGPROC	   *proc = BackendPidGetProc(pid);
+			int				pid = lfirst_int(lc);
+			const PGPROC	*proc = BackendPidGetProc(pid);
 
 			if (proc != NULL)
 			{
@@ -3951,9 +4024,9 @@ XidCacheRemoveRunningXids(TransactionId xid,
 						  int nxids, const TransactionId *xids,
 						  TransactionId latestXid)
 {
+	XidCacheStatus *mysubxidstat;
 	int			i,
 				j;
-	XidCacheStatus *mysubxidstat;
 
 	Assert(TransactionIdIsValid(xid));
 
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5bc2a15160..37259a4c31 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1800,8 +1800,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 	TransactionId xlimit = recentXmin;
 	TransactionId latest_xmin;
 	TimestampTz next_map_update_ts;
-	TransactionId threshold_timestamp;
-	TransactionId threshold_xid;
 
 	Assert(TransactionIdIsNormal(recentXmin));
 	Assert(OldSnapshotThresholdActive());
@@ -1839,6 +1837,9 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 	}
 	else
 	{
+		TransactionId threshold_timestamp;
+		TransactionId threshold_xid;
+
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
 
@@ -1901,8 +1902,6 @@ void
 MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 {
 	TimestampTz ts;
-	TransactionId latest_xmin;
-	TimestampTz update_ts;
 	bool		map_update_required = false;
 
 	/* Never call this function when old snapshot checking is disabled. */
@@ -1915,14 +1914,12 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	 * a new value when we have crossed a bucket boundary.
 	 */
 	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
-	latest_xmin = oldSnapshotControl->latest_xmin;
-	update_ts = oldSnapshotControl->next_map_update;
-	if (ts > update_ts)
+	if (ts > oldSnapshotControl->next_map_update)
 	{
 		oldSnapshotControl->next_map_update = ts;
 		map_update_required = true;
 	}
-	if (TransactionIdFollows(xmin, latest_xmin))
+	if (TransactionIdFollows(xmin, oldSnapshotControl->latest_xmin))
 		oldSnapshotControl->latest_xmin = xmin;
 	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
@@ -2284,8 +2281,6 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
 bool
 XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 {
-	uint32		i;
-
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
 	 * xip arrays.  Note that this is OK even if we convert a subxact XID to
@@ -2307,6 +2302,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 	 */
 	if (!snapshot->takenDuringRecovery)
 	{
+		uint32		i;
+
 		/*
 		 * If the snapshot contains full subxact data, the fastest way to
 		 * check things is just to compare the given XID against both subxact

procarray_bench.tar.xzapplication/x-xz; name=procarray_bench.tar.xzDownload

#27

tomas.vondra@enterprisedb.com

over 3 years ago

In reply to: Ranier Vilela (#26)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/31/22 16:36, Ranier Vilela wrote:

Em dom., 29 de mai. de 2022 às 17:10, Ranier Vilela <ranier.vf@gmail.com
<mailto:ranier.vf@gmail.com>> escreveu:

Em dom., 29 de mai. de 2022 às 15:21, Andres Freund
<andres@anarazel.de <mailto:andres@anarazel.de>> escreveu:

On 2022-05-29 18:00:14 +0200, Tomas Vondra wrote:

Also, there's the question of correctness, and I'd bet Andres

is right

getting snapshot without holding ProcArrayLock is busted.

Unless there's some actual analysis of this by Rainier, I'm just
going to
ignore this thread going forward. It's pointless to invest time when
everything we say is just ignored.

Sorry, just not my intention to ignore this important point.
Of course, any performance gain is good, but robustness comes first.

As soon as I have some time.

I redid the benchmarks, with getting a snapshot with holding ProcArrayLock.

Average Results

Connections:
tps head tps patch diff
1 39196,3088985 39858,0207936 661,711895100008 101,69%
2 65050,8643819 65245,9852367 195,1208548 100,30%
5 91486,0298359 91862,9026528 376,872816899995 100,41%
10 131318,0774955 131547,1404573 229,062961799995 100,17%
50 116531,2334144 116687,0325522 155,799137800001 100,13%
100 98969,4650449 98808,6778717 -160,787173199991 99,84%
200 89514,5238649 89463,6196075 -50,904257400005 99,94%
300 88426,3612183 88457,2695151 30,9082968000002 100,03%
400 88078,1686912 88338,2859163 260,117225099995 100,30%
500 87791,1620039 88074,3418504 283,179846500003 100,32%
600 87552,3343394 87930,8645184 378,530178999994 100,43%
1000 86538,3772895 86771,1946099 232,817320400005 100,27%
avg 89204,4088731917 89420,444631825 1981,0816042 100,24%

For clients with 1 connections, the results are good.

Isn't that a bit strange, considering the aim of this patch was
scalability? Which should improve higher client counts in the first place.

But for clients with 100 and 200 connections, the results are not good.
I can't say why these two tests were so bad.
Because, 100 and 200 results, I'm not sure if this should go ahead, if
it's worth the effort.

I'd argue this is either just noise, and there's no actual difference.
This could be verified by some sort of statistical testing (e.g. the
well known t-test).

Another option is that this is simply due to differences in binary
layout - this can result in small differences (easily 1-2%) that are
completely unrelated to what the patch does. This is exactly what the
"stabilizer" talk I mentioned a couple days ago was about.

FWIW, when a patch improves scalability, the improvement usually
increases with the number of clients. So you'd see no/small improvement
for 10 clients, 100 clients would be improved more, 200 more, etc. We
see nothing like that here. So either the patch does not really improve
anything, or perhaps the benchmark doesn't even hit the bottleneck the
patch is meant to improve (which was already suggested in this thread
repeatedly).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28

Jacob Champion

jchampion@timescale.com

over 3 years ago

In reply to: Tomas Vondra (#27)

Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

On 5/31/22 11:44, Tomas Vondra wrote:

I'd argue this is either just noise, and there's no actual difference.
This could be verified by some sort of statistical testing (e.g. the
well known t-test).

Given the conversation so far, I'll go ahead and mark this Returned with
Feedback. Specifically, this patch would need hard statistical proof
that it's having a positive effect.

--Jacob

#29