FWD: fastlock+lazyvzid patch performance

Started by Noname, over 14 years ago, 4 messages
#1Noname
karavelov@mail.bg

Hello,

I have seen the discussions about performance degradation with the fastlock and lazy-vxid
patches, so I decided to test them myself.

The setup:
- hardware
Supermicro blade
6x SAS @ 15k rpm on an LSI RAID controller:
1 disk for system + pg_xlog
4 disks in RAID 10 for data
1 disk as spare
2x Xeon E5405 @ 2GHz (no HT), 8 cores total
8GB RAM

- software
Debian Sid, linux-2.6.39.1
PostgreSQL 9.1 beta2, compiled from the Debian sources, with the fastlock v3
and lazy-vxid v1 patches applied incrementally. I had to resolve a conflict
in src/backend/storage/lmgr/proc.c manually.
Configuration: increased shared_buffers to 2GB and max_connections to 500

- pgbench
initialized the dataset with scaling factor 100
example command invocation: ./pgbench -h 127.0.0.1 -n -S -T 30 -c 8 -j 8 -M prepared pgtest
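
The full procedure was roughly the following (a sketch only; the config file path, the
createdb step and the thread count for the larger runs are illustrative, since only the
8-client invocation is shown above):

# Raise the two settings and restart the cluster (Debian layout assumed)
echo "shared_buffers = 2GB"  | sudo tee -a /etc/postgresql/9.1/main/postgresql.conf
echo "max_connections = 500" | sudo tee -a /etc/postgresql/9.1/main/postgresql.conf
sudo pg_ctlcluster 9.1 main restart

# Create and populate the pgbench dataset with scaling factor 100
createdb pgtest
./pgbench -h 127.0.0.1 -i -s 100 pgtest

# Read-only (-S) runs over local TCP; dropping -h gives the "local socket" column.
# -j 8 matches the posted 8-client command; the thread count for larger runs is a guess.
for c in 8 16 32 64 128 256; do
    ./pgbench -h 127.0.0.1 -n -S -T 30 -c "$c" -j 8 -M prepared pgtest
done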

Results:

clients    beta2   +fastlock   +lazyvxid   local socket
      8    76064       92430       92198         106734
     16    64254       90788       90698         105097
     32    56629       88189       88269         101202
     64    51124       84354       84639          96362
    128    45455       79361       79724          90625
    256    40370       71904       72737          82434

All runs were executed with a warm cache; I also made some 300s runs with the same results (tps).
I have also done some runs with -M simple, which showed an identical distribution across client counts.

I am posting these results because they somewhat contradict previous results posted on the list.
In my case the patches not only improve peak performance but also improve performance under
load: without the patches, throughput at 256 clients is 53% of the peak obtained with 8 clients,
while with the patches throughput at 256 clients is 79% of the 8-client peak.
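
To spell out the arithmetic, the two percentages come straight from the table above
(beta2 and +lazyvxid columns, 256 clients vs 8 clients):

awk 'BEGIN {
    printf "unpatched: %.0f%%\n", 100 * 40370 / 76064    # beta2:     256 vs 8 clients, ~53%
    printf "patched:   %.0f%%\n", 100 * 72737 / 92198    # +lazyvxid: 256 vs 8 clients, ~79%
}'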

Best regards
Luben Karavelov

P.S. Excuse me for starting new thread - I am new on the list.

#2Robert Haas
robertmhaas@gmail.com
In reply to: Noname (#1)
Re: FWD: fastlock+lazyvzid patch performance

On Fri, Jun 24, 2011 at 3:31 PM, <karavelov@mail.bg> wrote:

I am posting these results because they somewhat contradict previous results posted on the
list. In my case the patches not only improve peak performance but also improve performance
under load: without the patches, throughput at 256 clients is 53% of the peak obtained with
8 clients, while with the patches throughput at 256 clients is 79% of the 8-client peak.

I think this is strongly related to core count. The spinlock
contention problems don't become really bad until you get up above 32
CPUs... at least from what I can tell so far.

So I'm not surprised it was just a straight win on your machine... but
thanks for verifying. It's helpful to have more data points.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Robert Haas
robertmhaas@gmail.com
In reply to: Noname (#1)
Re: FWD: fastlock+lazyvzid patch performance

On Fri, Jun 24, 2011 at 3:31 PM, <karavelov@mail.bg> wrote:

clients    beta2   +fastlock   +lazyvxid   local socket
      8    76064       92430       92198         106734
     16    64254       90788       90698         105097
     32    56629       88189       88269         101202
     64    51124       84354       84639          96362
    128    45455       79361       79724          90625
    256    40370       71904       72737          82434

I'm having trouble interpreting this table.

Column 1: # of clients
Column 2: TPS using 9.1beta2 unpatched
Column 3: TPS using 9.1beta2 + fastlock patch
Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
Column 5: ???

At any rate, that is a big improvement on a system with only 8 cores.
I would have thought you would have needed ~16 cores to get that much
speedup. I wonder if the -M prepared makes a difference ... I wasn't
using that option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Noname
karavelov@mail.bg
In reply to: Robert Haas (#3)
Re: FWD: fastlock+lazyvzid patch performance

----- Quote from Robert Haas (robertmhaas@gmail.com), on 25.06.2011 at 00:16 -----

On Fri, Jun 24, 2011 at 3:31 PM, <karavelov@mail.bg> wrote:

clients    beta2   +fastlock   +lazyvxid   local socket
      8    76064       92430       92198         106734
     16    64254       90788       90698         105097
     32    56629       88189       88269         101202
     64    51124       84354       84639          96362
    128    45455       79361       79724          90625
    256    40370       71904       72737          82434

I'm having trouble interpreting this table.

Column 1: # of clients
Column 2: TPS using 9.1beta2 unpatched
Column 3: TPS using 9.1beta2 + fastlock patch
Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
Column 5: ???

9.1beta2 + fastlock patch + vxid patch, with pgbench run over a Unix-domain
socket; the other tests use a local TCP connection.

At any rate, that is a big improvement on a system with only 8 cores.
I would have thought you would have needed ~16 cores to get that much
speedup. I wonder if the -M prepared makes a difference ... I wasn't
using that option.

Yes, it does make some difference.
Using unpatched beta2 with 8 clients and the simple protocol I get 57059 tps.
With all patches and the simple protocol I get 60707 tps, so the difference
between patched and stock is not as big. I suppose the system becomes CPU-bound
on parsing and planning every submitted request. With -M extended I get even
slower results.
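
For anyone repeating the comparison, the same 8-client run can be done under all three
query protocols like this (illustrative only, reusing the pgtest database from above; the
thread reports roughly 57-61k tps for -M simple and 76-92k tps for -M prepared at 8
clients, depending on the patches):

# Compare pgbench query protocols on the same read-only workload (sketch only)
for mode in simple extended prepared; do
    echo "== protocol: $mode =="
    ./pgbench -h 127.0.0.1 -n -S -T 30 -c 8 -j 8 -M "$mode" pgtest
done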

Luben

--
"Perhaps, there is no greater love than that of a
revolutionary couple where each of the two lovers is
ready to abandon the other at any moment if revolution
demands it."
Zizek