testing HS/SR - 1 vs 2 performance
Using 9.0devel cvs HEAD, 2010.04.08.
I am trying to understand the performance difference
between primary and standby under a standard pgbench
read-only test.
server has 32 GB RAM, 2 quad-core CPUs.
primary:
tps = 34606.747930 (including connections establishing)
tps = 34527.078068 (including connections establishing)
tps = 34654.297319 (including connections establishing)
standby:
tps = 700.346283 (including connections establishing)
tps = 717.576886 (including connections establishing)
tps = 740.522472 (including connections establishing)
transaction type: SELECT only
scaling factor: 1000
query mode: simple
number of clients: 20
number of threads: 1
duration: 900 s
both instances have
max_connections = 100
shared_buffers = 256MB
checkpoint_segments = 50
effective_cache_size = 16GB
See also:
http://archives.postgresql.org/pgsql-testers/2010-04/msg00005.php
(differences with scale 10_000)
I understand that in the scale=1000 case, there is a huge
cache effect, but why doesn't that apply to the pgbench runs
against the standby? (and for the scale=10_000 case the
differences are still rather large)
Maybe these differences are as expected. I don't find
any explanation in the documentation.
thanks,
Erik Rijkers
On Sat, Apr 10, 2010 at 8:23 AM, Erik Rijkers <er@xs4all.nl> wrote:
I understand that in the scale=1000 case, there is a huge
cache effect, but why doesn't that apply to the pgbench runs
against the standby? (and for the scale=10_000 case the
differences are still rather large)
I guess this performance degradation happened because the large number of
buffer replacements caused UpdateMinRecoveryPoint() to be called often. So I
think increasing shared_buffers would improve the performance significantly.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Apr 12, 2010 at 5:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Apr 10, 2010 at 8:23 AM, Erik Rijkers <er@xs4all.nl> wrote:
I understand that in the scale=1000 case, there is a huge
cache effect, but why doesn't that apply to the pgbench runs
against the standby? (and for the scale=10_000 case the
differences are still rather large)

I guess this performance degradation happened because the large number of
buffer replacements caused UpdateMinRecoveryPoint() to be called often. So I
think increasing shared_buffers would improve the performance significantly.
I think we need to investigate this more. It's not going to look good
for the project if people find that a hot standby server runs two
orders of magnitude slower than the primary.
...Robert
On Sat, April 10, 2010 01:23, Erik Rijkers wrote:
Using 9.0devel cvs HEAD, 2010.04.08.
I am trying to understand the performance difference
between primary and standby under a standard pgbench
read-only test.
server has 32 GB RAM, 2 quad-core CPUs.
primary:
tps = 34606.747930 (including connections establishing)
tps = 34527.078068 (including connections establishing)
tps = 34654.297319 (including connections establishing)
standby:
tps = 700.346283 (including connections establishing)
tps = 717.576886 (including connections establishing)
tps = 740.522472 (including connections establishing)
transaction type: SELECT only
scaling factor: 1000
query mode: simple
number of clients: 20
number of threads: 1
duration: 900 s
both instances have
max_connections = 100
shared_buffers = 256MB
checkpoint_segments = 50
effective_cache_size = 16GB
See also:
http://archives.postgresql.org/pgsql-testers/2010-04/msg00005.php
(differences with scale 10_000)
To my surprise, I have since seen the opposite behaviour, with the standby giving fast runs and
the primary slow.
FWIW, I've run a larger set of tests overnight (against the same 9.0devel
instances as in the earlier email).
These results are generally more balanced.
for scale in 10 100 500 1000
    for clients in 1 5 10 20
        for port in 6565 6566    # 6565 = primary, 6566 = standby
            for run in `seq 1 3`
                pgbench ...
                sleep $(( (scale / 10) * 60 ))
            done
        done
    done
done
(so below, each combination shows 3 primary runs followed by 3 standby runs)
scale: 10 clients: 1 tps = 15219.019272 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 1 tps = 15301.847615 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 1 tps = 15238.907436 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 1 tps = 12129.928289 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 1 tps = 12151.711589 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 1 tps = 12203.494512 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 10 clients: 5 tps = 60248.120599 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 5 tps = 60827.949875 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 5 tps = 61167.447476 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 5 tps = 50750.385403 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 5 tps = 50600.891436 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 5 tps = 50486.857610 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 10 clients: 10 tps = 60307.739327 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 10 tps = 60264.230349 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 10 tps = 60146.370598 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 10 tps = 50455.537671 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 10 tps = 49877.000813 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 10 tps = 50097.949766 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 10 clients: 20 tps = 43355.220657 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 10 clients: 20 tps = 43352.725422 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 10 clients: 20 tps = 43496.085623 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 10 clients: 20 tps = 37169.126299 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 10 clients: 20 tps = 37100.260450 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 10 clients: 20 tps = 37342.758507 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 1 tps = 12514.185089 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 1 tps = 12542.842198 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 1 tps = 12595.688640 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 1 tps = 10435.681851 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 1 tps = 10456.983353 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 1 tps = 10434.213044 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 100 clients: 5 tps = 48682.166988 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 5 tps = 48656.883485 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 5 tps = 48687.894655 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 5 tps = 41901.629933 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 5 tps = 41953.386791 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 5 tps = 41787.962712 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 100 clients: 10 tps = 48704.247239 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 10 tps = 48941.190050 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 10 tps = 48603.077936 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 10 tps = 42948.666272 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 10 tps = 42767.793899 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 10 tps = 42612.670983 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 100 clients: 20 tps = 36350.454258 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 20 tps = 36373.088111 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 20 tps = 36490.886781 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 20 tps = 32235.811228 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 20 tps = 32253.837906 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 100 clients: 20 tps = 32144.189047 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 1 tps = 11733.254970 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 1 tps = 11726.665739 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 1 tps = 11617.622548 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 1 tps = 9769.861175 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 1 tps = 9878.465752 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 1 tps = 9808.236216 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients: 5 tps = 45185.900553 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 5 tps = 45170.334037 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 5 tps = 45136.596374 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 5 tps = 39231.863815 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 5 tps = 39336.889619 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 5 tps = 39269.483772 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 500 clients: 10 tps = 45468.080680 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 10 tps = 45727.159963 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 10 tps = 45399.241367 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 10 tps = 40759.108042 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 10 tps = 40783.287718 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 10 tps = 40858.007847 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 500 clients: 20 tps = 34729.742313 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20 tps = 34705.119029 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20 tps = 34617.517224 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20 tps = 31252.355034 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20 tps = 31234.885791 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20 tps = 31273.307637 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 1 tps = 220.024691 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 1 tps = 294.855794 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 1 tps = 375.152757 pgbench -h /tmp -p 6565 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 1 tps = 295.965959 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 1 tps = 1036.517110 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 1 tps = 9167.012603 pgbench -h /tmp -p 6566 -n -S -c 1 -T 900 -j 1
scale: 1000 clients: 5 tps = 1241.224282 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 5 tps = 1894.806301 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 5 tps = 18532.885549 pgbench -h /tmp -p 6565 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 5 tps = 1497.491279 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 5 tps = 1480.164166 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 5 tps = 3470.769236 pgbench -h /tmp -p 6566 -n -S -c 5 -T 900 -j 1
scale: 1000 clients: 10 tps = 2414.552333 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 10 tps = 19248.609443 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 10 tps = 45059.231609 pgbench -h /tmp -p 6565 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 10 tps = 1648.526373 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 10 tps = 3659.800008 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 10 tps = 35900.769857 pgbench -h /tmp -p 6566 -n -S -c 10 -T 900 -j 1
scale: 1000 clients: 20 tps = 2462.855864 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 20 tps = 27168.407568 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 20 tps = 34438.802096 pgbench -h /tmp -p 6565 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 20 tps = 2933.220489 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 20 tps = 25586.972428 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
scale: 1000 clients: 20 tps = 30926.189621 pgbench -h /tmp -p 6566 -n -S -c 20 -T 900 -j 1
On Mon, Apr 12, 2010 at 7:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 12, 2010 at 5:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Apr 10, 2010 at 8:23 AM, Erik Rijkers <er@xs4all.nl> wrote:
I understand that in the scale=1000 case, there is a huge
cache effect, but why doesn't that apply to the pgbench runs
against the standby? (and for the scale=10_000 case the
differences are still rather large)

I guess this performance degradation happened because the large number of
buffer replacements caused UpdateMinRecoveryPoint() to be called often. So I
think increasing shared_buffers would improve the performance significantly.

I think we need to investigate this more. It's not going to look good
for the project if people find that a hot standby server runs two
orders of magnitude slower than the primary.
As a data point, I did a read-only pgbench test and found that the
standby runs about 15% slower than the primary with identical hardware
and configs.
--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)
On Mon, April 12, 2010 14:22, Erik Rijkers wrote:
On Sat, April 10, 2010 01:23, Erik Rijkers wrote:
Oops, typos in that pseudo loop:
of course there was a pgbench init step after that first line.
for scale in 10 100 500 1000
    pgbench ...    # initialise
    sleep $(( (scale / 10) * 60 ))
    for clients in 1 5 10 20
        for port in 6565 6566    # 6565 = primary, 6566 = standby
            for run in `seq 1 3`
                pgbench ...
                sleep 120
            done
        done
    done
done
On Mon, Apr 12, 2010 at 8:32 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:
I think we need to investigate this more. It's not going to look good
for the project if people find that a hot standby server runs two
orders of magnitude slower than the primary.

As a data point, I did a read-only pgbench test and found that the
standby runs about 15% slower than the primary with identical hardware
and configs.
Hmm. That's not great, but it's a lot better than 50x. I wonder what
was different in Erik's environment. Does running in standby mode use
more memory, such that it might have pushed the machine over the line
into swap?
Or if it's CPU load, maybe Erik could gprof it?
...Robert
* Robert Haas <robertmhaas@gmail.com> [100412 07:10]:
I think we need to investigate this more. It's not going to look good
for the project if people find that a hot standby server runs two
orders of magnitude slower than the primary.
Yes, it's not "good", but it's a known problem. We've had people
complaining that WAL replay can't keep up with the WAL stream from a
heavily loaded server.
The master producing the WAL stream has $XXX separate read/modify
processes working over the data dir, and is bottlenecked by the
serialized WAL stream. All the seek+read delays are parallelized and
overlapping.
But the slave (traditionally a PITR slave, now also HS/SR) has all
that read-modify-write happening in single-threaded fashion, meaning
that WAL record $X+1 waits until the buffer that record $X needs to
modify is read in. All the seek+read delays are serialized.
You can optimize that by keeping more of the pages in buffers (shared,
or OS cache), but the WAL producer, being by its very nature a
multi-task IO load producing random reads/writes, is always going to go
quicker than the single-stream random-IO WAL consumer...
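As a rough illustration of that arithmetic, here is a toy model with
assumed numbers (8 ms per random read, 20 concurrent backends), not
measurements:

#include <stdio.h>

/*
 * Toy model of the point above: the master overlaps random-read latency
 * across many backends, while WAL replay pays it one record at a time.
 * The 8 ms latency and 20 backends are assumptions for illustration.
 */
int
main(void)
{
    const double seek_ms  = 8.0;    /* assumed random-read latency */
    const int    backends = 20;     /* concurrent writers on the master */

    double master_reads_per_sec = backends * (1000.0 / seek_ms);
    double replay_reads_per_sec = 1000.0 / seek_ms;

    printf("master (overlapped):  %.0f random reads/s\n",
           master_reads_per_sec);
    printf("standby (serialized): %.0f random reads/s\n",
           replay_reads_per_sec);
    return 0;
}

The gap scales with whatever read concurrency the master sustains.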
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
And I see now that he's doing a stream of read-only queries on a slave,
presumably with no WAL even being replayed...
Sorry for the noise....
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
Aidan Van Dyk <aidan@highrise.ca> wrote:
We've had people complaining that WAL replay can't keep up with the
WAL stream from a heavily loaded server.
I thought this thread was about the slow performance running a mix
of read-only queries on the slave versus the master, which doesn't
seem to have anything to do with the old issue you're describing.
-Kevin
I could reproduce this on my laptop; the standby is about 20% slower. I ran
oprofile, and what stands out as the difference between the master and the
standby is that on the standby about 20% of the CPU time is spent in
hash_seq_search(). The call path is GetSnapshotData() ->
KnownAssignedXidsGetAndSetXmin() -> hash_seq_search(). That explains the
difference in performance.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
I could reproduce this on my laptop; the standby is about 20% slower. I ran
oprofile, and what stands out as the difference between the master and the
standby is that on the standby about 20% of the CPU time is spent in
hash_seq_search(). The call path is GetSnapshotData() ->
KnownAssignedXidsGetAndSetXmin() -> hash_seq_search(). That explains the
difference in performance.
The slowdown is proportional to the max_connections setting in the
standby. A 20% slowdown might still be acceptable, but if you increase
max_connections to, say, 1000, things get really slow. I wouldn't
recommend max_connections=1000, of course, but I think we need to do
something about this. We should change the KnownAssignedXids data
structure from a hash table into something that's quicker to scan.
Preferably something that scans in O(N), where N is the number of
entries actually present, not the maximum number of entries it can
hold, as it is with the hash table currently.
A quick fix would be to check whether there are any entries in the hash
table before scanning it. That would eliminate the overhead when there
are no in-progress transactions on the master. But as soon as there's
even one, the overhead comes back.
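A minimal standalone sketch of both ideas together (hypothetical names,
not an actual patch): keep the xids in a dense array with a count, so a
snapshot scans only the entries actually present, and the empty case
costs a single comparison:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define MAX_KNOWN_XIDS 4096     /* would be sized from max_connections */

static TransactionId knownXids[MAX_KNOWN_XIDS];
static int numKnownXids = 0;    /* entries actually present */

/* Recovery side: xids arrive in ascending order, so appending suffices. */
static void
known_xids_add(TransactionId xid)
{
    if (numKnownXids < MAX_KNOWN_XIDS)
        knownXids[numKnownXids++] = xid;
}

/*
 * Snapshot side: copy out the known-assigned xids.  Cost is O(N) in the
 * entries present; the empty case (idle master) is one comparison
 * instead of a hash_seq_search() over the whole table.
 */
static int
known_xids_get(TransactionId *dst, int maxn)
{
    int n = numKnownXids;

    if (n == 0)
        return 0;               /* the quick-fix fastpath */
    if (n > maxn)
        n = maxn;
    memcpy(dst, knownXids, n * sizeof(TransactionId));
    return n;
}

int
main(void)
{
    TransactionId snap[MAX_KNOWN_XIDS];

    known_xids_add(100);
    known_xids_add(101);
    printf("%d xids in snapshot\n", known_xids_get(snap, MAX_KNOWN_XIDS));
    return 0;
}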
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
We should change the KnownAssignedXids data structure from a hash
table into something that's quicker to scan. Preferably something that
scans in O(N), where N is the number of entries actually present, not
the maximum number of entries it can hold, as it is with the hash table
currently.
So it's pretty good news that red-black trees made it into 9.0, isn't it? :)
A quick fix would be to check whether there are any entries in the hash
table before scanning it. That would eliminate the overhead when there
are no in-progress transactions on the master. But as soon as there's
even one, the overhead comes back.
That doesn't sound like the typical case, does it?
--
dim
On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
Heikki Linnakangas wrote:
I could reproduce this on my laptop; the standby is about 20% slower. I ran
oprofile, and what stands out as the difference between the master and the
standby is that on the standby about 20% of the CPU time is spent in
hash_seq_search(). The call path is GetSnapshotData() ->
KnownAssignedXidsGetAndSetXmin() -> hash_seq_search(). That explains the
difference in performance.

The slowdown is proportional to the max_connections setting in the
standby. A 20% slowdown might still be acceptable, but if you increase
max_connections to, say, 1000, things get really slow. I wouldn't
recommend max_connections=1000, of course, but I think we need to do
something about this. We should change the KnownAssignedXids data
structure from a hash table into something that's quicker to scan.
Preferably something that scans in O(N), where N is the number of
entries actually present, not the maximum number of entries it can
hold, as it is with the hash table currently.
There's a tradeoff here to consider. KnownAssignedXids faces two
workloads: one for each WAL record, where we check whether the xid is
already known assigned, and one for snapshots. The current
implementation is optimised towards recovery performance, not snapshot
performance.
A quick fix would be to check whether there are any entries in the hash
table before scanning it. That would eliminate the overhead when there
are no in-progress transactions on the master. But as soon as there's
even one, the overhead comes back.
Any fix should be fairly quick because of the way it's modularised, with
something like this in mind.

I'll try a circular buffer implementation, with a fastpath. I should
have something in a few hours.
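For discussion, a minimal standalone sketch of that shape (hypothetical
names; a sketch only, not the patch posted later in the thread): a ring
of xids where both the emptiness fastpath and the scan cost depend only
on the live entries:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define RING_SIZE 4096          /* power of two: cheap index wraparound */

typedef struct
{
    TransactionId xids[RING_SIZE];
    uint32_t    head;           /* next slot to fill */
    uint32_t    tail;           /* oldest live entry */
} XidRing;

/* The fastpath: an empty ring costs one comparison per snapshot. */
static bool
ring_is_empty(const XidRing *r)
{
    return r->head == r->tail;
}

/* Recovery side: xids are added in ascending order at the head. */
static bool
ring_add(XidRing *r, TransactionId xid)
{
    if (r->head - r->tail >= RING_SIZE)
        return false;           /* full; real code would compress */
    r->xids[r->head % RING_SIZE] = xid;
    r->head++;
    return true;
}

/* Snapshot side: O(N) in live entries, regardless of RING_SIZE. */
static int
ring_collect(const XidRing *r, TransactionId *dst, int maxn)
{
    int n = 0;
    uint32_t i;

    for (i = r->tail; i != r->head && n < maxn; i++)
        dst[n++] = r->xids[i % RING_SIZE];
    return n;
}

int
main(void)
{
    XidRing r = {{0}, 0, 0};
    TransactionId out[RING_SIZE];

    ring_add(&r, 100);
    ring_add(&r, 101);
    printf("empty=%d live=%d\n", ring_is_empty(&r),
           ring_collect(&r, out, RING_SIZE));
    return 0;
}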
--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
A quick fix would be to check whether there are any entries in the hash
table before scanning it. That would eliminate the overhead when there
are no in-progress transactions on the master. But as soon as there's
even one, the overhead comes back.

Any fix should be fairly quick because of the way it's modularised, with
something like this in mind. I'll try a circular buffer implementation,
with a fastpath.
I started experimenting with a sorted-array-based implementation on
Tuesday but got carried away with other stuff. I've now got back to it
and cleaned it up.

How does the attached patch look? It's probably similar to what you had
in mind.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment: knownassignedxids-array-2.patch (text/x-diff, +162/-110)
On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
Simon Riggs wrote:
On Tue, 2010-04-13 at 21:09 +0300, Heikki Linnakangas wrote:
A quick fix would be to check whether there are any entries in the hash
table before scanning it. That would eliminate the overhead when there
are no in-progress transactions on the master. But as soon as there's
even one, the overhead comes back.

Any fix should be fairly quick because of the way it's modularised, with
something like this in mind. I'll try a circular buffer implementation,
with a fastpath.

I started experimenting with a sorted-array-based implementation on
Tuesday but got carried away with other stuff. I've now got back to it
and cleaned it up. How does the attached patch look? It's probably
similar to what you had in mind.
It looks like a second version of what I'm working on and about to
publish. I'll take that as a compliment!

My patch is attached here also, for discussion.

The two patches look the same in their main parts, though I have quite a
few extra tweaks in there, which you can read about in the comments. One
tweak I don't have is the use of the presence array that allows a
sensible bsearch, so I'll alter my patch to use that idea but keep the
rest of my code.
--
Simon Riggs www.2ndQuadrant.com
Attachment: circular_knownassigned.patch (text/x-patch, +826/-43)
Simon Riggs wrote:
On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
How does the attached patch look? It's probably similar to what you had
in mind.

It looks like a second version of what I'm working on and about to
publish. I'll take that as a compliment!

My patch is attached here also, for discussion.

The two patches look the same in their main parts, though I have quite a
few extra tweaks in there, which you can read about in the comments.
Yeah. Yours seems a lot more complex with all those extra tweaks; I
would suggest keeping it simple. I did realize there's one bug in my
patch: I didn't handle xid wraparound correctly in the binary search; I
need to use TransactionIdFollows instead of plain >.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, 2010-04-16 at 14:47 +0300, Heikki Linnakangas wrote:
Simon Riggs wrote:
On Fri, 2010-04-16 at 11:29 +0300, Heikki Linnakangas wrote:
How does the attached patch look? It's probably similar to what you had
in mind.

It looks like a second version of what I'm working on and about to
publish. I'll take that as a compliment!

My patch is attached here also, for discussion.

The two patches look the same in their main parts, though I have quite a
few extra tweaks in there, which you can read about in the comments.

Yeah. Yours seems a lot more complex with all those extra tweaks; I
would suggest keeping it simple. I did realize there's one bug in my
patch: I didn't handle xid wraparound correctly in the binary search; I
need to use TransactionIdFollows instead of plain >.
Almost done; yes, it's much simpler. I wrote a lot of that in the wee
small hours last night, so the difference is amusing.

And I spotted that bug, plus the off-by-one error. I've just rewritten
all the other parts, so no worries.
--
Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
I didn't handle xid wraparound correctly in the binary search; I need
to use TransactionIdFollows instead of plain >.
I think you're outsmarting yourself there. A binary search will in fact
*not work* with circular xid comparison (this is exactly why there's no
btree opclass for XID). You need to use plain >, and make sure the
array you're searching is ordered that way too. The other way might
accidentally fail to malfunction if you only tested ranges of XIDs that
weren't long enough to wrap around, but that doesn't make it right.
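A tiny standalone program makes the point concrete (the comparison
mimics the style of TransactionIdFollows(); illustrative values, not
PostgreSQL source): circular xid comparison is not transitive, so there
is no total order for a binary search to rely on:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular comparison in the style of TransactionIdFollows(). */
static int
xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

int
main(void)
{
    /* Three xids spread around the 32-bit circle. */
    TransactionId x = 100;
    TransactionId y = 1500000000u;
    TransactionId z = 3000000000u;

    /*
     * Each one "follows" the previous, and the first also "follows" the
     * last: y > x, z > y, and yet x > z.  With no transitivity there is
     * no consistent sort order, so a binary search using this comparison
     * can silently return wrong answers once the xid range wraps.
     */
    printf("y follows x: %d\n", xid_follows(y, x));   /* prints 1 */
    printf("z follows y: %d\n", xid_follows(z, y));   /* prints 1 */
    printf("x follows z: %d\n", xid_follows(x, z));   /* prints 1 */
    return 0;
}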
regards, tom lane