Latch implementation

Started by Ganesh Venkitachalam-1, over 15 years ago. 5 messages.
#1 Ganesh Venkitachalam-1
ganesh@vmware.com
3 attachment(s)

Hi,

I've been playing around with measuring the latch implementation in 9.1,
and here are the results of a ping-pong test with 2 processes signalling
and waiting on the latch. I did three variations (Linux 2.6.18, Nehalem
processor).

One is the current one.

The second is built on native semaphores on Linux. This one cannot
implement WaitLatchOrSocket, since no select() is involved.

The third is an implementation based on pipe() and poll(). Note: in its
current incarnation it's essentially a hack to measure performance and is
not usable in Postgres as-is, since it assumes all latches are created
before any process is forked. We'd need to use mkfifo() or similar to
sort that out if we really want to go this route.

- Current implementation: 1 pingpong is avg 15 usecs
- Pipe+poll: 9 usecs
- Semaphore: 6 usecs

The test program & modified unix_latch.c are attached; you can compile it
with "gcc -DPIPE -O2 sema.c", "gcc -DLINUX_SEM -O2 sema.c", or "gcc -O2
sema.c".
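The attachments aren't inlined in the archive, so here is a rough sketch of how the pipe+poll ping-pong primitive can work (function names and structure are my own, not necessarily what the attached sema.c does): each process blocks in poll() on the read end of a pipe, and its peer wakes it by writing one byte.

```c
#include <poll.h>
#include <unistd.h>

/* Wake the peer: one byte on the pipe's write end. */
static void
set_latch(int write_fd)
{
    (void) write(write_fd, "x", 1);
}

/* Block in poll() until a byte arrives on the read end, then consume it. */
static void
wait_latch(int read_fd)
{
    struct pollfd pfd;
    char        buf;

    pfd.fd = read_fd;
    pfd.events = POLLIN;
    while (poll(&pfd, 1, -1) < 0)
        ;                       /* retry on EINTR (simplified sketch) */
    (void) read(read_fd, &buf, 1);
}

/*
 * One ping-pong round from the parent's side: signal the child's latch,
 * then wait on our own.  The child runs the mirror image in a loop.
 */
static void
pingpong_round(int to_child_fd, int from_child_fd)
{
    set_latch(to_child_fd);
    wait_latch(from_child_fd);
}
```

Timing N calls of pingpong_round() against a forked peer and dividing by N is what produces a per-round average like the 9 usecs quoted above.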

Thanks,
--Ganesh

Attachments:

sema.c (text/plain, US-ASCII)
latch.h (text/plain, US-ASCII)
unix_latch.c (text/plain, US-ASCII)
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Ganesh Venkitachalam-1 (#1)
Re: Latch implementation

On Wed, Sep 22, 2010 at 4:31 PM, Ganesh Venkitachalam-1
<ganesh@vmware.com> wrote:

> I've been playing around with measuring the latch implementation in 9.1,
> and here are the results of a ping-pong test with 2 processes signalling
> and waiting on the latch. I did three variations (Linux 2.6.18, Nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on Linux. This one cannot
> implement WaitLatchOrSocket, since no select() is involved.
>
> The third is an implementation based on pipe() and poll(). Note: in its
> current incarnation it's essentially a hack to measure performance and is
> not usable in Postgres as-is, since it assumes all latches are created
> before any process is forked. We'd need to use mkfifo() or similar to
> sort that out if we really want to go this route.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs

Interesting numbers. I guess one question is how much improving the
performance of the latch implementation would affect overall system
performance. Synchronous replication is obviously going to be highly
sensitive to latency, but even in that context I'm not really sure
whether this is enough to matter. Do you have any sense of that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ganesh Venkitachalam-1 (#1)
Re: Latch implementation

On 22/09/10 23:31, Ganesh Venkitachalam-1 wrote:

> I've been playing around with measuring the latch implementation in 9.1,
> and here are the results of a ping-pong test with 2 processes signalling
> and waiting on the latch. I did three variations (Linux 2.6.18, Nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on Linux. This one cannot
> implement WaitLatchOrSocket, since no select() is involved.
>
> The third is an implementation based on pipe() and poll(). Note: in its
> current incarnation it's essentially a hack to measure performance and is
> not usable in Postgres as-is, since it assumes all latches are created
> before any process is forked. We'd need to use mkfifo() or similar to
> sort that out if we really want to go this route.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs
>
> The test program & modified unix_latch.c are attached; you can compile it
> with "gcc -DPIPE -O2 sema.c", "gcc -DLINUX_SEM -O2 sema.c", or "gcc -O2
> sema.c".

Interesting, thanks for the testing! Could you also test how much faster
the current implementation gets by just replacing select() with poll()?
That should shave off some overhead.
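For reference, the core of that change might look like this (a simplified sketch, not the actual unix_latch.c code): the select() on the latch's self-pipe becomes a poll() on the same descriptor, skipping the per-call fd_set setup and the FD_SETSIZE limit.

```c
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* The 9.1-style wait, reduced to its core: select() on the self-pipe. */
static int
latch_wait_select(int selfpipe_fd)
{
    fd_set      input_mask;

    FD_ZERO(&input_mask);
    FD_SET(selfpipe_fd, &input_mask);
    /* NULL timeout: block until the pipe becomes readable */
    return select(selfpipe_fd + 1, &input_mask, NULL, NULL, NULL);
}

/* Drop-in poll() equivalent: no fd_set to build on every call. */
static int
latch_wait_poll(int selfpipe_fd)
{
    struct pollfd pfd;

    pfd.fd = selfpipe_fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, -1);
}
```

Both calls return 1 once the self-pipe is readable; the surrounding retry and drain logic stays the same either way.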

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Ganesh Venkitachalam-1 (#1)
Re: Latch implementation

On Wed, 2010-09-22 at 13:31 -0700, Ganesh Venkitachalam-1 wrote:

> Hi,
>
> I've been playing around with measuring the latch implementation in 9.1,
> and here are the results of a ping-pong test with 2 processes signalling
> and waiting on the latch. I did three variations (Linux 2.6.18, Nehalem
> processor).
>
> One is the current one.
>
> The second is built on native semaphores on Linux. This one cannot
> implement WaitLatchOrSocket, since no select() is involved.

That looks interesting. If we had a need for a latch that would not need
to wait on a socket as well, this would be better. In sync rep, we
certainly do. Thanks for measuring this.

Question is: in that case would we use latches or a PGsemaphore?

If the answer is "latch" then we could just have an additional boolean
option when we request InitLatch() to see what kind of latch we want.
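For instance (purely a sketch, with made-up names, not the attached patch), the semaphore-backed flavor such an option might select could look like this: a process-shared semaphore in shared memory, posted by the setter and waited on by the owner. Since the wait is sem_wait() rather than select()/poll(), there is no file descriptor to multiplex with a socket, which is exactly why this flavor can't provide WaitLatchOrSocket.

```c
#define _GNU_SOURCE
#include <semaphore.h>
#include <sys/mman.h>
#include <stddef.h>

/* Hypothetical semaphore-backed latch; names are illustrative only. */
typedef struct SemLatch
{
    sem_t       sem;
} SemLatch;

static SemLatch *
InitSemLatch(void)
{
    /* anonymous shared mapping so the latch survives fork() */
    SemLatch   *latch = mmap(NULL, sizeof(SemLatch),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (latch == MAP_FAILED)
        return NULL;
    sem_init(&latch->sem, 1, 0);    /* pshared=1: usable across processes */
    return latch;
}

static void
SetSemLatch(SemLatch *latch)
{
    sem_post(&latch->sem);
}

static void
WaitSemLatch(SemLatch *latch)
{
    while (sem_wait(&latch->sem) < 0)
        ;                           /* retry on EINTR (simplified sketch) */
}
```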

> The third is an implementation based on pipe() and poll(). Note: in its
> current incarnation it's essentially a hack to measure performance and is
> not usable in Postgres as-is, since it assumes all latches are created
> before any process is forked. We'd need to use mkfifo() or similar to
> sort that out if we really want to go this route.
>
> - Current implementation: 1 pingpong is avg 15 usecs
> - Pipe+poll: 9 usecs
> - Semaphore: 6 usecs

Pipe+poll not worth it then.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

#5 Ganesh Venkitachalam-1
ganesh@vmware.com
In reply to: Simon Riggs (#4)
3 attachment(s)
Re: Latch implementation

Attached is the current implementation redone with poll(). It lands at
around 10.5 usecs, just above the pipe version, but better than the
current select()-based implementation.

As to the other questions: yes, this would matter for sync replication.
Consider an enterprise use case with a 10Gb network & SSDs (not at all
uncommon): a 10Gb network can do a roundtrip with the commit log in <10
usecs, and SSDs have write latency <50 usecs. Now if the latch takes tens
of usecs (this stuff scales somewhat with the number of processes; my
data is all with 2 processes), that becomes a very significant part of the
net commit latency. So I'd think this is worth fixing.
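As a sanity check on that argument (all figures taken from the numbers above, not re-measured), the latch's share of a sync-commit round trip is easy to compute:

```c
/*
 * Fraction of a sync-commit round trip spent in the latch, given the
 * latch cost, network roundtrip, and SSD write latency (all in usecs).
 * The inputs below are the rough figures quoted in this thread.
 */
static double
latch_share(double latch_us, double network_us, double ssd_us)
{
    return latch_us / (latch_us + network_us + ssd_us);
}
```

With the quoted figures, the current latch is 15/(15+10+50), about 20% of the commit path, while the 6-usec semaphore version would be roughly 9%.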

Thanks,
--Ganesh

On Thu, 23 Sep 2010, Simon Riggs wrote:


Attachments:

sema.c (text/plain, US-ASCII)
unix_latch.c (text/plain, US-ASCII)
latch.h (text/plain, US-ASCII)