zLinux Load Testing Experience

Started by Andrew Hastie about 13 years ago · 10 messages · general

andrew@ahastie.net

about 13 years ago

I'm currently working on a project porting an application from RedHat
Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at
version 9.n, so the PostgreSQL binaries have been built from source using
the standard build tools. Everything appears to run correctly. However,
as part of performance testing, our IBM and Linux SysProgs have been
"poking around" using strace and have reported the following (which they
think is an error condition) when attaching to the postmaster processes:

read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
... repeated many times

From researching the archives, I "believe" the above is "as designed"
and simply indicates that the postmaster is attempting to read data from
an IP socket and timing out. Could I ask:
1. Is this "normal"?
2. If abnormal, any pointers as to where to start investigating?

The reason they latched onto the postmaster process was a perceived
high CPU utilisation. For info, we are load testing with 100 connections
from an IBM WebSphere-hosted, EJB-based application.

Many thanks,
Andrew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jeff Janes

jeff.janes@gmail.com

about 13 years ago

In reply to: Andrew Hastie (#1)

Re: zLinux Load Testing Experience

On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net> wrote:

> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
> ... repeated many times

That does not look like the postmaster process. It is probably the
background writer process.

It is normal, and doesn't explain high CPU utilization.

Cheers,

Jeff

Merlin Moncure

mmoncure@gmail.com

about 13 years ago

In reply to: Jeff Janes (#2)

Re: zLinux Load Testing Experience

On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> It is normal, and doesn't explain high CPU utilization.

yeah: we're probably a couple of steps away from needing deep system
profiling. Helpful things to provide for diagnosis would be:

*) 'explain analyze' of the queries that are eating CPU
*) more details about the hardware -- how many CPUs, etc.
*) a better definition of 'perceived high CPU utilisation'
*) some correlating performance tests, especially CPU-bound pgbench
tests (pgbench -S)

merlin

Andrew Hastie

andrew@ahastie.net

about 13 years ago

In reply to: Merlin Moncure (#3)

Re: zLinux Load Testing Experience

On 30/04/13 20:46, Merlin Moncure wrote:

> yeah: we're probably a couple of steps in front of deep system
> profiling. Helpful things to provide to help diagnose would be:
> [...]

I'm not sure how much experience the community has with tuning PostgreSQL
running on RedHat, which in turn is hosted on an IBM mainframe under VM
(using zLinux). So I'm happy to start posting further details and
benchmark results and see where we go. Should I be moving this thread
over to the pg-performance list, or is pg-general the right place?

Merlin Moncure

mmoncure@gmail.com

about 13 years ago

In reply to: Andrew Hastie (#4)

Re: zLinux Load Testing Experience

On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:

> I'm not sure how much experience the community has on tuning PostgreSQL
> running on RedHat which in turn is hosted on an IBM mainframe under VM
> (using zLinux). So I'm happy to start posting further details and benchmark
> results and see where we go. Should I be moving this thread over into the
> pg-performance list, or is pg-general the right place?

certainly performance. and yes, zLinux is less well traveled. Did
you compile postgres from source? Did you confirm that there is a
native spinlocks implementation and it is being used?

merlin

Andrew Hastie

andrew@ahastie.net

about 13 years ago

In reply to: Merlin Moncure (#5)

Re: zLinux Load Testing Experience

On 01/05/13 15:34, Merlin Moncure wrote:

> certainly performance. and yes, zLinux is less well traveled. Did
> you compile postgres from source? Did you confirm that there is a
> native spinlocks implementation and it is being used?

Did you compile postgres from source? - Yes (I need PG v9.n, as the v8.n
shipped with RedHat Ent6 lacks several v9-specific features we need).

Did you confirm that there is a native spinlocks implementation and it is
being used? - I believe so, as no errors or warnings were logged during
the build. Is there a simple way to check whether spinlocks are running
natively?

I've started looking at several articles covering pgbench and running
some initial tests, so I plan to start a new thread on pg-performance in
the next day or so.

Thanks for the advice so far - Appreciated :-)

Andrew

Merlin Moncure

mmoncure@gmail.com

about 13 years ago

In reply to: Andrew Hastie (#6)

Re: zLinux Load Testing Experience

On Wed, May 1, 2013 at 11:34 AM, Andrew Hastie <andrew@ahastie.net> wrote:

> Did you confirm that there is a native spinlocks implementation and it is
> being used? - I believe so as no errors or warnings logged during the build.
> Is there a simple way to check whether spin-locks are running native?

I can't remember off the top of my head whether configure forces you to
explicitly disable spinlocks to get through a build on a platform without
hardware spinlocks. Point being: the interesting stuff happens during
configure, not during the build.

Check the contents of src/include/pg_config.h and look for this line:
#define HAVE_SPINLOCKS 1

to see if you have hardware spinlocks.

merlin

Merlin Moncure

mmoncure@gmail.com

about 13 years ago

In reply to: Merlin Moncure (#7)

Re: zLinux Load Testing Experience

On Wed, May 1, 2013 at 1:21 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.

Just a follow-up here, since I'm about to go on vacation and will be
out of pocket for the next several days. If you do indeed find out
that you are using non-TAS spinlocks, are suspicious that this is
causing your load issues, are feeling experimental, are using gcc to
compile postgres, and have determined that the HAVE_GCC_INT_ATOMICS
macro is set, I'd maybe consider hacking s_lock.h to use the gcc
__sync_lock_test_and_set variant of TAS (see around line 300 of
s_lock.h).

merlin

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Andrew Hastie (#6)

Re: zLinux Load Testing Experience

Andrew Hastie <andrew@ahastie.net> writes:

> Did you confirm that there is a native spinlocks implementation and it is
> being used? - I believe so as no errors or warnings logged during the build.
> Is there a simple way to check whether spin-locks are running native?

All non-ancient versions of PG force you to say "configure --disable-spinlocks"
to get a build without native spinlocks. Such builds are only
considered suitable for zero-order port testing, because the performance
hit is so bad.

regards, tom lane

Andrew Hastie

andrew@ahastie.net

about 13 years ago

In reply to: Merlin Moncure (#7)

Re: zLinux Load Testing Experience

On 01/05/13 19:21, Merlin Moncure wrote:

> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.

Confirmed: #define HAVE_SPINLOCKS 1 is present and correct.

I'll move any performance-related issues I find onto the pg-performance
list.
Many thanks for all the help and advice so far :-)
Andrew
