"pgstat wait timeout" warnings

Started by Tom Lanealmost 15 years ago6 messageshackers

tgl@sss.pgh.pa.us

almost 15 years ago

We occasionally see $SUBJECT in the buildfarm, and I've also recently
had reports of them from Red Hat customers. The obvious theory is that
these reflect high load preventing the stats collector from responding,
but it would really take pretty crushing load to make that happen if
there were not anything funny going on.

It struck me just now while reviewing the latch code that pg_usleep
could sleep for less than the expected time if a signal happened, and
if that happened repeatedly for some reason, perhaps the loop could
complete in much less than the nominal time. I'm not sure I believe
that idea either, but anyway I'm feeling motivated to try to gather more
data.

Does anyone have a problem with sticking a lot of debugging printout
into backend_read_statsfile() in HEAD only? I'm envisioning it starting
to dump assorted information including elapsed time, errno values, etc
once the loop counter is more than halfway to expiration, which is
already a situation that we shouldn't see under normal conditions.

regards, tom lane

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 15 years ago

In reply to: Tom Lane (#1)

Re: "pgstat wait timeout" warnings

On 10.08.2011 21:45, Tom Lane wrote:

We occasionally see $SUBJECT in the buildfarm, and I've also recently
had reports of them from Red Hat customers. The obvious theory is that
these reflect high load preventing the stats collector from responding,
but it would really take pretty crushing load to make that happen if
there were not anything funny going on.

It struck me just now while reviewing the latch code that pg_usleep
could sleep for less than the expected time if a signal happened, and
if that happened repeatedly for some reason, perhaps the loop could
complete in much less than the nominal time. I'm not sure I believe
that idea either, but anyway I'm feeling motivated to try to gather more
data.

I've also seen this on my laptop occasionally. The most recent case I
remember was when I COPYed a lot of data, so that the harddisk was
really busy. The system was a bit unresponsive anyway, because of all
the I/O happening.

So my theory is that if the I/O is really busy, write() on the stats
file blocks for more than 5 seconds, and you get the timeout.

Does anyone have a problem with sticking a lot of debugging printout
into backend_read_statsfile() in HEAD only? I'm envisioning it starting
to dump assorted information including elapsed time, errno values, etc
once the loop counter is more than halfway to expiration, which is
already a situation that we shouldn't see under normal conditions.

No objections here.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Bernd Helmle

mailings@oopsware.de

almost 15 years ago

In reply to: Heikki Linnakangas (#2)

Re: "pgstat wait timeout" warnings

--On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

So my theory is that if the I/O is really busy, write() on the stats file
blocks for more than 5 seconds, and you get the timeout.

I've seen it on customer instances with very high INSERT peak loads (several
dozens backends INSERTing/UPDATEing data concurrently). We are using a RAM disk
for stats_temp_directory now for a while, and the timeout never occured again.

--
Thanks

Bernd

Andres Freund

andres@anarazel.de

almost 15 years ago

In reply to: Bernd Helmle (#3)

Re: "pgstat wait timeout" warnings

On Thursday, August 11, 2011 11:49:12 Bernd Helmle wrote:

--On 10. August 2011 21:54:06 +0300 Heikki Linnakangas

<heikki.linnakangas@enterprisedb.com> wrote:

So my theory is that if the I/O is really busy, write() on the stats
file
blocks for more than 5 seconds, and you get the timeout.

I've seen it on customer instances with very high INSERT peak loads (several
dozens backends INSERTing/UPDATEing data concurrently). We are using a RAM
disk for stats_temp_directory now for a while, and the timeout never
occured again.

Yes, I have seen it several times as well. I can actually reproduce it without
much problems, so if you have some idea to test...

I also routinely use stats_temp_directory + tmpfs to solve this (and related
issues). I really think the stats file mechanism should be improved
fundamentally.

Andres

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Andres Freund (#4)

Re: "pgstat wait timeout" warnings

Andres Freund <andres@anarazel.de> writes:

--On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

So my theory is that if the I/O is really busy, write() on the stats file
blocks for more than 5 seconds, and you get the timeout.

Yes, I have seen it several times as well. I can actually reproduce it
without much problems, so if you have some idea to test...

It doesn't surprise me that it's possible to reproduce it under extreme
I/O load. What I am wondering about is whether there's some bug/effect
that allows it to happen without that.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Tom Lane (#5)

Re: "pgstat wait timeout" warnings

On Thu, Aug 11, 2011 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

--On 10. August 2011 21:54:06 +0300 Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

So my theory is that if the I/O is really busy, write() on the stats file
blocks for more than 5 seconds, and you get the timeout.

Yes, I have seen it several times as well. I can actually reproduce it
without much problems, so if you have some idea to test...

It doesn't surprise me that it's possible to reproduce it under extreme
I/O load. What I am wondering about is whether there's some bug/effect
that allows it to happen without that.

I got it several times during a pgbench -i -s 5000 run this morning.
I guess that's a lot of I/O, but I'm not sure I'd refer to one process
filling a table with data as "extreme".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company