Better way of dealing with pgstat wait timeout during buildfarm runs?

Started by Andres Freund · over 11 years ago · 44 messages · pgsql-hackers
#1 Andres Freund
andres@anarazel.de

Hi,

We quite regularly have buildfarm failures that are caused by 'WARNING:
pgstat wait timeout' at random points during the build. Especially on
some of the ARM buildfarm animals those are really frequent, to the
point that it's hard to know the actual state of the buildfarm without
looking at several animals. The ARM ones are probably affected more
frequently because they'll often run on sd cards and such.

So I think a better way to deal with that warning would be a good
idea. Besides somehow fixing the underlying mechanism, there are two
ways to attack this that I can think of, neither of them awe-inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent
to the client with default settings...
2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
it to.
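To make option 1 concrete, the decisive property is how severity interacts with client_min_messages: a LOG-level message still lands in the server log but falls below the default client threshold. A minimal sketch of that filtering rule (the enum values are illustrative stand-ins, not the real elog.h constants):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative severity ordering only -- stand-ins for elog.h's
 * constants, not the real values or the real ereport machinery. */
enum sketch_elevel { SK_LOG = 15, SK_NOTICE = 18, SK_WARNING = 19 };

/* A message reaches the client only if its severity clears the
 * client_min_messages threshold.  LOG sorts below the default client
 * threshold, so demoting the warning hides it from clients while the
 * server log still records it. */
bool sent_to_client(enum sketch_elevel elevel, enum sketch_elevel client_min)
{
    return elevel >= client_min;
}
```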

Does anybody have an actually good idea?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Andres Freund <andres@2ndquadrant.com> writes:

So I think a better way to deal with that warning would be a good
idea. Besides somehow fixing the underlying mechanism, there are two
ways to attack this that I can think of, neither of them awe-inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent
to the client with default settings...
2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
it to.

Yeah, I've been getting more annoyed by that too lately. I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see. PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to
credit that increasing it still further would be "fixing" anything.
The other change would also mainly just sweep the issue under the rug,
if there is any issue and it's not just that we're overloading
underpowered buildfarm machines. (Maybe a better fix would be to reduce
MAX_CONNECTIONS for the tests on these machines?)

I wonder whether when multiple processes are demanding statsfile updates,
there's some misbehavior that causes them to suck CPU away from the stats
collector and/or convince it that it doesn't need to write anything.
There are odd things in the logs in some of these events. For example in
today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the
gist and spgist tests. But in the postmaster log we find these in
fairly close succession:

[549c38ba.724d:2] WARNING: pgstat wait timeout
[549c39b1.73e7:10] WARNING: pgstat wait timeout
[549c38ba.724d:3] WARNING: pgstat wait timeout

Correlating these with other log entries shows that the first and third
are from the autovacuum launcher while the second is from the gist test
session. So the spgist failure failed to get logged, and in any case the
big picture is that we had four timeout warnings occurring in a pretty
short span of time, in a parallel test set that's not all that demanding
(12 parallel tests, well below our max). Not sure what to make of that.

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time. This makes me think that
somebody simplified away some logic we maybe should have kept. But if
we're going to leave it like this, we could replace the DBWriteRequest
data structure with a simple OID list and save a fair amount of code.
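The simplification alluded to above might look roughly like this: drop the struct carrying the unused request_time and keep a bare set of database OIDs with an add-if-absent helper. This is a sketch only; the fixed-size array and `Oid` typedef are simplified stand-ins for the list machinery pgstat.c actually uses:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for the real Oid typedef */

#define MAX_PENDING 64

/* Instead of DBWriteRequest entries carrying a (now-unused)
 * request_time, pending write requests become just the set of
 * requested database OIDs. */
typedef struct PendingWrites
{
    Oid    oids[MAX_PENDING];
    size_t count;
} PendingWrites;

/* Record a write request for this database, ignoring duplicates. */
void request_write(PendingWrites *p, Oid dboid)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->oids[i] == dboid)
            return;             /* already queued */
    if (p->count < MAX_PENDING)
        p->oids[p->count++] = dboid;
}

/* Rough equivalent of pgstat_db_requested(): is a write pending? */
int write_requested(const PendingWrites *p, Oid dboid)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->oids[i] == dboid)
            return 1;
    return 0;
}
```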

regards, tom lane


#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 2014-12-25 14:36:42 -0500, Tom Lane wrote:

I wonder whether when multiple processes are demanding statsfile updates,
there's some misbehavior that causes them to suck CPU away from the stats
collector and/or convince it that it doesn't need to write anything.
There are odd things in the logs in some of these events. For example in
today's failure on hamster,
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
there are two client-visible wait-timeout warnings, one each in the
gist and spgist tests. But in the postmaster log we find these in
fairly close succession:

[549c38ba.724d:2] WARNING: pgstat wait timeout
[549c39b1.73e7:10] WARNING: pgstat wait timeout
[549c38ba.724d:3] WARNING: pgstat wait timeout

Correlating these with other log entries shows that the first and third
are from the autovacuum launcher while the second is from the gist test
session. So the spgist failure failed to get logged, and in any case the
big picture is that we had four timeout warnings occurring in a pretty
short span of time, in a parallel test set that's not all that demanding
(12 parallel tests, well below our max). Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?
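For the record, the settings being asked for here are ordinary postgresql.conf GUCs, along the lines of:

```
# postgresql.conf additions for the buildfarm animal (as suggested here)
log_checkpoints = on
log_line_prefix = '%m [%p] '   # timestamp + PID, so entries can be correlated
```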

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time. This makes me think that
somebody simplified away some logic we maybe should have kept. But if
we're going to leave it like this, we could replace the DBWriteRequest
data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?

I wondered for a second whether the split could be responsible somehow,
but there are reports of that in older backbranches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 25.12.2014 20:36, Tom Lane wrote:

Yeah, I've been getting more annoyed by that too lately. I keep
wondering though whether there's an actual bug underneath that
behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already
10 seconds; it's hard to credit that increasing it still further
would be "fixing" anything. The other change would also mainly just
sweep the issue under the rug, if there is any issue and it's not
just that we're overloading underpowered buildfarm machines. (Maybe a
better fix would be to reduce MAX_CONNECTIONS for the tests on these
machines?)

I agree that increasing the limit further is not a good solution.

BTW, I notice that in the current state of pgstat.c, all the logic
for keeping track of request arrival times is dead code, because
nothing is actually looking at DBWriteRequest.request_time. This
makes me think that somebody simplified away some logic we maybe
should have kept. But if we're going to leave it like this, we could
replace the DBWriteRequest data structure with a simple OID list and
save a fair amount of code.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
updating the request_time after receiving the inquiry, and
pgstat_db_requested() is looking at it when writing the files.

If we can simplify the code by keeping just OIDs, let's do that. I think
the main reason why we haven't done that in 187492b6 was to keep as much
of the existing logic (and maybe change it in a separate patch).

regards
Tomas


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#4)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Tomas Vondra <tv@fuzzy.cz> writes:

On 25.12.2014 20:36, Tom Lane wrote:

BTW, I notice that in the current state of pgstat.c, all the logic
for keeping track of request arrival times is dead code, because
nothing is actually looking at DBWriteRequest.request_time.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
updating the request_time after receiving the inquiry, and
pgstat_db_requested() is looking at it when writing the files.

Where is pgstat_db_requested() looking at request_time?

If we can simplify the code by keeping just OIDs, let's do that. I think
the main reason why we haven't done that in 187492b6 was to keep as much
of the existing logic (and maybe change it in a separate patch).

The real point here is that I think that commit *already* changed the
existing logic, because the time-of-receipt used to matter. In
particular, there used to be a throttle on how often the stats file
could get written, which seems to have vanished. I seriously doubt
that that was a good change, especially on write-bandwidth-challenged
platforms.
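The vanished throttle can be captured in a single predicate: write only when there is a request newer than the last write, and the last write is at least the stats interval old. A sketch with millisecond integers standing in for TimestampTz (the old code's exact details may have differed; this shows the shape of the gate):

```c
#include <assert.h>

typedef long long ts_ms;            /* stand-in for TimestampTz */
#define PGSTAT_STAT_INTERVAL_MS 500 /* the documented 500 msec freshness target */

/* Old-style gate: a pending request alone is not enough; the previous
 * write must also be old enough.  That throttles how often the stats
 * file can be written, which matters on write-bandwidth-challenged
 * platforms. */
int stats_write_due(ts_ms last_statwrite, ts_ms last_statrequest, ts_ms now)
{
    if (last_statwrite >= last_statrequest)
        return 0;                   /* nothing new has been requested */
    return now - last_statwrite >= PGSTAT_STAT_INTERVAL_MS;
}
```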

regards, tom lane


#6 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 25.12.2014 21:14, Andres Freund wrote:

On 2014-12-25 14:36:42 -0500, Tom Lane wrote:

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.

BTW, I notice that in the current state of pgstat.c, all the logic
for keeping track of request arrival times is dead code, because
nothing is actually looking at DBWriteRequest.request_time. This
makes me think that somebody simplified away some logic we maybe
should have kept. But if we're going to leave it like this, we
could replace the DBWriteRequest data structure with a simple OID
list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was
split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were "pgstat wait timeout" issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase of this kind of issues. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

I wondered for a second whether the split could be responsible
somehow, but there are reports of that in older backbranches as well:
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Tomas


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#6)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Tomas Vondra <tv@fuzzy.cz> writes:

The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase of this kind of issues. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

I think that hamster has basically got a tin can and string for an I/O
subsystem. It's not real clear to me whether there's actually been an
increase in "wait timeout" failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.
However, I notice that at least the last few occurrences on "hamster"
all seem to have been in this parallel block:

test: brin gin gist spgist privileges security_label collate matview lock replica_identity rowsecurity object_address

which as recently as 9.4 contained just these tests:

test: privileges security_label collate matview lock replica_identity

I'm fairly sure that the index-related tests in this batch are I/O
intensive, and since they were not there at all six months ago, it's not
hard to believe that this block of tests has far greater I/O demands than
used to exist. Draw your own conclusions ...

regards, tom lane


#8 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 25.12.2014 22:16, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

On 25.12.2014 20:36, Tom Lane wrote:

BTW, I notice that in the current state of pgstat.c, all the logic
for keeping track of request arrival times is dead code, because
nothing is actually looking at DBWriteRequest.request_time.

Really? Which part of the code is dead? I see pgstat_recv_inquiry() is
updating the request_time after receiving the inquiry, and
pgstat_db_requested() is looking at it when writing the files.

Where is pgstat_db_requested() looking at request_time?

Oh, right. pgstat_db_requested() is not looking at the timestamp - it
only checks the OID of the database. But pgstat_recv_inquiry() is
checking it, comparing it to cutoff_time etc.

ISTM the main change related to this is that this:

if (last_statwrite < last_statrequest)
pgstat_write_statsfile(false);

got replaced by this:

if (pgstat_write_statsfile_needed())
pgstat_write_statsfiles(false, false);

where pgstat_write_statsfile_needed() only checks if the list is empty
(and ignores the request/write timestamps).

If I understand that correctly, this would lead to writing the files
more often, and we're dealing with a timeout.

If we can simplify the code by keeping just OIDs, let's do that. I think
the main reason why we haven't done that in 187492b6 was to keep as much
of the existing logic (and maybe change it in a separate patch).

The real point here is that I think that commit *already* changed
the existing logic, because the time-of-receipt used to matter. In
particular, there used to be a throttle on how often the stats file
could get written, which seems to have vanished. I seriously doubt
that that was a good change, especially on
write-bandwidth-challenged platforms.

Yes - if that change causes the files to be written more frequently,
it's not a good change.

But I think the time-of-receipt still matters - pgstat_recv_inquiry
logic remained the same, just applied per database. ISTM the only thing
that disappeared is the (last_statwrite < last_statrequest) check.

I have to think about this a bit more, I haven't seen this code since
the split patch.

Tomas


#9 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 25.12.2014 22:40, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

The strange thing is that the split happened ~2 years ago, which is
inconsistent with the sudden increase of this kind of issues. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

I think that hamster has basically got a tin can and string for an I/O
subsystem. It's not real clear to me whether there's actually been an

Yes. It's called "SD card".

increase in "wait timeout" failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (in the HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
the trend is much less convincing (but there are far fewer failures, and
only 3 of them failed because of the "pgstat wait timeout").

However, it's worth mentioning that all the pgstat failures happened at
"16:00:07" and most of the older failures are before that. So it may be
that it was failing because of something else, and the pgstat timeout
never got the chance to happen. OTOH, there's a fair number of
successful runs.

However, I notice that at least the last few occurrences on "hamster"
all seem to have been in this parallel block:

test: brin gin gist spgist privileges security_label collate matview
lock replica_identity rowsecurity object_address

which as recently as 9.4 contained just these tests:

test: privileges security_label collate matview lock replica_identity

I'm fairly sure that the index-related tests in this batch are I/O
intensive, and since they were not there at all six months ago, it's
not hard to believe that this block of tests has far greater I/O
demands than used to exist. Draw your own conclusions ...

Yes, that might be the culprit here. It would be interesting to know
what's happening on the machine while the tests are running, to confirm
this hypothesis.

regards
Tomas


#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#9)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Tomas Vondra <tv@fuzzy.cz> writes:

On 25.12.2014 22:40, Tom Lane wrote:

I think that hamster has basically got a tin can and string for an I/O
subsystem. It's not real clear to me whether there's actually been an
increase in "wait timeout" failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (in the HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On chipmunk,
the trend is much less convincing (but there are far fewer failures, and
only 3 of them failed because of the "pgstat wait timeout").

mereswine's history is also pretty interesting in this context. That
series makes it look like the probability of "pgstat wait timeout" took
a big jump around the beginning of December, especially if you make the
unproven-but-not-unreasonable assumption that the two pg_upgradecheck
failures since then were also wait timeout failures. That's close enough
after commit 88fc71926392115c (Nov 19) to make me suspect that that was
what put us over the edge: that added a bunch more I/O *and* a bunch more
statistics demands to this one block of parallel tests.

But even if we are vastly overstressing the I/O subsystem on these boxes,
why is it manifesting like this? pgstat never fsyncs the stats temp file,
so it should not have to wait for physical I/O I'd think. Or perhaps the
file rename() operations get fsync'd behind the scenes by the filesystem?

regards, tom lane


#11 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#10)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 26.12.2014 02:59, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

On 25.12.2014 22:40, Tom Lane wrote:

I think that hamster has basically got a tin can and string for an I/O
subsystem. It's not real clear to me whether there's actually been an
increase in "wait timeout" failures recently; somebody would have to
go through and count them before I'd have much faith in that thesis.

That's what I did. On hamster I see this (in the HEAD):

2014-12-25 16:00:07 yes
2014-12-24 16:00:07 yes
2014-12-23 16:00:07 yes
2014-12-22 16:00:07 yes
2014-12-19 16:00:07 yes
2014-12-15 16:00:11 no
2014-10-25 16:00:06 no
2014-10-24 16:00:06 no
2014-10-23 16:00:06 no
2014-10-22 16:00:06 no
2014-10-21 16:00:07 no
2014-10-19 16:00:06 no
2014-09-28 16:00:06 no
2014-09-26 16:00:07 no
2014-08-28 16:00:06 no
2014-08-12 16:00:06 no
2014-08-05 22:04:48 no
2014-07-19 01:53:30 no
2014-07-06 16:00:06 no
2014-07-04 16:00:06 no
2014-06-29 16:00:06 no
2014-05-09 16:00:04 no
2014-05-07 16:00:04 no
2014-05-04 16:00:04 no
2014-04-28 16:00:04 no
2014-04-18 16:00:04 no
2014-04-04 16:00:04 no

(where "yes" means "pgstat wait timeout" is in the logs). On
chipmunk, the trend is much less convincing (but there are far fewer
failures, and only 3 of them failed because of the "pgstat wait
timeout").

mereswine's history is also pretty interesting in this context. That
series makes it look like the probability of "pgstat wait timeout"
took a big jump around the beginning of December, especially if you
make the unproven-but-not-unreasonable assumption that the two
pg_upgradecheck failures since then were also wait timeout failures.
That's close enough after commit 88fc71926392115c (Nov 19) to make me
suspect that that was what put us over the edge: that added a bunch
more I/O *and* a bunch more statistics demands to this one block of
parallel tests.

Interesting. But even if this commit tipped us over the edge, that's
not proof that the split patch was perfectly correct.

But even if we are vastly overstressing the I/O subsystem on these
boxes, why is it manifesting like this? pgstat never fsyncs the stats
temp file, so it should not have to wait for physical I/O I'd think.
Or perhaps the file rename() operations get fsync'd behind the scenes
by the filesystem?

My guess is that the amount of dirty data in the page cache reaches
dirty_ratio/dirty_bytes, effectively forcing the writes to go directly
to the disks. Those ARM machines have rather low amounts of RAM
(typically 256-512MB), and the default value for dirty_ratio is ~20%
IIRC. So that's ~50MB-100MB.
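That back-of-the-envelope figure checks out, assuming the stock Linux vm.dirty_ratio default of 20% (an assumption about these animals, not a measurement):

```c
#include <assert.h>

/* Dirty page-cache budget before the kernel starts forcing writers
 * into synchronous writeback (ram_mb * vm.dirty_ratio / 100). */
long dirty_limit_mb(long ram_mb, int dirty_ratio_pct)
{
    return ram_mb * dirty_ratio_pct / 100;
}
```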

Tomas


#12 Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#6)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On Fri, Dec 26, 2014 at 6:28 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 25.12.2014 21:14, Andres Freund wrote:

On 2014-12-25 14:36:42 -0500, Tom Lane wrote:

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefux) would be also helpful.

Done. I have been wondering about those failures as well but didn't
get the time to look around. FYI, hamster is running with an 8GB class
10 SD card able to do 40MB/s in read, a card that I changed 2 weeks ago
btw, the former 4GB card having begun to show I/O errors, RIP to it. So
this may be what is triggering the wait timeouts more frequently... But
I have no real explanation why REL9_4_STABLE shows no signs of failures.

For the time being, what about putting pg_stat_tmp on a RAM disk to
relieve the I/O? Whatever we come up with, I imagine that I/O will
still be tight on hamster.
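For anyone wanting to try that on an animal, the usual recipe is a small tmpfs mount plus the stats_temp_directory GUC (the mount point and size here are illustrative):

```
# /etc/fstab: small RAM disk for the stats temp files
tmpfs  /mnt/pgstat_ramdisk  tmpfs  size=64M,mode=0700  0  0

# postgresql.conf
stats_temp_directory = '/mnt/pgstat_ramdisk'
```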
--
Michael


#13 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

So I think a better way to deal with that warning would be a good
idea. Besides somehow fixing the underlying mechanism, there are two
ways to attack this that I can think of, neither of them awe-inspiring:

1) Make that WARNING a LOG message instead. Since those don't get sent
to the client with default settings...
2) Increase PGSTAT_MAX_WAIT_TIME even further than what 99b545 increased
it to.

Yeah, I've been getting more annoyed by that too lately. I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see.

I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
instead of PGSTAT_STAT_INTERVAL in autovacuum workers. That decreases
the wait time 50-fold, if I recall this correctly, and causes large
amounts of extra I/O traffic.

BTW, I notice that in the current state of pgstat.c, all the logic for
keeping track of request arrival times is dead code, because nothing is
actually looking at DBWriteRequest.request_time. This makes me think that
somebody simplified away some logic we maybe should have kept.

I will have a look. I remember being confused about this at some point
when reviewing that patch.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#13)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

Yeah, I've been getting more annoyed by that too lately. I keep wondering
though whether there's an actual bug underneath that behavior that we're
failing to see.

I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
instead of PGSTAT_STAT_INTERVAL in autovacuum workers. That decreases
the wait time 50-fold, if I recall this correctly, and causes large
amounts of extra I/O traffic.

Yeah --- that means that instead of the normal behavior that a stats file
newer than 500 msec is good enough, an autovac worker insists on a stats
file newer than 10 msec. I did some experimentation on prairiedog, and
found that it's not hard at all to see autovac workers demanding multiple
stats writes per second:

2014-12-26 16:26:52.958 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:53.128 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:53.188 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:54.903 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:26:55.058 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.022 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.285 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:00.792 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.010 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.163 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:01.193 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.595 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.673 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.839 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:03.878 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:05.878 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:06.571 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.001 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.769 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:07.950 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:10.256 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:11.039 EST 21026 LOG: sending inquiry for database 45116
2014-12-26 16:27:11.402 EST 21026 LOG: sending inquiry for database 45116

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with. Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?
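In code terms, the proposal is just to stop special-casing autovacuum when deciding how fresh the on-disk stats file must be. A sketch of the freshness test, with millisecond integers standing in for TimestampTz and the constants as discussed upthread:

```c
#include <assert.h>

typedef long long ts_ms;            /* stand-in for TimestampTz */
#define PGSTAT_STAT_INTERVAL_MS 500 /* normal backends: newer than 500 msec is OK */
#define PGSTAT_RETRY_DELAY_MS    10 /* autovac today: insists on newer than 10 msec */

/* Is the stats file fresh enough, given who is asking?  Tom's proposal
 * amounts to ignoring is_autovacuum here and using
 * PGSTAT_STAT_INTERVAL_MS for everyone, cutting the demanded write
 * rate for autovac roughly 50-fold. */
int stats_fresh_enough(ts_ms file_ts, ts_ms now, int is_autovacuum)
{
    ts_ms max_age = is_autovacuum ? PGSTAT_RETRY_DELAY_MS
                                  : PGSTAT_STAT_INTERVAL_MS;
    return now - file_ts <= max_age;
}
```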

regards, tom lane


#15 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#14)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Tom Lane wrote:

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with. Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL. I realize now that the semantics we really
want is something like "stats no older than XYZ", where the given value
is the timestamp at which we start checking; if we get anything newer
than that it would be okay, but we currently reject it for lack
of a more appropriate API. (If it takes more than PGSTAT_STAT_INTERVAL
to get the stats back, a regular backend would ask for fresher stats,
but to an autovac worker they would be good enough as long as they are
newer than its recheck start time.)

Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.
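The "no older than the recheck start" semantics described above would reduce to a different comparison than today's sliding interval: accept any snapshot that postdates the caller's recheck start, regardless of its absolute age. A sketch of that hypothetical API (not something pgstat.c currently offers):

```c
#include <assert.h>

typedef long long ts_ms;    /* stand-in for TimestampTz */

/* Accept the stats snapshot if it postdates the moment this autovac
 * worker started its recheck, however old it is in absolute terms. */
int stats_usable_for_recheck(ts_ms file_ts, ts_ms recheck_start_ts)
{
    return file_ts >= recheck_start_ts;
}
```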

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#15)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On 12/27/2014 12:16 AM, Alvaro Herrera wrote:

Tom Lane wrote:

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with. Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL. I realize now that the semantics we really
want are something like "stats no older than XYZ" where the given value
is the timestamp at which we start checking; if we get anything newer
than that it would be okay, but we currently reject it because of lack
of a more appropriate API. (If it takes more than PGSTAT_STAT_INTERVAL
to get the stats back, a regular backend would ask for fresher stats,
but to an autovac worker they would be good enough as long as they are
newer than its recheck start time.)

Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.

Vacuuming is far from free, even if the visibility map says that most
pages are visible to all: you still scan all indexes, if you remove any
dead tuples at all.

- Heikki


#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#16)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 12/27/2014 12:16 AM, Alvaro Herrera wrote:

Tom Lane wrote:

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with. Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL. ...
Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.

Vacuuming is far from free, even if the visibility map says that most
pages are visible to all: you still scan all indexes, if you remove any
dead tuples at all.

With typical autovacuum settings, I kinda doubt that there's much value in
reducing the window for this problem from 500ms to 10ms. As Alvaro says,
this was just a partial, kluge solution from the start --- if we're
worried about such duplicate vacuuming, we should undertake a real
solution that closes the window altogether. In any case, timeouts
occurring inside autovacuum are not directly causing the buildfarm
failures, since autovacuum's log entries don't reflect into regression
outputs. (It's possible that autovacuum's tight tolerance is contributing
to the failures by increasing the load on the stats collector, but I'm
not sure I believe that.)

To get back to that original complaint about buildfarm runs failing,
I notice that essentially all of those failures are coming from "wait
timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself
has no need to read the stats files. What's actually causing these
messages is failure to get a timely response in pgstat_vacuum_stat().
So let me propose a drastic solution: let's dike out this bit in vacuum.c:

    /*
     * Send info about dead objects to the statistics collector, unless we are
     * in autovacuum --- autovacuum.c does this for itself.
     */
    if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
        pgstat_vacuum_stat();

This would have the effect of transferring all responsibility for
dead-stats-entry cleanup to autovacuum. For ordinary users, I think
that'd be just fine. It might be less fine though for people who
disable autovacuum, if there still are any. To analyze the effects
on them, let's break down what pgstat_vacuum_stat() actually does:

1. It causes drops of whole databases from pgstat's tables. I claim
we don't need to do that at this level. There's no issue unless
the stats collector missed the PgStat_MsgDropdb message when the database
was dropped; and even if it did, the negative consequences of having
a dead DB's stats still lying around are pretty minimal since we split
up the stats files. There will be no extra I/O except for one small
record in the global stats file. So I think we can well afford to say
that with autovac off, such missed databases only get cleaned up the
next time an autovac is forced for antiwraparound.

2. Within the current database, it causes drops of pgstat entries for
dropped tables. This would be a tad annoying if you have lots of
transient tables and no or minimal autovacuuming. However, lots of
transient tables can be a pain point anyway, because we don't currently
have any mechanism other than pgstat_vacuum_stat() for cleaning up
per-table stats for dropped tables. It seems like it might be time to do
something about that. We could probably extend the PgStat_TableXactStatus
infrastructure to keep track of whether a table was created or deleted in
the current transaction, and solve the problem without very much new code.
Or maybe we could move the responsibility for reporting stale entries
into the code that reads the stats files for the stats views.
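
[Editor's note: a rough standalone sketch of the first idea, with invented names (these are not the real PgStat_TableXactStatus fields): track create/drop flags per table in the transaction's status record, and at commit emit a stats-drop message for any table dropped in that transaction, so cleanup would no longer depend on pgstat_vacuum_stat().]

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical extension of per-table transaction-status tracking */
typedef struct TableXactStatus
{
    Oid  t_id;
    bool created_in_xact;       /* table created in this transaction */
    bool dropped_in_xact;       /* table dropped in this transaction */
} TableXactStatus;

/*
 * At commit, decide whether the stats collector should be told to
 * forget this table's entry right away.  A table both created and
 * dropped in the same transaction never had a stats entry to drop.
 */
static bool
should_send_stat_drop(const TableXactStatus *st)
{
    return st->dropped_in_xact && !st->created_in_xact;
}
```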

3. Within the current database, it causes drops of pgstat entries for
dropped functions, if you're tracking per-function execution stats.
Since that's not the default, maybe we could get away with saying that
we don't clean up such dead entries very often unless you're running
autovacuum. I don't really want to propose building the infrastructure
to support dropping such entries retail.

Or, much more simply, we could conclude that it's not that important
if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
the code so that we don't bleat when the file-reading request comes from
that source. This should be a back-patchable fix, whereas #2 above might
be a bit too complicated for that.
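
[Editor's note: the simpler option might amount to no more than threading a flag down to the file-reading code, along these lines. This is a standalone sketch; the function and parameter names are invented, not the actual backend_read_statsfile() signature.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical read wrapper: on timeout, either warn (ordinary callers)
 * or silently fall back to the stale snapshot we already have
 * (pgstat_vacuum_stat-style callers, for which slightly stale data is
 * harmless).  Returns 0 for a fresh read, 1 for a stale fallback.
 */
static int
read_stats_with_timeout(bool got_fresh_data, bool warn_on_timeout)
{
    if (got_fresh_data)
        return 0;               /* fresh snapshot read OK */
    if (warn_on_timeout)
        fprintf(stderr, "WARNING: pgstat wait timeout\n");
    return 1;                   /* caller proceeds with stale snapshot */
}
```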

Thoughts?

regards, tom lane


#18Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#3)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On Thu, Dec 25, 2014 at 09:14:36PM +0100, Andres Freund wrote:

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? My guess is that we'd find horrendous 'sync' times.

+1, independent of $SUBJECT. How about log_autovacuum_min_duration=0, too?
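
[Editor's note: concretely, that would mean having pg_regress feed the test instance settings along these lines; the exact mechanism for injecting them into the temp install is left aside.]

```
log_checkpoints = on
log_autovacuum_min_duration = 0
```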


#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 12/27/2014 12:16 AM, Alvaro Herrera wrote:

Tom Lane wrote:

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with. Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL. ...
Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.

Vacuuming is far from free, even if the visibility map says that most
pages are visible to all: you still scan all indexes, if you remove any
dead tuples at all.

With typical autovacuum settings, I kinda doubt that there's much value in
reducing the window for this problem from 500ms to 10ms. As Alvaro says,
this was just a partial, kluge solution from the start --- if we're
worried about such duplicate vacuuming, we should undertake a real
solution that closes the window altogether. In any case, timeouts
occurring inside autovacuum are not directly causing the buildfarm
failures, since autovacuum's log entries don't reflect into regression
outputs. (It's possible that autovacuum's tight tolerance is contributing
to the failures by increasing the load on the stats collector, but I'm
not sure I believe that.)

To get back to that original complaint about buildfarm runs failing,
I notice that essentially all of those failures are coming from "wait
timeout" warnings reported by manual VACUUM commands. Now, VACUUM itself
has no need to read the stats files. What's actually causing these
messages is failure to get a timely response in pgstat_vacuum_stat().
So let me propose a drastic solution: let's dike out this bit in vacuum.c:

    /*
     * Send info about dead objects to the statistics collector, unless we are
     * in autovacuum --- autovacuum.c does this for itself.
     */
    if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
        pgstat_vacuum_stat();

This would have the effect of transferring all responsibility for
dead-stats-entry cleanup to autovacuum. For ordinary users, I think
that'd be just fine. It might be less fine though for people who
disable autovacuum, if there still are any.

-1. I don't think it's a good idea to inflict pain on people who want
to schedule their vacuums manually (and yes, there are some) to get
clean buildfarm runs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This would have the effect of transferring all responsibility for
dead-stats-entry cleanup to autovacuum. For ordinary users, I think
that'd be just fine. It might be less fine though for people who
disable autovacuum, if there still are any.

-1. I don't think it's a good idea to inflict pain on people who want
to schedule their vacuums manually (and yes, there are some) to get
clean buildfarm runs.

Did you read the rest of it?

regards, tom lane


#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#22Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#17)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#22)
#24Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#24)
#26Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
#28Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#31)
#33Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#32)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#32)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#33)
#36Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#37)
#39Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#38)
#40Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#6)
#41Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#40)
#42Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Michael Paquier (#41)
#43Matt Kelly
mkellycs@gmail.com
In reply to: Tomas Vondra (#42)
#44Andres Freund
andres@anarazel.de
In reply to: Matt Kelly (#43)