AutoVacuum starvation from sinval messages

Started by Jeff Janes · about 13 years ago · 20 messages
#1Jeff Janes
jeff.janes@gmail.com

While doing a rather contrived test of something else, using:

perl -le 'print "create table foo (x serial); drop table foo;"
foreach (1..1e6)'|psql

I noticed starvation of autovacuum launcher process and bloating of catalogs.

Basically the WaitLatch in autovacuum.c line 602 (in head) never returns.

This was introduced by a180776f7a1c4554f214b, "Teach unix_latch.c to
use poll() where available"

When the poll() gets EINTR from the SIGUSR1 sent for the invalidation,
it restarts with the full timeout (60 seconds in this case), with no
accounting for the time already waited.

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Should gettimeofday be called before and after the poll() and then the
difference deducted from timeout?

Cheers,

Jeff

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#1)
1 attachment(s)
Re: AutoVacuum starvation from sinval messages

On Thu, Nov 8, 2012 at 12:36 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Should gettimeofday be called before and after the poll() and then the
difference deducted from timeout?

Something like this?

Attachments:

poll_timeout.patch (application/octet-stream)
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index bbd1810..9f934f3
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***************
*** 46,51 ****
--- 46,54 ----
  #ifdef HAVE_SYS_SELECT_H
  #include <sys/select.h>
  #endif
+ #ifdef HAVE_POLL
+ #include "portability/instr_time.h"
+ #endif
  
  #include "miscadmin.h"
  #include "postmaster/postmaster.h"
*************** WaitLatchOrSocket(volatile Latch *latch,
*** 255,260 ****
--- 258,267 ----
  	waiting = true;
  	do
  	{
+ #ifdef HAVE_POLL
+ 		instr_time  poll_start;
+ 		instr_time  poll_time;
+ #endif
  		/*
  		 * Clear the pipe, then check if the latch is set already. If someone
  		 * sets the latch between this and the poll()/select() below, the
*************** WaitLatchOrSocket(volatile Latch *latch,
*** 311,323 ****
  		}
  
  		/* Sleep */
  		rc = poll(pfds, nfds, (int) timeout);
  
  		/* Check return code */
  		if (rc < 0)
  		{
! 			if (errno == EINTR)
  				continue;
  			waiting = false;
  			ereport(ERROR,
  					(errcode_for_socket_access(),
--- 318,340 ----
  		}
  
  		/* Sleep */
+ 		INSTR_TIME_SET_CURRENT(poll_start);
  		rc = poll(pfds, nfds, (int) timeout);
  
  		/* Check return code */
  		if (rc < 0)
  		{
! 			if (errno == EINTR) 
! 			{
! 				if (timeout>0) 
! 				{
! 					INSTR_TIME_SET_CURRENT(poll_time);
! 					INSTR_TIME_SUBTRACT(poll_time, poll_start);
! 					timeout-=INSTR_TIME_GET_MILLISEC(poll_time);
! 					if (timeout < 0) timeout=0;
! 				};
  				continue;
+ 			};
  			waiting = false;
  			ereport(ERROR,
  					(errcode_for_socket_access(),
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#1)
Re: AutoVacuum starvation from sinval messages

Jeff Janes <jeff.janes@gmail.com> writes:

Basically the WaitLatch in autovacuum.c line 602 (in head) never returns.

This was introduced by a180776f7a1c4554f214b, "Teach unix_latch.c to
use poll() where available"

When the poll() gets EINTR from the SIGUSR1 sent for the invalidation,
it restarts with the full timeout (60 seconds in this case), with no
accounting for the time already waited.

Hmm. That seems like a problem, all right, but I don't believe the
claim that it applies only to the poll() code path. On many platforms
the select() path would have the same issue.

It's a bit annoying to have to incur an extra gettimeofday call per
WaitLatch operation in case this happens, but there doesn't seem to
be a lot of choice. I guess we can arrange to incur the extra call
only when a timeout is requested --- but IIRC that's going to be
the majority of usages anyway ...

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#2)
Re: AutoVacuum starvation from sinval messages

Jeff Janes <jeff.janes@gmail.com> writes:

On Thu, Nov 8, 2012 at 12:36 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Should gettimeofday be called before and after the poll() and then the
difference deducted from timeout?

Something like this?

Meh. Not like that, because

(1) we shouldn't add overhead when no timeout is requested.
(2) it'd be better to do only one gettimeofday at the start and then
one in the EINTR path, not one per loop; this saves syscalls and also
avoids slippage of the timeout time.
(3) the select() path needs a similar fix.

But I think you have a good idea to use the INSTR_TIME macros instead
of custom code, even though this is Unix-only so there's not more than
one underlying implementation.

Will fix this up and commit.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: AutoVacuum starvation from sinval messages

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

When the poll() gets EINTR from the SIGUSR1 sent for the invalidation,
it restarts with the full timeout (60 seconds in this case), with no
accounting for the time already waited.

Hmm. That seems like a problem, all right, but I don't believe the
claim that it applies only to the poll() code path. On many platforms
the select() path would have the same issue.

BTW, doesn't win32_latch.c have the identical problem? I don't see it
updating the timeout after a signal, either.

regards, tom lane

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#1)
Re: AutoVacuum starvation from sinval messages

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#6)
Re: AutoVacuum starvation from sinval messages

On 2012-11-08 22:40:43 +0000, Simon Riggs wrote:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

Temporary tables might be a good contender.

Greetings,

Andres Freund

#8Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Simon Riggs (#6)
Re: AutoVacuum starvation from sinval messages

On 11/08/2012 11:40 PM, Simon Riggs wrote:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

Could heavy use of temporary tables cause this?

Hannu

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#6)
Re: AutoVacuum starvation from sinval messages

Simon Riggs <simon@2ndQuadrant.com> writes:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

It doesn't take a whole lot of DDL to provoke an sinval overrun, if
the recipient process is just sitting idle and not servicing the
messages. I think Jeff's concern is entirely valid.

regards, tom lane

#10Jeff Janes
jeff.janes@gmail.com
In reply to: Hannu Krosing (#8)
Re: AutoVacuum starvation from sinval messages

On Thu, Nov 8, 2012 at 2:50 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

On 11/08/2012 11:40 PM, Simon Riggs wrote:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

We see lots of traffic on the mailing list about people trying to dump
several hundred thousand tables, or finding they can only create one
database every two minutes, or that truncating hundreds of tables at a
time over and over again gets slow, etc. I know little about the
internals of the invalidation code, but I would think doing that kind
of thing must generate a lot of invalidation messages.

Could heavy use of temporary tables cause this?

It looks like they do. I'm not sure what is being invalidated in
those cases, but something seems to be.

Cheers,

Jeff

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: AutoVacuum starvation from sinval messages

I wrote:

Hmm. That seems like a problem, all right, but I don't believe the
claim that it applies only to the poll() code path. On many platforms
the select() path would have the same issue.

BTW, doesn't win32_latch.c have the identical problem? I don't see it
updating the timeout after a signal, either.

I've committed a patch for this, but the win32_latch version is entirely
innocent of any testing. The buildfarm will probably show any mistake
there, but if someone would like to try Jeff's test case on Windows just
to make sure the autovac launcher wakes up on time, it wouldn't hurt.

regards, tom lane

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Janes (#10)
Re: AutoVacuum starvation from sinval messages

On 8 November 2012 23:58, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Nov 8, 2012 at 2:50 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

On 11/08/2012 11:40 PM, Simon Riggs wrote:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

We see lots of traffic on the mailing list about people trying to dump
several hundred thousand tables, or finding they can only create one
database every two minutes, or that truncating hundreds of tables at a
time over and over again gets slow, etc. I know little about the
internals of the invalidation code, but I would think doing that kind
of thing must generate a lot of invalidation messages.

OK, so the problem is *any* sinval. I thought you meant one sinval per
object per minute, which seemed much less likely.

I agree one sinval per minute for long periods is actually quite likely.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: AutoVacuum starvation from sinval messages

On Thu, Nov 8, 2012 at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

It doesn't take a whole lot of DDL to provoke an sinval overrun, if
the recipient process is just sitting idle and not servicing the
messages. I think Jeff's concern is entirely valid.

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad but the latter would be really,
really bad. IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common. OTOH, creating 1 temp table a minute would hit a much broader
swath of users.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#13)
Re: AutoVacuum starvation from sinval messages

On 9 November 2012 13:42, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 8, 2012 at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 8 November 2012 20:36, Jeff Janes <jeff.janes@gmail.com> wrote:

It does not seem outrageous to me that there would be real-world
conditions in which invalidations would be sent more than once a
minute over prolonged periods, so this total starvation seems like a
bug.

Yes, it's a bug, but do you really believe the above? In what cases?

It doesn't take a whole lot of DDL to provoke an sinval overrun, if
the recipient process is just sitting idle and not servicing the
messages. I think Jeff's concern is entirely valid.

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad but the latter would be really,
really bad. IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common. OTOH, creating 1 temp table a minute would hit a much broader
swath of users.

The point is moot because latches don't work that way anymore.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#14)
Re: AutoVacuum starvation from sinval messages

On Fri, Nov 9, 2012 at 9:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad but the latter would be really,
really bad. IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common. OTOH, creating 1 temp table a minute would hit a much broader
swath of users.

The point is moot because latches don't work that way anymore.

One of us is confused, because IIUC Tom just fixed this this morning,
and I'm trying to figure out how many users will be affected by it,
and how seriously. Like, do we need an immediate minor release?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#15)
Re: AutoVacuum starvation from sinval messages

On 9 November 2012 14:16, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 9, 2012 at 9:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad but the latter would be really,
really bad. IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common. OTOH, creating 1 temp table a minute would hit a much broader
swath of users.

The point is moot because latches don't work that way anymore.

One of us is confused, because IIUC Tom just fixed this this morning,
and I'm trying to figure out how many users will be affected by it,
and how seriously. Like, do we need an immediate minor release?

You asked what provokes starvation, and the answer is nothing anymore,
since Tom's commit. No confusion here...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Kevin Grittner
kgrittn@mail.com
In reply to: Simon Riggs (#16)
Re: AutoVacuum starvation from sinval messages

Simon Riggs wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

One of us is confused, because IIUC Tom just fixed this this
morning, and I'm trying to figure out how many users will be
affected by it, and how seriously. Like, do we need an immediate
minor release?

You asked what provokes starvation, and the answer is nothing
anymore, since Tom's commit. No confusion here...

So for all those end users running production with a build compiled
from HEAD after Tom's commit this morning there is no issue. I'm
wondering about those using something else in production. What does
it take for their autovacuum to be stalled?

-Kevin

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#13)
Re: AutoVacuum starvation from sinval messages

Robert Haas <robertmhaas@gmail.com> writes:

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad but the latter would be really,
really bad.

The former. Actually, a signal will be sent as soon as a backend is
determined to be unreasonably far behind, which I think is when the
message queue is half full. In the AV launcher case, the queue probably
never actually overflows; but the launcher has to wake up to process
messages every so often, and that wakeup cycle is breaking the timeout
management in WaitLatch.

IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common.

Well, one DDL typically generates multiple messages --- one for each
catalog row added/modified/removed, roughly speaking. When I run the
constant create/drop example Jeff posted, I see the AV launcher getting
a catchup signal every few seconds. I didn't try to determine exactly
how many create/drop cycles that was, but I'm pretty sure it's a lot
less than 1000.

regards, tom lane

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#18)
Re: AutoVacuum starvation from sinval messages

Tom Lane escribió:

Robert Haas <robertmhaas@gmail.com> writes:

IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common.

Well, one DDL typically generates multiple messages --- one for each
catalog row added/modified/removed, roughly speaking. When I run the
constant create/drop example Jeff posted, I see the AV launcher getting
a catchup signal every few seconds. I didn't try to determine exactly
how many create/drop cycles that was, but I'm pretty sure it's a lot
less than 1000.

Just creating the sequence for the serial column means 16 pg_attribute
tuples. There are also two pg_class entries, one more pg_attribute, two
pg_type entries, a bunch of pg_depend entries ... I doubt it's less than
30 catalog tuples, all things considered. Double that for the drop. So
for a 4k-entry queue that needs to get 50% full, that's only ~35 temp
table creations like that.

I hadn't realized sequences used so many pg_attribute entries.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#19)
Re: AutoVacuum starvation from sinval messages

On Fri, Nov 9, 2012 at 10:08 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tom Lane escribió:

Robert Haas <robertmhaas@gmail.com> writes:

IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating >1024 temp tables a minute
to overrun the queue, which is very possible but not necessarily
common.

Well, one DDL typically generates multiple messages --- one for each
catalog row added/modified/removed, roughly speaking. When I run the
constant create/drop example Jeff posted, I see the AV launcher getting
a catchup signal every few seconds. I didn't try to determine exactly
how many create/drop cycles that was, but I'm pretty sure it's a lot
less than 1000.

Just creating the sequence for the serial column means 16 pg_attribute
tuples. There are also two pg_class entries, one more pg_attribute, two
pg_type entries, a bunch of pg_depend entries ... I doubt it's less than
30 catalog tuples, all things considered. Double that for the drop. So
for a 4k-entry queue that needs to get 50% full, that's only ~35 temp
table creations like that.

I hadn't realized sequences used so many pg_attribute entries.

Hmm. So, are we going to force a minor release for this, or do we
think it's not serious enough to warrant that?

I'm not expressing an opinion either way, just asking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company