Feature request (or at least discussion): enable autovaccum on temp tables

Started by Ivan Vorasalmost 7 years ago11 messagesgeneral
Jump to latest
#1Ivan Voras
ivoras@gmail.com

Hello,

The reason why we are using temp tables is to allow concurrent runs on some
very large reports which involve creating and heavily churning dozens of
very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports
finish in a couple of hours. If we DO use temp tables, they finish in a
couple of weeks. While using regular tables, autovacuum runs pretty much
continuously.

We'll try manually vacuuming the tables after major operations, but that's
kind of kludgy. What would help is a GUC setting which toggles this
behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what
would stop this feature from being implemented?

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Ivan Voras (#1)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

Hi

pá 31. 5. 2019 v 17:26 odesílatel Ivan Voras <ivoras@gmail.com> napsal:

Hello,

The reason why we are using temp tables is to allow concurrent runs on
some very large reports which involve creating and heavily churning dozens
of very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports
finish in a couple of hours. If we DO use temp tables, they finish in a
couple of weeks. While using regular tables, autovacuum runs pretty much
continuously.

We'll try manually vacuuming the tables after major operations, but that's
kind of kludgy. What would help is a GUC setting which toggles this
behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what
would stop this feature from being implemented?

I agree so this is not consistent behave. On second hand, for almost temp
table usage using autovacuum has not benefit.

autovacuum check tables once per minute, and working on closed
transactions. Lot of times temporary tables are filled inside some batch,
and immediately some query is started over this table. It is very low
probability to analyze temp table at good time automatically. So only
explicit call of ANALYZE is correct solution.

Regards

Pavel

#3Ivan Voras
ivoras@gmail.com
In reply to: Pavel Stehule (#2)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

On Fri, 31 May 2019 at 17:34, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hi

pá 31. 5. 2019 v 17:26 odesílatel Ivan Voras <ivoras@gmail.com> napsal:

Hello,

autovacuum check tables once per minute, and working on closed

transactions. Lot of times temporary tables are filled inside some batch,
and immediately some query is started over this table. It is very low
probability to analyze temp table at good time automatically. So only
explicit call of ANALYZE is correct solution.

There are two types of workloads we're talking about: short ones, for which
autovacuum is probably not worth it, and long-running ones (in our case,
hours), where it is. So that's why I'm advocating a GUC toggle.

#4Adam Brusselback
adambrusselback@gmail.com
In reply to: Ivan Voras (#1)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

Temp tables are not visibile outside of a single connection, so the
autovacuum worker connection isn't able to see it.

Are you sure that it's actually an issue with accumulating dead tuples, and
not an issue with bad statistics?

In my processes which are heavy on temp tables, I have to manually run
analyze after populating any temp tables.

Just a thought.
-Adam

#5Ivan Voras
ivoras@gmail.com
In reply to: Adam Brusselback (#4)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

On Fri, 31 May 2019 at 17:38, Adam Brusselback <adambrusselback@gmail.com>
wrote:

Temp tables are not visibile outside of a single connection, so the
autovacuum worker connection isn't able to see it.

I can connect as a superuser via psql while temp tables are being used and
I can examine tables in the pg_temp_* schemas, so that particular
separation isn't really a strong one. If there's a reason why autovacuum
isn't doable on temp tables, it's not that one. (maybe autovacuum would
work on the tables while the session disconnects, preventing the temp
tables from being dropped?)

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Ivan Voras (#1)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

"Ivan" == Ivan Voras <ivoras@gmail.com> writes:

Ivan> Since AFAIK temp tables are very close to unlogged ordinary
Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

--
Andrew (irc:RhodiumToad)

#7Chris Browne
cbbrowne@acm.org
In reply to: Ivan Voras (#1)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

On Fri, 31 May 2019 at 11:26, Ivan Voras <ivoras@gmail.com> wrote:

Hello,

The reason why we are using temp tables is to allow concurrent runs on
some very large reports which involve creating and heavily churning dozens
of very large tables.

The problem we're facing is that if we DON'T use temp tables, the reports
finish in a couple of hours. If we DO use temp tables, they finish in a
couple of weeks. While using regular tables, autovacuum runs pretty much
continuously.

We'll try manually vacuuming the tables after major operations, but that's
kind of kludgy. What would help is a GUC setting which toggles this
behaviour.

Since AFAIK temp tables are very close to unlogged ordinary tables, what
would stop this feature from being implemented?

I do hit some situations similar to this; it does NOT lead me to wanting to
VACUUM a lot; rather, after any query where I populate a lot of data in a
temporary table, I need to add an explicit ANALYZE, which tends to be more
than sufficient.

It is problematic for autovacuum to do this for two reasons:

1. Frequently, the moment that the ANALYZE *needs* to be run is the instant
after the table is populated, because if it waits any longer, that will be
too late to help the Immediately Following Query where the freshly
populated temp table's data was used.

In effect, the only "safe" answer for those processes is for the SQL code
to include an ANALYZE in the crucial spot. If autovacuum tried to do this
work, it might get to the table 5 seconds late, and thereby not do the good
that you need.

2. Temp tables are only associated (and visible) in the session in which
you are doing the work. autovacuum operates inside an ordinary session
context, and in a separate connected session, so it can't see your temp
tables; they are not handled in shared memory that crosses contexts to the
one in which those temp tables exist. Changing that would be way more
difficult than meets the eye.

I would actually suggest that it is likely that VACUUM never needs to be
done, as there's something wrong if a lot of garbage is being generated in
temp tables. All that should be needed are ANALYZE requests, and it makes
sense to annotate the code with an ANALYZE after any time a table is
massively updated.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#8Ivan Voras
ivoras@gmail.com
In reply to: Chris Browne (#7)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

hi,

On Fri, 31 May 2019 at 17:54, Christopher Browne <cbbrowne@gmail.com> wrote:

2. Temp tables are only associated (and visible) in the session in which
you are doing the work. autovacuum operates inside an ordinary session
context, and in a separate connected session, so it can't see your temp
tables; they are not handled in shared memory that crosses contexts to the
one in which those temp tables exist. Changing that would be way more
difficult than meets the eye.

Hmm yeah, I forgot about that aspect - temp tables not going though shared
buffers.

Thanks!

#9Ivan Voras
ivoras@gmail.com
In reply to: Andrew Gierth (#6)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

Hi,
On Fri, 31 May 2019 at 17:43, Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:

"Ivan" == Ivan Voras <ivoras@gmail.com> writes:

Ivan> Since AFAIK temp tables are very close to unlogged ordinary
Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

Understood.

So, a related question, since we have dozens of temp tables and a lot of
code, is there a way to look up what temp tables are being created by the
current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I
know I can inspect pg_temp_* schema, but how to figure out which one is
from the current session?

#10Chris Browne
cbbrowne@acm.org
In reply to: Ivan Voras (#9)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

On Fri, May 31, 2019, 12:26 PM Ivan Voras <ivoras@gmail.com> wrote:

Hi,
On Fri, 31 May 2019 at 17:43, Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:

"Ivan" == Ivan Voras <ivoras@gmail.com> writes:

Ivan> Since AFAIK temp tables are very close to unlogged ordinary
Ivan> tables, what would stop this feature from being implemented?

The key difference between temp tables and other tables is that temp
table data does not use the shared_buffers but is buffered only in
backend-local memory.

This means that other processes (like, say, an autovacuum process) can
not access the content of temp tables. So what you want is not possible.

Understood.

So, a related question, since we have dozens of temp tables and a lot of
code, is there a way to look up what temp tables are being created by the
current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I
know I can inspect pg_temp_* schema, but how to figure out which one is
from the current session?

I think you might want to look at information_schema.tables; I seem to
recall that view restricting itself to the tables actually visible to your
connection.

Show quoted text
#11Michael Lewis
mlewis@entrata.com
In reply to: Ivan Voras (#9)
Re: Feature request (or at least discussion): enable autovaccum on temp tables

So, a related question, since we have dozens of temp tables and a lot of
code, is there a way to look up what temp tables are being created by the
current session, so I can do a VACUUM or ANALYZE on all of them in bulk? I
know I can inspect pg_temp_* schema, but how to figure out which one is
from the current session?

Any table in pg_temp schema would belong to the current session.
Technically the schema has some number appended on the end, but to the
creating connection of those temp tables, it is just pg_temp. Probably you
should be analyzing the temp tables immediately after insert/update/delete,
OR just before first use. As others have said, you have no stats of most
common values or historgrams or any other info unless you analyze. If your
temp tables have many columns which are only selected and not used for
joins or ordering, then you may not need stats on those columns and could
save some cycles by specifying the columns, eg. *analyze
pg_temp.temp_table_name( join_field1, join_field2, order_by_field );*