BUG #14290: materialized view refresh doesn't use temp_tablespace

Started by Rick Ottenover 9 years ago11 messagesbugs
Jump to latest
#1Rick Otten
rotten@windfish.net

The following bug has been logged on the website:

Bug reference: 14290
Logged by: Rick Otten
Email address: rotten@windfish.net
PostgreSQL version: 9.5.4
Operating system: Mac or Ubuntu
Description:

I've recently observed that if while refreshing a materialized view, temp
space is required, it will not use the default temp_tablespaces defined in
postgresql.conf. Instead it uses the tablespace where the materialized view
lives.

This surprised me the other day when my production database ran out of disk
space. I was able to confirm the behavior in my development environment.

To reproduce:
1) Set up a temp tablespace.
2) Configure temp_tablespaces in postgrsql.conf to use that tablespace.
3) Turn on logging of tablespace names.
4) Reduce work_mem (so that temp space is more likely to be required).
5) Bounce the DB, or reload the postgresql.conf
6) Refresh a fairly large materialized view.
7) Observe in the logs where the temp tablespace was created.

You can do it concurrently, or not. In both cases the temporary tables end
up in the same tablespace as the materialized view, not in the
temp_tablespace.

If this can't be easily changed to use the default temp tablespaces, it
would be good if the documentation were updated to alert materialized view
users to this caveat.

Refreshing a materialized view may use as much temporary space, or more,
than its current size. For example, I have a 100G materialized view.
Refreshing it uses almost 300G additional disk while the refresh is running.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Michael Paquier
michael@paquier.xyz
In reply to: Rick Otten (#1)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

On Fri, Aug 19, 2016 at 4:06 AM, <rotten@windfish.net> wrote:

I've recently observed that if while refreshing a materialized view, temp
space is required, it will not use the default temp_tablespaces defined in
postgresql.conf. Instead it uses the tablespace where the materialized view
lives.

If this can't be easily changed to use the default temp tablespaces, it
would be good if the documentation were updated to alert materialized view
users to this caveat.

Refreshing a materialized view may use as much temporary space, or more,
than its current size. For example, I have a 100G materialized view.
Refreshing it uses almost 300G additional disk while the refresh is running.

Per the docs:
https://www.postgresql.org/docs/9.6/static/runtime-config-client.html
This variable specifies tablespaces in which to create temporary
objects (temp tables and indexes on temp tables) when a CREATE command
does not explicitly specify a tablespace. Temporary files for purposes
such as sorting large data sets are also created in these tablespaces.

And as far as I know, there is no concept of temporary object for
matviews, so the documentation is correct IMO, and the behavior you
are seeing expected.
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Rick Otten
rotten@windfish.net
In reply to: Michael Paquier (#2)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

Refreshing materialized views can definitely create temporary objects.
They even get logged when I turn on logging of temporary objects.

On 2016-08-24 03:47, Michael Paquier wrote:

On Fri, Aug 19, 2016 at 4:06 AM, <rotten@windfish.net> wrote:

I've recently observed that if while refreshing a materialized view, temp space is required, it will not use the default temp_tablespaces defined in postgresql.conf. Instead it uses the tablespace where the materialized view lives. If this can't be easily changed to use the default temp tablespaces, it would be good if the documentation were updated to alert materialized view users to this caveat. Refreshing a materialized view may use as much temporary space, or more, than its current size. For example, I have a 100G materialized view. Refreshing it uses almost 300G additional disk while the refresh is running.

Per the docs:
https://www.postgresql.org/docs/9.6/static/runtime-config-client.html [1]
This variable specifies tablespaces in which to create temporary
objects (temp tables and indexes on temp tables) when a CREATE command
does not explicitly specify a tablespace. Temporary files for purposes
such as sorting large data sets are also created in these tablespaces.

And as far as I know, there is no concept of temporary object for
matviews, so the documentation is correct IMO, and the behavior you
are seeing expected.

Links:
------
[1]: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html
https://www.postgresql.org/docs/9.6/static/runtime-config-client.html

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Rick Otten (#1)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

On Thu, Aug 18, 2016 at 2:06 PM, <rotten@windfish.net> wrote:

I've recently observed that if while refreshing a materialized view, temp
space is required, it will not use the default temp_tablespaces defined in
postgresql.conf. Instead it uses the tablespace where the materialized view
lives.

This surprised me the other day when my production database ran out of disk
space. I was able to confirm the behavior in my development environment.

To reproduce:
1) Set up a temp tablespace.
2) Configure temp_tablespaces in postgrsql.conf to use that tablespace.
3) Turn on logging of tablespace names.
4) Reduce work_mem (so that temp space is more likely to be required).
5) Bounce the DB, or reload the postgresql.conf
6) Refresh a fairly large materialized view.
7) Observe in the logs where the temp tablespace was created.

You can do it concurrently, or not. In both cases the temporary tables end
up in the same tablespace as the materialized view, not in the
temp_tablespace.

You are confusing two completely different things: temporary
objects (created by statements starting with CREATE TEMPORARY) and
temporary files (created when data spills to disk during, for
example, a sort, hash, or materialization of data internal to
processing some statement). The former are placed based on
temp_tablespaces; the latter are normally placed in the
base/pgsql_tmp/ subdirectory.

It would be possible to place them underneath a tablespace
specified by temp_tablespaces, and it might even be a good
enhancement to implement, but that is not the normal or default
location for temporary files.

Out of curiosity, and to help justify this as a feature request
worth pursuing, can you explain why you want to do this? For
example, have you placed your temporary tablespace on a faster
medium?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Rick Otten
rotten@windfish.net
In reply to: Kevin Grittner (#4)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

The distinction between temporary objects that write to disk, and
temporary files, never occurred to me before. I had always thought they
were the same thing. I think it is a pretty confusing fine point.

Yes, I do have the temporary tablespace on the fastest disk since I
thought that when I spilled out of memory, I'd still want the best
performance I could get during those operations.

The temporary files generated during the refresh are several times
larger than the final materialized view. This tells me they are probably
doing much more I/O than the final (concurrent) write into the
materialized view. It made sense to put those temporary writes onto the
faster disk.

Additionally, in a tiered storage environment, I might put my regular
tablespace on a highly redundant (ie, expensive) disk array, and then
put my temporary tablespace on a fast, but cheaper disk array because I
don't really need the redundancy in the temporary tables the way I do
with my regular tables.

Another issue is a psychology of disk management. When I look at a 1000G
"regular" tablespace that only has 200G of tables on it, I might think
"oh, there is plenty of space there for more tables" or "oh, look at all
the space I'm wasting, I should pare that back and save a few dollars".

Whereas, when I see an 800G temp tablespace on the database, even if it
isn't full _at_the_moment_, I think to myself "wow, there must be some
big transient queries that need that space which must be why it is
there".

I now know that I need all that extra space in the "regular tablespace"
to support refreshing the materialized views and that in general I
should plan for lots of empty space in my regular tablespaces to support
that. As I mentioned, I discovered this when a refresh view concurrently
failed because it ran the regular tablespace out of disk, even though
the disk was only at 40% capacity (when the refresh wasn't running) and
I had a large temporary tablespace set up to support things like that.
Fortunately I had everything on logical volume managers so it was easy
to extend the regular tablespace on the fly once the issue became
apparent. This could have been a much more painful thing to discover.

Lastly, if I had several tablespaces with these sorts of materialized
views in them, they could all share the same temporary tablespace for
the refresh. Which means I wouldn't need so much extra space in every
tablespace (as long as they didn't all refresh at the same time).

On 2016-08-24 13:46, Kevin Grittner wrote:

You are confusing two completely different things: temporary

objects (created by statements starting with CREATE TEMPORARY) and
temporary files (created when data spills to disk during, for
example, a sort, hash, or materialization of data internal to
processing some statement). The former are placed based on
temp_tablespaces; the latter are normally placed in the
base/pgsql_tmp/ subdirectory.

It would be possible to place them underneath a tablespace
specified by temp_tablespaces, and it might even be a good
enhancement to implement, but that is not the normal or default
location for temporary files.

Out of curiosity, and to help justify this as a feature request
worth pursuing, can you explain why you want to do this? For
example, have you placed your temporary tablespace on a faster
medium?

--
Kevin Grittner
EDB: http://www.enterprisedb.com [1]
The Enterprise PostgreSQL Company

Links:
------
[1]: http://www.enterprisedb.com

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Rick Otten (#5)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

On Wed, Aug 24, 2016 at 1:21 PM, Rick Otten <rotten@windfish.net> wrote:

The distinction between temporary objects that write to disk, and temporary
files, never occurred to me before. I had always thought they were the same
thing. I think it is a pretty confusing fine point.

On further investigation, both are supposed to use the tablespaces
you specify, which does make this a bug. Will fix.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Kevin Grittner (#4)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

"Kevin" == Kevin Grittner <kgrittn@gmail.com> writes:

Kevin> You are confusing two completely different things: temporary
Kevin> objects (created by statements starting with CREATE TEMPORARY)
Kevin> and temporary files (created when data spills to disk during,
Kevin> for example, a sort, hash, or materialization of data internal
Kevin> to processing some statement). The former are placed based on
Kevin> temp_tablespaces; the latter are normally placed in the
Kevin> base/pgsql_tmp/ subdirectory.

This is not true.

If temp_tablespaces is set, then all sort / hash / materialization files
will use the list of tablespaces in temp_tablespaces on a rotational
basis (to spread load). The only exception is for temp files that might
have to outlive the current transaction; these are forced into the
default tablespace of the current database. See storage/file/fd.c

However, materialized view refresh (without CONCURRENTLY) needs the new
copy of the matview's data to be in the same tablespace as the old copy,
since it moves the data into place via a heap swap. This restriction
does not apply to sort/hash/materialization files created by the
matview's query.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Rick Otten (#1)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

"rotten" == rotten <rotten@windfish.net> writes:

rotten> I've recently observed that if while refreshing a materialized
rotten> view, temp space is required, it will not use the default
rotten> temp_tablespaces defined in postgresql.conf. Instead it uses
rotten> the tablespace where the materialized view lives.

rotten> This surprised me the other day when my production database ran
rotten> out of disk space. I was able to confirm the behavior in my
rotten> development environment.

rotten> To reproduce:
rotten> 1) Set up a temp tablespace.
rotten> 2) Configure temp_tablespaces in postgrsql.conf to use that tablespace.
rotten> 3) Turn on logging of tablespace names.
rotten> 4) Reduce work_mem (so that temp space is more likely to be required).
rotten> 5) Bounce the DB, or reload the postgresql.conf
rotten> 6) Refresh a fairly large materialized view.
rotten> 7) Observe in the logs where the temp tablespace was created.

I can't reproduce this.

postgres=# show server_version;
server_version
----------------
9.5.4

postgres=# create materialized view mvx1 as select * from generate_series(1,100000);
SELECT 100000
postgres=# set temp_tablespaces = 't1';
SET
postgres=# set log_temp_files = 0;
SET
postgres=# set work_mem = '64kB';
SET

postgres=# refresh materialized view mvx1;
LOG: temporary file: path "pg_tblspc/18002/PG_9.5_201510051/pgsql_tmp/pgsql_tmp91239.1", size 1400000
STATEMENT: refresh materialized view mvx1;
REFRESH MATERIALIZED VIEW

(this clearly shows that the temp file created by the FunctionScan of
generate_series is located in tablespace oid 18002, which happens to be
t1)

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Rick Otten
rotten@windfish.net
In reply to: Andrew Gierth (#8)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

Is t1 also the tablespace for your materialized view? (It isn't clear
from what you've posted below.)

In my environment my default tablespace for the database is different
than the temp_tablespace.

On 2016-08-24 14:58, Andrew Gierth wrote:

"rotten" == rotten <rotten@windfish.net> writes:

rotten> I've recently observed that if while refreshing a materialized
rotten> view, temp space is required, it will not use the default
rotten> temp_tablespaces defined in postgresql.conf. Instead it uses
rotten> the tablespace where the materialized view lives.

rotten> This surprised me the other day when my production database ran
rotten> out of disk space. I was able to confirm the behavior in my
rotten> development environment.

rotten> To reproduce:
rotten> 1) Set up a temp tablespace.
rotten> 2) Configure temp_tablespaces in postgrsql.conf to use that
tablespace.
rotten> 3) Turn on logging of tablespace names.
rotten> 4) Reduce work_mem (so that temp space is more likely to be
required).
rotten> 5) Bounce the DB, or reload the postgresql.conf
rotten> 6) Refresh a fairly large materialized view.
rotten> 7) Observe in the logs where the temp tablespace was created.

I can't reproduce this.

postgres=# show server_version;
server_version
----------------
9.5.4

postgres=# create materialized view mvx1 as select * from
generate_series(1,100000);
SELECT 100000
postgres=# set temp_tablespaces = 't1';
SET
postgres=# set log_temp_files = 0;
SET
postgres=# set work_mem = '64kB';
SET

postgres=# refresh materialized view mvx1;
LOG: temporary file: path
"pg_tblspc/18002/PG_9.5_201510051/pgsql_tmp/pgsql_tmp91239.1", size
1400000
STATEMENT: refresh materialized view mvx1;
REFRESH MATERIALIZED VIEW

(this clearly shows that the temp file created by the FunctionScan of
generate_series is located in tablespace oid 18002, which happens to be
t1)

#10Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Rick Otten (#9)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

"Rick" == Rick Otten <rotten@windfish.net> writes:

Rick> Is t1 also the tablespace for your materialized view? (It isn't
Rick> clear from what you've posted below.)

No. The matview was created in pg_default.

Rick> In my environment my default tablespace for the database is
Rick> different than the temp_tablespace.

Possibly relevant here is that to put a tablespace into
temp_tablespaces, you must have CREATE permission on it. If you set the
value in postgresql.conf rather than with SET, you may not see errors
from lack of permission, instead tablespaces without permission will be
silently omitted.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Rick Otten
rotten@windfish.net
In reply to: Andrew Gierth (#10)
Re: BUG #14290: materialized view refresh doesn't use temp_tablespace

Aha. Maybe that is my problem. The refresh is silently failing to use
TEMP_TABLESPACES.

When I explicitly do the grant create, the refresh then uses the temp
tablespace as originally expected.

Thanks.

On 2016-08-24 16:14, Andrew Gierth wrote:

"Rick" == Rick Otten <rotten@windfish.net> writes:

Rick> Is t1 also the tablespace for your materialized view? (It isn't
Rick> clear from what you've posted below.)

No. The matview was created in pg_default.

Rick> In my environment my default tablespace for the database is
Rick> different than the temp_tablespace.

Possibly relevant here is that to put a tablespace into
temp_tablespaces, you must have CREATE permission on it. If you set the
value in postgresql.conf rather than with SET, you may not see errors
from lack of permission, instead tablespaces without permission will be
silently omitted.