tablespaces for temporary files

Started by Neil Conwayabout 21 years ago8 messages
#1Neil Conway
neilc@samurai.com

I'd like to provide a way for DBAs to specify that the temporary files
needed to for sorting, holdable cursors and similar operations should be
created in a particular tablespace. (Right now these files are created
in the tablespace associated with the current database.)

Two ways to do this come to mind: via a GUC variable, or by setting a
property of CREATE DATABASE (that could be altered via ALTER DATABASE).
I think using a GUC variable is probably the better bet: it is more
flexible, since ALTER DATABASE ... SET and ALTER USER ... SET can be
used to define the GUC variable automatically for particular users and
databases.

So I'd like to add a GUC variable called something like
"scratch_tablespace". If undefined (the default), temporary files for
sorting/etc. will be created in the current database's tablespace. If
set to the name of an existent tablespace, that tablespace will be used
for temporary storage. If set to a nonexistent tablespace, a warning
will be printed and we'll fallback to using the current database's
tablespace.

Regarding naming, I considered calling the GUC variable
"temporary_tablespace" or something similar, but it seems to me that
this might cause confusion with temporary tables -- furthermore, it
might be an interesting feature to define a "temporary table tablespace"
in the future, leading to yet more confusion. I think "scratch
tablespace" is a pretty decent name for this concept, but I'm open to
suggestions.

Comments?

-Neil

#2Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Neil Conway (#1)
Re: tablespaces for temporary files

So I'd like to add a GUC variable called something like
"scratch_tablespace". If undefined (the default), temporary files for

Should be called 'work_tablesapce' to match 'work_mem' :)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#1)
Re: tablespaces for temporary files

Neil Conway <neilc@samurai.com> writes:

So I'd like to add a GUC variable called something like
"scratch_tablespace". If undefined (the default), temporary files for
sorting/etc. will be created in the current database's tablespace.

(1) What are the protection requirements for this variable?

(2) I don't think that "undefined" is a particularly good concept for
GUC variables. Particularly not ones that you are envisioning setting
from multiple places.

(3) I don't like the idea that a catalog lookup will be necessary before
we can create or access temp files. It would be quite unacceptable from
a modularity standpoint to have the low-level routines that currently
determine temp file paths do catalog accesses.

On the whole I'm unconvinced that this is worth the trouble. One of the
reasons for allowing people to move databases around is to determine
where their temp files go. Also, it's always been possible for people
to change the pgsql_tmp subdirectory into a symlink. While I know that
that isn't particularly DBA-friendly, it seems sufficient to me for what
I suspect is a third-order requirement.

Let's at least wait till we get some demand from the field before we
start inventing frammishes for tablespaces.

regards, tom lane

#4Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#3)
Re: tablespaces for temporary files

On Sat, 2004-10-30 at 00:50, Tom Lane wrote:

(1) What are the protection requirements for this variable?

I think it can be USERSET -- most commands let the user specify a
tablespace explicitly, and this is basically just another way of doing
that. The user executing the query will need CREATE privileges on the
tablespace they end up writing to.

(2) I don't think that "undefined" is a particularly good concept for
GUC variables. Particularly not ones that you are envisioning setting
from multiple places.

Hmm, ok. How about a token like "$database" that expands to the
tablespace of the current database?

(3) I don't like the idea that a catalog lookup will be necessary before
we can create or access temp files. It would be quite unacceptable from
a modularity standpoint to have the low-level routines that currently
determine temp file paths do catalog accesses.

I don't agree it is "unacceptable", but it isn't ideal, granted.

On the whole I'm unconvinced that this is worth the trouble. One of the
reasons for allowing people to move databases around is to determine
where their temp files go.

I think this needlessly limits the flexibility of the system. Once
you've created a database and added a bunch of tables to it (in the DB's
tablespace), is there an easy way to change the tablespace used for
temporary files? What if the DBA has placed the database in a relatively
slow tablespace because that is suitable most of the time, but needs to
quickly execute a large OLAP query that consumes a lot of temporary
space? What if it makes sense at a particular installation for different
users to use different tablespaces for their temporary files?

I just think that always using the database's tablespace for temporary
files needlessly conflates two distinct concepts.

Also, it's always been possible for people
to change the pgsql_tmp subdirectory into a symlink.

This is a pain for the DBA, as you mention; it requires shutting down
the database; and it is fragile to begin with because the "pgsql_tmp"
directory is created on demand.

-Neil

#5Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#3)
Re: tablespaces for temporary files

Tom Lane <tgl@sss.pgh.pa.us> writes:

On the whole I'm unconvinced that this is worth the trouble. One of the
reasons for allowing people to move databases around is to determine
where their temp files go.

The one scenario I would expect to see is having the temp files on filesystem
all to themselves separate from the database. So using the database's location
seems like it would never really satisfy that need.

Actually the sort algorithm postgres uses would be much more efficient if it
could get access to two or three locations guaranteed to be on different
spindles. Last I read the comments it talked about a three tape polyphase sort
emulated on a single tape. It's a _lot_ less efficient emulated on a single
tape than it would be on three separate tapes. And for large sorts drive
really do behave like tapes.

Personally I am inclined to think that sorting and hash table spills really
belong in a location specified completely separate from tablespaces.

Others may be thinking of this more in terms of enforcing resource quotas in
which case the current regime makes more sense. But from a performance point
of view the current system is pointless.

--
greg

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Greg Stark (#5)
Re: tablespaces for temporary files

Greg Stark wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

On the whole I'm unconvinced that this is worth the trouble. One of the
reasons for allowing people to move databases around is to determine
where their temp files go.

The one scenario I would expect to see is having the temp files on filesystem
all to themselves separate from the database. So using the database's location
seems like it would never really satisfy that need.

Actually the sort algorithm postgres uses would be much more efficient if it
could get access to two or three locations guaranteed to be on different
spindles. Last I read the comments it talked about a three tape polyphase sort
emulated on a single tape. It's a _lot_ less efficient emulated on a single
tape than it would be on three separate tapes. And for large sorts drive
really do behave like tapes.

Personally I am inclined to think that sorting and hash table spills really
belong in a location specified completely separate from tablespaces.

Others may be thinking of this more in terms of enforcing resource quotas in
which case the current regime makes more sense. But from a performance point
of view the current system is pointless.

Agreed, and I was going to mention the idea of a round-robin allocation
setup where the system cycles through a list of possible locations for
both sort files and temporary tables. One trick is that the cycle
pointer has to be global controlled so once one session uses an area the
next session uses the next location.

Updated TODO item:

* Add a GUC variable to control the tablespace for temporary objects and
sort files

This perhaps should use a round-robin allocation system where several
tablespaces are used in a cycle. The cycle pointer should be global.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: tablespaces for temporary files

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Greg Stark wrote:

Actually the sort algorithm postgres uses would be much more efficient if it
could get access to two or three locations guaranteed to be on different
spindles.

Agreed, and I was going to mention the idea of a round-robin allocation
setup where the system cycles through a list of possible locations for
both sort files and temporary tables.

Greg's point was that sort would want to *know* that it created three
temp files on three different devices. Throwing random effects of other
sessions into the mix wouldn't make it better.

regards, tom lane

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#7)
Re: tablespaces for temporary files

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Greg Stark wrote:

Actually the sort algorithm postgres uses would be much more efficient if it
could get access to two or three locations guaranteed to be on different
spindles.

Agreed, and I was going to mention the idea of a round-robin allocation
setup where the system cycles through a list of possible locations for
both sort files and temporary tables.

Greg's point was that sort would want to *know* that it created three
temp files on three different devices. Throwing random effects of other
sessions into the mix wouldn't make it better.

OK, let's say the sort starts on a random tablespace and then goes
sequentially through the list:

* Add a GUC variable to control the tablespace for temporary objects and
sort files

It could start with a random tablespace from a supplied list and cycle
through the list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073