tablespaces and non-empty directories

Started by Philip Yarraabout 20 years ago11 messages
#1Philip Yarra
philip@utiba.com

I assume CREATE TABLESPACE refuses to use a non-empty directory because of the
risk of trashing existing files. Makes sense, but consider the following:

# mkfs -t ext2 /dev/sdc1
# mount -t ext2 /dev/sdc1 /mnt/pg_tables
# chown postgres /mnt/pg_tables
# su -c psql pyarra
pyarra=# CREATE TABLESPACE spc_tables LOCATION '/mnt/pg_tables/';
ERROR: directory "/mnt/pg_tables" is not empty

This is because lost+found exists. Since lost+found would be a reasonably
common directory to find at a mount-point on Unix-like OSs*, would it make
sense for CREATE TABLESPACE to ignore it if present?

Of course this isn't hard to get around:
# mkdir /mnt/pg_tables/data
# chown postgres /mnt/pg_tables/data
CREATE TABLESPACE spc_tables LOCATION '/mnt/pg_tables/data/';

If consensus is that it is a bad idea to treat lost+found as a special case,
would it be worth putting an explicit mention in the doco about the preferred
way to set up a database with multiple disks?

Related question: are there plans afoot to allow specifying an alternate
location for pg_xlog (or pg_delete-me-not) to save doing the shutdown-DB, mv
directory to other disk, symlink, start-DB dance?

Regards, Philip.

* Solaris 9 and Linux both use lost+found, Tru64 v4.0f does not seem to (and
has extra guff for quota management too). I doubt we could cater to every
possible Unix OS and the administrative files it creates at mount points,
however since lost+found is so common, if it's there, we could ignore it.

--

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

-----------------
Utiba Pty Ltd
This message has been scanned for viruses and
dangerous content by Utiba mail server and is
believed to be clean.

#2Gavin Sherry
swm@linuxworld.com.au
In reply to: Philip Yarra (#1)
Re: tablespaces and non-empty directories

On Thu, 17 Nov 2005, Philip Yarra wrote:

I assume CREATE TABLESPACE refuses to use a non-empty directory because of the
risk of trashing existing files. Makes sense, but consider the following:

Right, that was the reasoning.

# mkfs -t ext2 /dev/sdc1
# mount -t ext2 /dev/sdc1 /mnt/pg_tables
# chown postgres /mnt/pg_tables
# su -c psql pyarra
pyarra=# CREATE TABLESPACE spc_tables LOCATION '/mnt/pg_tables/';
ERROR: directory "/mnt/pg_tables" is not empty

This is because lost+found exists. Since lost+found would be a reasonably
common directory to find at a mount-point on Unix-like OSs*, would it make
sense for CREATE TABLESPACE to ignore it if present?

This came up when tablespaces were being developed.

Of course this isn't hard to get around:
# mkdir /mnt/pg_tables/data
# chown postgres /mnt/pg_tables/data
CREATE TABLESPACE spc_tables LOCATION '/mnt/pg_tables/data/';

Right. We decided that this was easy for admins to do and also makes
things a little clearer: if /mnt/pg_tables was the data directory, you'd
have something like:

lost+found 1234132 12223132 [etc]

It might not be immediately obvious what the numeric named directories are
for.

If consensus is that it is a bad idea to treat lost+found as a special case,
would it be worth putting an explicit mention in the doco about the preferred
way to set up a database with multiple disks?

Sounds like a good idea.

Related question: are there plans afoot to allow specifying an alternate
location for pg_xlog (or pg_delete-me-not) to save doing the shutdown-DB, mv
directory to other disk, symlink, start-DB dance?

People have discussed it but I don't know of anyone working on it.

Gavin

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Yarra (#1)
Re: tablespaces and non-empty directories

Philip Yarra <philip@utiba.com> writes:

This is because lost+found exists. Since lost+found would be a reasonably
common directory to find at a mount-point on Unix-like OSs*, would it make
sense for CREATE TABLESPACE to ignore it if present?

No. There is no reason to use a volume's root directory as a
tablespace; especially so since the root directory ought to be owned
by root and so you'd have a permissions problem anyhow. Make a
subdirectory.

regards, tom lane

#4Zeugswetter Andreas DCP SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#3)
Re: tablespaces and non-empty directories

This is because lost+found exists. Since lost+found would be a
reasonably common directory to find at a mount-point on Unix-like
OSs*, would it make sense for CREATE TABLESPACE to ignore it if

present?

No. There is no reason to use a volume's root directory as a

tablespace;

especially so since the root directory ought to be owned by root

That is not so on AIX. Only the moint point (the dir in the parent) is
root.
Once mounted it can have (and preserves) any permission you want.
But on AIX the workaround is to remove the directory after mounting and
before
creating the tablespace.

Andreas

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas DCP SD (#4)
Re: tablespaces and non-empty directories

"Zeugswetter Andreas DCP SD" <ZeugswetterA@spardat.at> writes:

No. There is no reason to use a volume's root directory as a
tablespace;
especially so since the root directory ought to be owned by root

That is not so on AIX. Only the moint point (the dir in the parent) is
root.
Once mounted it can have (and preserves) any permission you want.

Yeah, you *can* make it not-root-owned on most Unixen. That doesn't
mean it's a good idea to do so. For instance, if the root directory
is owned by Joe Luser, what's to stop him from blowing away lost+found
and thereby screwing up future fscks? You should basically never have
more-privileged objects (such as lost+found) inside directories owned by
less-privileged users --- it's just asking for trouble.

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#2)
Re: tablespaces and non-empty directories

Gavin Sherry wrote:

Related question: are there plans afoot to allow specifying an alternate
location for pg_xlog (or pg_delete-me-not) to save doing the shutdown-DB, mv
directory to other disk, symlink, start-DB dance?

People have discussed it but I don't know of anyone working on it.

TODO has:

* Allow the pg_xlog directory location to be specified during initdb
with a symlink back to the /data location

I think the only reason it is not done yet is because it is so easy to
do for admins, and it is impossible to do while the server is running.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Jim C. Nasby
jnasby@pervasive.com
In reply to: Bruce Momjian (#6)
Re: tablespaces and non-empty directories

On Tue, Nov 22, 2005 at 01:38:34PM -0500, Bruce Momjian wrote:

Gavin Sherry wrote:

Related question: are there plans afoot to allow specifying an alternate
location for pg_xlog (or pg_delete-me-not) to save doing the shutdown-DB, mv
directory to other disk, symlink, start-DB dance?

People have discussed it but I don't know of anyone working on it.

TODO has:

* Allow the pg_xlog directory location to be specified during initdb
with a symlink back to the /data location

I think the only reason it is not done yet is because it is so easy to
do for admins, and it is impossible to do while the server is running.

Along those lines, is there anything else that would benefit from being
moved? pg_clog and pg_subtrans come to mind; but maybe pg_multixact and
pg_twophase are candidates as well?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#8Gavin Sherry
swm@linuxworld.com.au
In reply to: Jim C. Nasby (#7)
Re: tablespaces and non-empty directories

On Tue, 22 Nov 2005, Jim C. Nasby wrote:

On Tue, Nov 22, 2005 at 01:38:34PM -0500, Bruce Momjian wrote:

Gavin Sherry wrote:

Related question: are there plans afoot to allow specifying an alternate
location for pg_xlog (or pg_delete-me-not) to save doing the shutdown-DB, mv
directory to other disk, symlink, start-DB dance?

People have discussed it but I don't know of anyone working on it.

TODO has:

* Allow the pg_xlog directory location to be specified during initdb
with a symlink back to the /data location

I think the only reason it is not done yet is because it is so easy to
do for admins, and it is impossible to do while the server is running.

Along those lines, is there anything else that would benefit from being
moved? pg_clog and pg_subtrans come to mind; but maybe pg_multixact and
pg_twophase are candidates as well?

pgsql_tmp

Gavin

#9Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jim C. Nasby (#7)
Re: tablespaces and non-empty directories

Jim C. Nasby wrote:

On Tue, Nov 22, 2005 at 01:38:34PM -0500, Bruce Momjian wrote:

* Allow the pg_xlog directory location to be specified during initdb
with a symlink back to the /data location

I think the only reason it is not done yet is because it is so easy to
do for admins, and it is impossible to do while the server is running.

Along those lines, is there anything else that would benefit from being
moved? pg_clog and pg_subtrans come to mind; but maybe pg_multixact and
pg_twophase are candidates as well?

Hmm, I doubt moving any of the SLRU files (clog, subtrans, multixact)
will have much of an impact. If there's too much I/O on those, a better
solution would be to increase the number of buffers allocated to them.
Currently we use 8 for all of them which is probably not appropiate for
everyone.

Not sure about pg_twophase, but I doubt it's used on a too much
performance critical path (after all, there an awful lot of other work
to do to "prepare" a transaction.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Philip Yarra
philip@utiba.com
In reply to: Gavin Sherry (#8)
Re: tablespaces and non-empty directories

On Wed, 23 Nov 2005 11:23 am, Gavin Sherry wrote:

Along those lines, is there anything else that would benefit from being
moved? pg_clog and pg_subtrans come to mind; but maybe pg_multixact and
pg_twophase are candidates as well?

pgsql_tmp

Does anyone have any recommendations about which of these would contend with
each other for disk IO? I'm looking to put together a doco addition about
multi-disk setup, so far I have something like:

/mnt/pg_base
/mnt/pg_xlog
/mnt/pg_tab1
/mnt/pg_idx1

...but is there significant gain in moving other bits from pg_base to a
different spindle? If so, what can be safely combined, and what would
definitely cause contention?

I know that the answer would vary for different types of DB activity, but any
"rough guides" would be a handy place to start.

Regards, Philip.
--

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

-----------------
Utiba Pty Ltd
This message has been scanned for viruses and
dangerous content by Utiba mail server and is
believed to be clean.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: tablespaces and non-empty directories

Alvaro Herrera <alvherre@commandprompt.com> writes:

Jim C. Nasby wrote:

Along those lines, is there anything else that would benefit from being
moved? pg_clog and pg_subtrans come to mind; but maybe pg_multixact and
pg_twophase are candidates as well?

Hmm, I doubt moving any of the SLRU files (clog, subtrans, multixact)
will have much of an impact.

Certainly pushing them onto the WAL spindle would be a serious misstep.
There is a good case for giving WAL its own dedicated disk --- there is
no case that I've seen for giving any of these their own disk.

If there's too much I/O on those, a better
solution would be to increase the number of buffers allocated to them.
Currently we use 8 for all of them which is probably not appropiate for
everyone.

I've just been looking at a test case provided by Rob Creager that
causes some pretty severe contention on SubtransControlLock. There
are a number of possible answers to this, but increasing the number of
pg_subtrans buffers is definitely one of them. I think it's probably
time we got rid of the assumption that all the uses of slru.c should
have the same number of buffers ...

regards, tom lane