pg_upgrade: delete_old_cluster.sh issues

Started by Marc Mamin about 12 years ago, 3 messages
#1 Marc Mamin
M.Mamin@intershop.de

Hello,

IMHO, there is a serious issue in the script to clean the old data directory
when running pg_upgrade in link mode.

In short: when working with symbolic links, the first step in delete_old_cluster.sh
is to delete the old $PGDATA folder, which may contain tablespaces used by the new instance.

In detail, our use case:

Our postgres data directories are organized as follows:

1) they are all registered in a root location, i.e. /opt/app,
but can be located somewhere else using symbolic links:

ll /opt/app/
...
postgresql-data-1 -> /pgdata/postgresql-data-1

2) we have fixed names for root locations of tablespaces within $PGDATA.
these can be real folders or again symbolic links to some other places:

ll /pgdata/postgresql-data-1
...
tblspc_data
tblspc_idx -> /datarep/pg1/tblspc_idx

(additionally, each schema has its own tablespaces in these locations, but this is not relevant here)

3) we also have some custom content within $PGDATA, e.g. an extra log folder used by our deployment script

After running pg_upgrade, checking the tablespace location within the NEW instance:

ll pg_tblspc

16428 -> /opt/app/postgresql-data-1/tblspc_data/foo
16429 -> /opt/app/postgresql-data-1/tblspc_idx/foo

which, after resolving the symbolic links, is equivalent to:

/pgdata/postgresql-data-1/tblspc_data/foo (x)
/datarep/pg1/tblspc_idx/foo (y)

I called pg_upgrade using the true paths (no symbolic links):

./pg_upgrade \
    --link \
    --check \
    --old-datadir "/pgdata/postgresql-data-1" \
    --new-datadir "/pgdata/postgresql_93-data-1"

Now, checking what the cleanup script would do:

cat delete_old_cluster.sh
#!/bin/sh

(a) rm -rf /pgdata/postgresql-data-1
(b) rm -rf /opt/app/postgresql-data-1/tblspc_data/foo/PG_9.1_201105231
(c) rm -rf /opt/app/postgresql-data-1/tblspc_idx/foo/PG_9.1_201105231

a: will delete the folder (x), which contains data for the NEW Postgres instance!
b: has no effect, as the target was already removed by (a)
c: the old data still exists in /datarep/pg1/tblspc_idx/foo but can't be reached,
as the symbolic link in /pgdata/postgresql-data-1 was already deleted by (a)

Moreover, our custom content in $OLD_PGDATA would be gone too.
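The effect of the deletion order can be reproduced safely in a scratch directory. The following is only a sketch: the function name `demo_delete_order` and the shortened directory names (`old`, `PG_9.1`, `PG_9.3`) are made up for illustration and merely mimic the layout described above.

```shell
#!/bin/sh
# Hypothetical demo: rebuilds a miniature version of the layout in a
# temporary directory and replays steps (a) and (c) of the delete script.
demo_delete_order() {
    root=$(mktemp -d) || return 1

    # (x): tablespace stored physically inside the old data directory
    # (y): tablespace stored elsewhere, reached through a link in the old one
    mkdir -p "$root/pgdata/old/tblspc_data/foo/PG_9.3" \
             "$root/datarep/tblspc_idx/foo/PG_9.1" \
             "$root/opt/app"
    ln -s "$root/pgdata/old" "$root/opt/app/old"
    ln -s "$root/datarep/tblspc_idx" "$root/pgdata/old/tblspc_idx"

    rm -rf "$root/pgdata/old"                          # step (a)
    rm -rf "$root/opt/app/old/tblspc_idx/foo/PG_9.1"   # step (c): path no longer resolves

    [ -e "$root/pgdata/old/tblspc_data/foo/PG_9.3" ] ||
        echo "new-cluster data: gone"
    [ -e "$root/datarep/tblspc_idx/foo/PG_9.1" ] &&
        echo "old tablespace data: still present"

    rm -rf "$root"
}
```

Calling `demo_delete_order` shows both problems at once: the data used by the new cluster is removed, while the old tablespace data outside the data directory is left behind.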

It seems that these issues could all be avoided
by first removing the expected content of $OLD_PGDATA
and then unlinking $OLD_PGDATA itself only when it is empty
(or by adding a note to the output of pg_upgrade):

replace

rm -rf /pgdata/postgresql-data-1

with

cd /pgdata/postgresql-data-1
rm -rf base
rm -rf global
rm -rf pg_clog
rm -rf pg_hba.conf (*)
rm -rf pg_ident.conf (*)
rm -rf pg_log
rm -rf pg_multixact
rm -rf pg_notify
rm -rf pg_serial
rm -rf pg_stat_tmp
rm -rf pg_subtrans
rm -rf pg_tblspc
rm -rf pg_twophase
rm -rf PG_VERSION (*)
rm -rf pg_xlog
rm -rf postgresql.conf (*)
rm -rf postmaster.log
rm -rf postmaster.opts (*)

(*): could be nice to keep as a reference.
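
The same idea can be written as a small shell function. This is a minimal sketch, not what pg_upgrade generates: the function name `remove_old_cluster_contents` is hypothetical, and the entry list is simply the one given above, assumed to be complete for the old version.

```shell
#!/bin/sh
# Hypothetical helper, not part of pg_upgrade: remove only the entries
# PostgreSQL itself created in the old data directory, then remove the
# directory itself only if nothing else (tablespaces, custom folders) is left.
remove_old_cluster_contents() {
    old_pgdata=$1
    [ -d "$old_pgdata" ] || { echo "not a directory: $old_pgdata" >&2; return 1; }

    for entry in base global pg_clog pg_hba.conf pg_ident.conf pg_log \
                 pg_multixact pg_notify pg_serial pg_stat_tmp pg_subtrans \
                 pg_tblspc pg_twophase PG_VERSION pg_xlog postgresql.conf \
                 postmaster.log postmaster.opts
    do
        # rm -rf does not follow symbolic links: links are removed,
        # their targets are left untouched
        rm -rf "${old_pgdata:?}/$entry"
    done

    # rmdir refuses to remove a non-empty directory, so anything that was
    # not in the list above survives
    rmdir "$old_pgdata" 2>/dev/null ||
        echo "kept $old_pgdata: it still contains other files" >&2
}
```

For example, `remove_old_cluster_contents /pgdata/postgresql-data-1` would leave tblspc_data, tblspc_idx and the custom log folder in place. To keep the files marked (*) as a reference, they would have to be dropped from the list.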

best regards,

Marc Mamin

#2 Bruce Momjian
bruce@momjian.us
In reply to: Marc Mamin (#1)
Re: pg_upgrade: delete_old_cluster.sh issues

On Tue, Nov 12, 2013 at 10:35:58AM +0000, Marc Mamin wrote:

> Hello,
>
> IMHO, there is a serious issue in the script to clean the old data directory
> when running pg_upgrade in link mode.
>
> in short: When working with symbolic links, the first step in
> delete_old_cluster.sh
> is to delete the old $PGDATA folder that may contain tablespaces used by the
> new instance.
>
> in long, our use case:

Rather than removing files/directories individually, which would be
difficult to maintain, we decided in pg_upgrade 9.3 to detect
tablespaces in the old data directory and report that and not create a
delete script. Here is the commit:

http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1

The problem with your setup is that while you didn't pass symbolic links
to pg_upgrade, you did use symbolic links when defining the tablespaces,
so pg_upgrade couldn't recognize that the symbolic links were inside the
old data directory.

We could use readlink() to go walk over all symbolic links, but that
seems quite complex. We could use stat() and make sure there are no
matching inodes in the old data directory, or that they are in a
different file system. We could look for a directory named after the PG
catversion in the old data directory. We could update the docs.
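
The link-resolution idea can be approximated from the shell, outside of pg_upgrade. The following is only a sketch under stated assumptions: the helper names `resolve_dir` and `tablespaces_inside` are hypothetical, and it relies on `cd -P` and `pwd -P` to resolve symbolic links rather than calling readlink() directly.

```shell
#!/bin/sh
# Hypothetical helpers, not part of pg_upgrade.

# Print the physical path of a directory with every symbolic link resolved.
resolve_dir() {
    (cd -P "$1" 2>/dev/null && pwd -P)
}

# Usage: tablespaces_inside OLD_DATADIR CLUSTER_DATADIR
# Prints each link in CLUSTER_DATADIR/pg_tblspc whose resolved target lies
# inside the resolved OLD_DATADIR.
# Returns 0 if at least one such tablespace was found, 1 if none was found.
tablespaces_inside() {
    old_real=$(resolve_dir "$1") || return 2
    found=1
    for link in "$2"/pg_tblspc/*; do
        [ -e "$link" ] || continue          # dangling link or unmatched glob
        target=$(resolve_dir "$link") || continue
        case $target/ in
            "$old_real"/*)
                echo "$link -> $target"
                found=0
                ;;
        esac
    done
    return $found
}
```

With the layout from the report, `tablespaces_inside /pgdata/postgresql-data-1 /pgdata/postgresql_93-data-1` would be expected to list 16428 (physically inside the old data directory) but not 16429 (which resolves to /datarep/pg1).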

I am not sure what to do. We never expected people would put
tablespaces in the data directory.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#2)
1 attachment(s)
Re: pg_upgrade: delete_old_cluster.sh issues

On Mon, Nov 18, 2013 at 10:13:19PM -0500, Bruce Momjian wrote:

> On Tue, Nov 12, 2013 at 10:35:58AM +0000, Marc Mamin wrote:
>
>> Hello,
>>
>> IMHO, there is a serious issue in the script to clean the old data directory
>> when running pg_upgrade in link mode.
>>
>> in short: When working with symbolic links, the first step in
>> delete_old_cluster.sh
>> is to delete the old $PGDATA folder that may contain tablespaces used by the
>> new instance.
>>
>> in long, our use case:
>
> Rather than removing files/directories individually, which would be
> difficult to maintain, we decided in pg_upgrade 9.3 to detect
> tablespaces in the old data directory and report that and not create a
> delete script. Here is the commit:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1
>
> The problem with your setup is that while you didn't pass symbolic links
> to pg_upgrade, you did use symbolic links when defining the tablespaces,
> so pg_upgrade couldn't recognize that the symbolic links were inside the
> old data directory.
>
> We could use readlink() to go walk over all symbolic links, but that
> seems quite complex. We could use stat() and make sure there are no
> matching inodes in the old data directory, or that they are in a
> different file system. We could look for a directory named after the PG
> catversion in the old data directory. We could update the docs.
>
> I am not sure what to do. We never expected people would put
> tablespaces in the data directory.

I went with a doc patch, attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_upgrade.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index 4d03b12..72e3cb6
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*************** psql --username postgres --file script.s
*** 460,466 ****
       cluster's data directories by running the script mentioned when
       <command>pg_upgrade</command> completes. You can also delete the
       old installation directories
!      (e.g. <filename>bin</>, <filename>share</>).
      </para>
     </step>
  
--- 460,467 ----
       cluster's data directories by running the script mentioned when
       <command>pg_upgrade</command> completes. You can also delete the
       old installation directories
!      (e.g. <filename>bin</>, <filename>share</>).  This will not work
!      if you have tablespaces inside the old data directory.
      </para>
     </step>
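
To check whether the caveat added by this patch applies to a given cluster before running the delete script, one option is to look for tablespace version directories stored physically under the old data directory. This is a minimal sketch and not part of pg_upgrade; the function name `find_tablespaces_in_datadir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check, not part of pg_upgrade.
# Each tablespace location contains a subdirectory named
# PG_<major version>_<catalog version>, e.g. PG_9.1_201105231.
# -H follows a symbolic link given as the starting point only; links met
# below it are not followed, so only directories physically stored under
# the old data directory are reported.
find_tablespaces_in_datadir() {
    find -H "$1" -type d -name 'PG_[0-9]*_[0-9]*' -prune -print
}
```

If `find_tablespaces_in_datadir /pgdata/postgresql-data-1` prints anything, the old data directory holds tablespace data and should not be removed wholesale.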