handle tablespaces for partitioned tables during ALTER DATABASE
Summary:
Normally, you are not allowed to `ALTER DATABASE ... SET TABLESPACE` when the
target tablespace is in use by a table in the database. However, this check
doesn't seem to apply to partitioned tables, and that can lead to unexpected
behavior.
PostgreSQL version:
Compiled by source with git checked out at `refs/tags/REL_12_2`.
```sql
SELECT version();
```
```
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
```
OS: CentOS 7
Repro:
1. Ensure that databases `d` and `e` are not in use.
1. Ensure that tablespaces `fast` and `faster` are not in use.
1. Ensure that directories `/data/fast` and `/data/faster` for tablespaces are
empty and have correct permissions
1. On one shell, run
```sh
watch psql -c "\
SELECT relname, reltablespace\
FROM pg_class\
WHERE relname LIKE 'scores_%';\
" -d d
```
1. On a second shell in `psql`, run the following commands at your leisure,
observing the `watch` in the first shell:
```sql
CREATE DATABASE d;
CREATE DATABASE e;
\c d
CREATE TABLE scores
(rank int, age int, score int)
PARTITION BY RANGE (rank);
CREATE TABLE scores_rank_0_to_100
PARTITION OF scores FOR VALUES FROM (0) TO (100)
PARTITION BY RANGE (age);
CREATE TABLE scores_rank_100_to_200
PARTITION OF scores FOR VALUES FROM (100) TO (200)
PARTITION BY RANGE (age);
-- I want rank 0 to 100 lookups to be fast.
CREATE TABLESPACE fast LOCATION '/data/fast';
ALTER TABLE scores_rank_0_to_100
SET TABLESPACE fast;
-- Actually, I want the fast SSD to be default.
\c e
ALTER DATABASE d
TABLESPACE fast;
\c d
-- Let's load in data.
CREATE TABLE scores_rank_0_to_100_age_20s
PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (20) TO (30);
INSERT INTO scores_rank_0_to_100_age_20s
VALUES (0, 22, 64819);
-- INSERT ...
-- I need a new age category now.
CREATE TABLE scores_rank_0_to_100_age_30s
PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (30) TO (40);
INSERT INTO scores_rank_0_to_100_age_30s
VALUES (51, 30, 9564);
-- ...
-- I want to use a faster disk.
CREATE TABLESPACE faster LOCATION '/data/faster';
\c e
ALTER DATABASE d
TABLESPACE faster;
\c d
-- (I verify that data has moved to /data/faster.)
-- ...
-- I need a new age category now.
CREATE TABLE scores_rank_0_to_100_age_40s
PARTITION OF scores_rank_0_to_100 FOR VALUES FROM (40) TO (50);
INSERT INTO scores_rank_0_to_100_age_40s
VALUES (89, 41, 654);
-- ...
-- How come my data for the new table is going to the old location
-- /data/fast?
```
The `watch` `SELECT` should end up producing output
```
relname | reltablespace
------------------------------+---------------
scores_rank_0_to_100 | 16512
scores_rank_0_to_100_age_20s | 0
scores_rank_0_to_100_age_30s | 0
scores_rank_0_to_100_age_40s | 16512
scores_rank_100_to_200 | 0
(5 rows)
```
after all the commands are run. I believe that the bug begins on the first
`ALTER DATABASE`. If `scores_rank_0_to_100` were a physical table (i.e. had a
`relfilenode`), the `ALTER DATABASE` would have caught that and errored out,
and that would have been good. With my plausible set of steps, I demonstrate
that the `ALTER DATABASE ... SET TABLESPACE` on non-physical tables (e.g.
partitioned tables) should go through the tablespace check as well.
Thanks,
Jason
Hi,
On 2020-03-23 23:26:08 -0700, postgres@jasonk.me wrote:
Normally, you are not allowed to `ALTER DATABASE ... SET TABLESPACE` when the
target tablespace is in use by a table in the database. However, this check
doesn't seem to apply to partitioned tables, and that can lead to unexpected
behavior.
I don't immediately see what problem this could cause? There's no
physical file associated with a partitioned table, it's "just" a
template for new partitions of that partitioned table.
Greetings,
Andres Freund
(Forgot to CC pgsql-bugs.)
Jason
----- Forwarded message from postgres@jasonk.me -----
Date: Tue, 24 Mar 2020 14:24:33 -0700
From: postgres@jasonk.me
To: Andres Freund <andres@anarazel.de>
Subject: Re: handle tablespaces for partitioned tables during ALTER DATABASE
Message-ID: <20200324212433.3knuzvcpc3idjm2b@jasonk.me>
References: <20200324062608.dstxvn7zmnpmplxr@jasonk.me>
<20200324192359.yw7q5rlfqqxoktl6@alap3.anarazel.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200324192359.yw7q5rlfqqxoktl6@alap3.anarazel.de>
Hi, Andres.
On 2020-03-24T12:23:59-0700, Andres Freund wrote:
I don't immediately see what problem this could cause? There's no
physical file associated with a partitioned table, it's "just" a
template for new partitions of that partitioned table.
I had illustrated a plausible scenario where this could be _unexpected_. The
state that you can get into after the first `ALTER DATABASE` is strange because
the tablespace of `scores_rank_0_to_100` is not `InvalidOid` yet equal to the
database's tablespace.
You can't get into this state normally: try
```sql
CREATE DATABASE f TABLESPACE fast;
\c f
CREATE TABLE t (i int, j int) PARTITION BY RANGE (i);
CREATE TABLE t0 PARTITION OF t FOR VALUES FROM (0) TO (100) TABLESPACE fast;
SELECT relname, reltablespace FROM pg_class WHERE relname LIKE 't0';
```
```
relname | reltablespace
---------+---------------
t0 | 0
(1 row)
```
Notice that it's `InvalidOid`, not the explicit database's tablespace. But you
can get into that state with some work, like I illustrated in my example in the
first message.
For less confusion, I think it makes sense to prohibit this or seamlessly
convert the appropriate `reltablespace` values to `InvalidOid`. Enforcing this
may also give me less things to worry about for my own work.
Jason
----- End forwarded message -----
Import Notes
Reply to msg id not found: 20200324212433.3knuzvcpc3idjm2b@jasonk.me