partitioned indexes and tablespaces
Hello
A customer reported to us that partitioned indexes are not working
consistently with tablespaces:
1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace. So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace. Fix by saving the tablespace in the pg_class row
for the parent index.
2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind. In
order for this to actually work, we need bespoke code for ATExecCmd();
the code for all other relation kinds wants to move storage (and runs in
Phase 3, later), but these indexes do not have that. Therefore, write a
cut-down version which is invoked directly in ATExecCmd instead.
3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
the above change.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Fix-tablespace-handling-for-partitioned-indexes.patchtext/x-diff; charset=us-asciiDownload+95-6
On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
A customer reported to us that partitioned indexes are not working
consistently with tablespaces:
Let's see...
1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace. So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace. Fix by saving the tablespace in the pg_class row
for the parent index.
I may be missing something of course... But partitioned tables don't
register the tablespace they are on either so as it cannot be used by
any partitions created on it:
=# create tablespace popo location '/home/ioltas/data/tbspace';
CREATE TABLESPACE
=# create table aa (a int) partition by list (a) tablespace popo;
CREATE TABLE
=# create table aa_1 partition of aa for values in (1) tablespace popo;
CREATE TABLE
=# create table aa_2 partition of aa for values in (2);
CREATE TABLE
=# select t.spcname, c.relname from pg_class c, pg_tablespace t
where c.oid > 16000 and c.reltablespace = t.oid;
spcname | relname
---------+---------
popo | aa_1
(1 row)
It seems to me that the current behavior is wanted in this case, because
partitioned tables and partitioned indexes have no physical storage.
2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind. In
order for this to actually work, we need bespoke code for ATExecCmd();
the code for all other relation kinds wants to move storage (and runs in
Phase 3, later), but these indexes do not have that. Therefore, write a
cut-down version which is invoked directly in ATExecCmd instead.
Using the previous example, this does not raise an error:
alter table aa set tablespace popo;
However the reference to reltablespace in pg_class is not changed. So I
would agree with your point to not raise an error and back-patch that,
but I don't agree with the point of changing reltablespace for a
partitioned index if that's what you mean.
3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
the above change.
Reproducible with just the following stuff on top of the previous
example:
create index aai on aa(a);
alter index all in tablespace pg_default set tablespace popo;
In this case also raising an error is a bug, it seems to me that
partitioned indexes should just be ignored.
Could you add an entry in the next CF to not lose track of what is
discussed here?
--
Michael
Hi,
On 2018/11/02 10:27, Michael Paquier wrote:
On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
A customer reported to us that partitioned indexes are not working
consistently with tablespaces:Let's see...
1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace. So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace. Fix by saving the tablespace in the pg_class row
for the parent index.I may be missing something of course... But partitioned tables don't
register the tablespace they are on either so as it cannot be used by
any partitions created on it:
=# create tablespace popo location '/home/ioltas/data/tbspace';
CREATE TABLESPACE
=# create table aa (a int) partition by list (a) tablespace popo;
CREATE TABLE
=# create table aa_1 partition of aa for values in (1) tablespace popo;
CREATE TABLE
=# create table aa_2 partition of aa for values in (2);
CREATE TABLE
=# select t.spcname, c.relname from pg_class c, pg_tablespace t
where c.oid > 16000 and c.reltablespace = t.oid;
spcname | relname
---------+---------
popo | aa_1
(1 row)It seems to me that the current behavior is wanted in this case, because
partitioned tables and partitioned indexes have no physical storage.
Keith Fiske complained about this behavior for partitioned *tables* a few
months ago, which led to the following discussion:
/messages/by-id/CAKJS1f9PXYcT+j=oyL-Lquz=ScNwpRtmD7u9svLASUygBdbN8w@mail.gmail.com
It's Michael's message that was the last one on that thread. :)
/messages/by-id/20180413224007.GB27295@paquier.xyz
By the way, if we decide to do something about this, I think we do the
same for partitioned tables. There are more than one interesting
behaviors possible that are mentioned in the above thread for when
parent's reltablespace is set/changed. For example, when it's set, the
existing partitions are not moved, but the new value only applies to
subsequently created partitions. Considering various such behaviors, this
would seem like new feature work, which I'd suppose would only be
considered for 12.
2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind. In
order for this to actually work, we need bespoke code for ATExecCmd();
the code for all other relation kinds wants to move storage (and runs in
Phase 3, later), but these indexes do not have that. Therefore, write a
cut-down version which is invoked directly in ATExecCmd instead.Using the previous example, this does not raise an error:
alter table aa set tablespace popo;
However the reference to reltablespace in pg_class is not changed. So I
would agree with your point to not raise an error and back-patch that,
but I don't agree with the point of changing reltablespace for a
partitioned index if that's what you mean.3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
the above change.Reproducible with just the following stuff on top of the previous
example:
create index aai on aa(a);
alter index all in tablespace pg_default set tablespace popo;In this case also raising an error is a bug, it seems to me that
partitioned indexes should just be ignored.
Same argument here too.
IOW, I agree with Michael that if something will be back-patched to 11, it
should be a small patch to make the unsupported relkind error go away.
Thanks,
Amit
On 2018-Nov-02, Michael Paquier wrote:
On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace. So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace. Fix by saving the tablespace in the pg_class row
for the parent index.I may be missing something of course... But partitioned tables don't
register the tablespace they are on either so as it cannot be used by
any partitions created on it:
This is not relevant to my case, IMO. Partitioned tables are explicitly
created each time, with their own parameters; if you want to specify the
tablespace in which it is created, you can do so at that point. This is
not the case with partitioned indexes, because they are created
automatically at CREATE TABLE PARTITION OF time, without an option to
specify where each index goes.
It seems to me that the current behavior is wanted in this case, because
partitioned tables and partitioned indexes have no physical storage.
Well, I designed the partitioned indexes feature and I can say for
certain that this behavior was not explicitly designed in, but was just
a oversight.
2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind.Using the previous example, this does not raise an error:
alter table aa set tablespace popo;
However the reference to reltablespace in pg_class is not changed. So I
would agree with your point to not raise an error and back-patch that,
but I don't agree with the point of changing reltablespace for a
partitioned index if that's what you mean.
Same argument here. The pg_class record for the partitioned index
serves to guide the storage of indexes on future partitions, so it is
valuable to have it. Not recording the tablespace (and not allowing it
to be changed afterwards) is a usability fail.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Nov-02, Amit Langote wrote:
On 2018/11/02 10:27, Michael Paquier wrote:
It seems to me that the current behavior is wanted in this case, because
partitioned tables and partitioned indexes have no physical storage.Keith Fiske complained about this behavior for partitioned *tables* a few
months ago, which led to the following discussion:/messages/by-id/CAKJS1f9PXYcT+j=oyL-Lquz=ScNwpRtmD7u9svLASUygBdbN8w@mail.gmail.com
It's Michael's message that was the last one on that thread. :)
I agree with Fiske, FWIW. I think the current behavior results because
people (including me) overlooked things, not because it was designed
explicitly that way.
By the way, if we decide to do something about this, I think we do the
same for partitioned tables.
I'm up for changing the behavior of partitioned tables in pg12 (please
send a patch), but I'm up for changing the behavior of partitioned
tables in pg11.
There are more than one interesting
behaviors possible that are mentioned in the above thread for when
parent's reltablespace is set/changed.
I'm *NOT* proposing to move existing partitions to another tablespace,
in any case.
IOW, I agree with Michael that if something will be back-patched to 11, it
should be a small patch to make the unsupported relkind error go away.
I don't.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 2, 2018 at 11:02 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
By the way, if we decide to do something about this, I think we do the
same for partitioned tables.I'm up for changing the behavior of partitioned tables in pg12 (please
send a patch), but I'm up for changing the behavior of partitioned
tables in pg11.
Uh, what?
I strongly object to inserting behavior changes into released branches
on the grounds that the behavior wasn't considered carefully enough
before feature freeze. If we accept that as a justification, then
anybody can claim that any behavior change should be back-patched at
any time as long as they were the author of the original patch. But
that's not a recipe for a stable product. There's got to be a point
at which we say, yeah, you know, this is maybe not the perfect set of
design decisions, but we're going to support the decisions we made for
the next 5 years. And maybe we'll make better ones in the next
release.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-Nov-02, Robert Haas wrote:
I strongly object to inserting behavior changes into released branches
on the grounds that the behavior wasn't considered carefully enough
before feature freeze.
I'm not proposing to change any stable behavior. The thing I'm
proposing to change clearly cannot be used by anyone, because it just
throws errors.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Nov-02, Robert Haas wrote:
On Fri, Nov 2, 2018 at 11:02 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
By the way, if we decide to do something about this, I think we do the
same for partitioned tables.I'm up for changing the behavior of partitioned tables in pg12 (please
send a patch), but I'm up for changing the behavior of partitioned
tables in pg11.Uh, what?
Sorry, I just realized the typo in the above. The behavior I propose to
change in pg11 is that of partitioned *indexes*, not tables.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 2, 2018 at 12:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2018-Nov-02, Robert Haas wrote:
I strongly object to inserting behavior changes into released branches
on the grounds that the behavior wasn't considered carefully enough
before feature freeze.I'm not proposing to change any stable behavior. The thing I'm
proposing to change clearly cannot be used by anyone, because it just
throws errors.
I don't get it. If the default tablespace for partition is set as for
a regular table, that is a usable behavior. If it is taken from the
parent table, that is a different and also usable behavior. Isn't
this part of what you are proposing to change?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-Nov-02, Robert Haas wrote:
On Fri, Nov 2, 2018 at 12:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2018-Nov-02, Robert Haas wrote:
I strongly object to inserting behavior changes into released branches
on the grounds that the behavior wasn't considered carefully enough
before feature freeze.I'm not proposing to change any stable behavior. The thing I'm
proposing to change clearly cannot be used by anyone, because it just
throws errors.I don't get it. If the default tablespace for partition is set as for
a regular table, that is a usable behavior. If it is taken from the
parent table, that is a different and also usable behavior. Isn't
this part of what you are proposing to change?
In this thread I'm not proposing to change the behavior for tables, only
for indexes. If people want to change behavior for tables (and I agree
with doing so), they can start their own threads.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 02, 2018 at 03:53:51PM -0300, Alvaro Herrera wrote:
In this thread I'm not proposing to change the behavior for tables, only
for indexes. If people want to change behavior for tables (and I agree
with doing so), they can start their own threads.
Changing this behavior on back-branches is not a good idea I think
because that changes the representation of the pg_class entries in the
catalogs by adding the reltablespace reference for a partitioned index
which does not exist now. We are long past the time of doing such
changes on v11, but that can perfectly be done for v12.
Making the tablespace inherited from the parent if the parent has no
information on disk is sensible in my opinion, so please let's consider
that as a feature but not a bug, and also let's do the change for both
partitioned tables *and* partitioned indexes for consistency. The
current behavior does not prevent the features to work.
So what I would suggest is to fix the commands which are failing by not
ignoring partitioned indexes for v11, then raise a new thread which
implements the new tablespace behavior for all partitioned objects. Per
my recent lookups at partition-related features, making a maximum
consistency between how partitioned tables and partitioned indexes
behave is actually a good approach.
--
Michael
On 2018-Nov-03, Michael Paquier wrote:
On Fri, Nov 02, 2018 at 03:53:51PM -0300, Alvaro Herrera wrote:
In this thread I'm not proposing to change the behavior for tables, only
for indexes. If people want to change behavior for tables (and I agree
with doing so), they can start their own threads.Changing this behavior on back-branches is not a good idea I think
because that changes the representation of the pg_class entries in the
catalogs by adding the reltablespace reference for a partitioned index
which does not exist now. We are long past the time of doing such
changes on v11, but that can perfectly be done for v12.
With all due respect, this argument makes no sense. All partitioned
indexes that exist today have a null reltablespace (all pg_class rows
already have a reltablespace column; I'm not changing that). If users
want to keep the current behavior (i.e. that indexes on future
partitions are created in the default tablespace), all they have to do
is not send any ALTER INDEX changing the index's tablespace.
You're saying that people currently running Postgres 11 that are
doing "CREATE INDEX ... TABLESPACE foo" will be surprised because the
index ends up in tablespace foo for partitions they create afterwards.
Additionally, you're saying that all people currently doing
ALTER INDEX ... SET TABLESPACE foo
expect that
1. the parent partitioned index silently does not change at all
2. the indexes on future partitions end up in the default tablespace.
I cannot see how that's for real.
Furthermore, I think delaying this change to pg12 serves nobody, because
it just means that there will be a behavior difference for no reason.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/02/2018 07:12 PM, Alvaro Herrera wrote:
On 2018-Nov-03, Michael Paquier wrote:
On Fri, Nov 02, 2018 at 03:53:51PM -0300, Alvaro Herrera wrote:
In this thread I'm not proposing to change the behavior for tables, only
for indexes. If people want to change behavior for tables (and I agree
with doing so), they can start their own threads.Changing this behavior on back-branches is not a good idea I think
because that changes the representation of the pg_class entries in the
catalogs by adding the reltablespace reference for a partitioned index
which does not exist now. We are long past the time of doing such
changes on v11, but that can perfectly be done for v12.With all due respect, this argument makes no sense. All partitioned
indexes that exist today have a null reltablespace (all pg_class rows
already have a reltablespace column; I'm not changing that). If users
want to keep the current behavior (i.e. that indexes on future
partitions are created in the default tablespace), all they have to do
is not send any ALTER INDEX changing the index's tablespace.You're saying that people currently running Postgres 11 that are
doing "CREATE INDEX ... TABLESPACE foo" will be surprised because the
index ends up in tablespace foo for partitions they create afterwards.Additionally, you're saying that all people currently doing
ALTER INDEX ... SET TABLESPACE foo
expect that
1. the parent partitioned index silently does not change at all
2. the indexes on future partitions end up in the default tablespace.I cannot see how that's for real.
Furthermore, I think delaying this change to pg12 serves nobody, because
it just means that there will be a behavior difference for no reason.
+1. This is unquestionably a POLA violation that should be fixed, IMNSHO.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Nov-03, Andrew Dunstan wrote:
+1. This is unquestionably a POLA violation that should be fixed, IMNSHO.
Yeah, that's my view on it too.
Pushed.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 2, 2018 at 7:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
With all due respect, this argument makes no sense. All partitioned
indexes that exist today have a null reltablespace (all pg_class rows
already have a reltablespace column; I'm not changing that). If users
I hope not, because that column isn't nullable.
want to keep the current behavior (i.e. that indexes on future
partitions are created in the default tablespace), all they have to do
is not send any ALTER INDEX changing the index's tablespace.You're saying that people currently running Postgres 11 that are
doing "CREATE INDEX ... TABLESPACE foo" will be surprised because the
index ends up in tablespace foo for partitions they create afterwards.Additionally, you're saying that all people currently doing
ALTER INDEX ... SET TABLESPACE foo
expect that
1. the parent partitioned index silently does not change at all
2. the indexes on future partitions end up in the default tablespace.I cannot see how that's for real.
Furthermore, I think delaying this change to pg12 serves nobody, because
it just means that there will be a behavior difference for no reason.
Well, you've guaranteed that already. Now 11 will be different from
11.1, and tables will be different from indexes until somebody goes
and makes that consistent again.
I now wish I hadn't objected to changing the behavior in April. If
I'd know that the alternative was going to be to change it in
November, back-patched, two days before a minor release, with more
people voting against the change than for it, I would have kept my
mouth shut.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-Nov-03, Robert Haas wrote:
Well, you've guaranteed that already. Now 11 will be different from
11.1, and tables will be different from indexes until somebody goes
and makes that consistent again.
Nobody is running 11.0 in production. The people using 11.0 are just
testing, and those that run into this behavior have already reported as
an inconvenience, not something to rely on.
I now wish I hadn't objected to changing the behavior in April. If
I'd know that the alternative was going to be to change it in
November, back-patched, two days before a minor release, with more
people voting against the change than for it, I would have kept my
mouth shut.
Well, you didn't object to changing index behavior in April. You
objected to a change related to tables. Nobody had noticed this
behavior for indexes.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-11-03 16:24:28 -0300, Alvaro Herrera wrote:
On 2018-Nov-03, Robert Haas wrote:
Well, you've guaranteed that already. Now 11 will be different from
11.1, and tables will be different from indexes until somebody goes
and makes that consistent again.Nobody is running 11.0 in production.
Uh, I know of people doing so.
Greetings,
Andres Freund
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Nov-03, Andrew Dunstan wrote:
+1. This is unquestionably a POLA violation that should be fixed, IMNSHO.
Yeah, that's my view on it too.
Pushed.
Hmm ... in the April thread, one of the main concerns that prevented hasty
action was fear of breaking dump/restore behavior. Have you checked that
with this change, a dump/restore will restore the same state (same
actual tablespace assignments) that existed in the source DB? How about
if the parent partitioned index's tablespace assignment has been changed
since a child index was made? What happens with the --no-tablespaces
option?
I think I'm okay with this change if the answers to all those questions
are sane, but I didn't see them discussed in this thread.
regards, tom lane
On 2018-Nov-03, Tom Lane wrote:
Hmm ... in the April thread, one of the main concerns that prevented hasty
action was fear of breaking dump/restore behavior. Have you checked that
with this change, a dump/restore will restore the same state (same
actual tablespace assignments) that existed in the source DB?
I just did, and it does. The tablespaces are changed with individual
"SET default_tablespace" lines whenever it changes between dumping two
indexes.
How about if the parent partitioned index's tablespace assignment has
been changed since a child index was made?
Each index is created independently, with the correct default
tablespace, and then they are all attached together to the parent index
using ALTER INDEX ATTTACH PARTITION. The tablespace assignments are
identical to the source database.
What happens with the --no-tablespaces option?
No "SET default_tablespace" lines are emitted in this case, so
everything ends up in the default tablespace, as expected.
I think I'm okay with this change if the answers to all those questions
are sane, but I didn't see them discussed in this thread.
I think they are.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Nov 03, 2018 at 12:30:15PM -0700, Andres Freund wrote:
On 2018-11-03 16:24:28 -0300, Alvaro Herrera wrote:
On 2018-Nov-03, Robert Haas wrote:
Well, you've guaranteed that already. Now 11 will be different from
11.1, and tables will be different from indexes until somebody goes
and makes that consistent again.Nobody is running 11.0 in production.
Uh, I know of people doing so.
My understanding of the term GA is that anything marked as GA is
considered enough stable for production use, and that beta releases are
here for testing. So I don't get the argument. And you have made
partitioned indexes now inconsistent with partitioned tables.
Seeing this stuff discussed and committed in haste gives me the sad
face.
:(
--
Michael