CLUSTER on partitioned index

Started by Justin Pryzbyover 5 years ago37 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Forking this thread, since the existing CFs have been closed.
/messages/by-id/20200914143102.GX18552@telsasoft.com

On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:

On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:

Honestly, I think you're over-thinking and over-engineering indisclustered.

If "clusteredness" was something we offered to maintain across DML, I think
that might be important to provide stronger guarantees. As it is now, I don't
think this patch is worth changing the catalog definition.

Well, this use case is new because we are discussing the relationship
of indisclustered across multiple transactions for multiple indexes,
so I'd rather have this discussion than not, and I have learnt
the hard way with REINDEX that we should care a lot about the
consistency of partition trees at any step of the operation.

indisclustered is only used as a default for "CLUSTER" (without USING). The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.

Let's
imagine a simple example here, take this partition tree: p (parent),
and two partitions p1 and p2. p has two partitioned indexes i and j,
indexes also present in p1 and p2 as i1, i2, j1 and j2. Let's assume
that the user has done a CLUSTER on p USING i that completes, meaning
that i, i1 and i2 have indisclustered set. Now let's assume that the
user does a CLUSTER on p USING j this time, and that this command
fails while processing p2, meaning that indisclustered is set for j1,
i2, and perhaps i or j depending on what the patch does.

I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.

--
Justin

Attachments:

v1-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+209-53
v1-0002-preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+15-2
v1-0003-Invalidate-indisclustered-when-attaching-uncluste.patchtext/x-diff; charset=us-asciiDownload+99-35
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#1)
Re: CLUSTER on partitioned index

@cfbot: rebased

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

--
Justin

Attachments:

v3-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+208-52
v3-0002-preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+15-2
v3-0003-Propogate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+154-39
#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#2)
Re: CLUSTER on partitioned index

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

--
Justin

Attachments:

v4-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+208-52
v4-0002-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+125-44
v4-0003-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v4-0004-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v4-0005-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+17-2
#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#3)
Re: CLUSTER on partitioned index

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

--
Justin

Attachments:

v5-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+208-52
v5-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+125-44
v5-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v5-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v5-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+17-2
v5-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+91-32
#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#4)
Re: CLUSTER on partitioned index

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This patch is intertwined with the tablespace patch: not only will it get
rebase conflict, but will also need to test the functionality of
CLUSTER (TABLESPACE a) partitioned_table;

--
Justin

Attachments:

v6-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+82-21
v6-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+208-53
v6-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+125-44
v6-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v6-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v6-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+17-2
v6-0007-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload+11-4
#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#5)
Re: CLUSTER on partitioned index

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

--
Justin

Attachments:

v7-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+82-21
v7-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+209-54
v7-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+125-44
v7-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v7-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v7-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+17-2
v7-0007-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload+11-4
#7Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#6)
Re: CLUSTER on partitioned index

Hi,
For v7-0002-Implement-CLUSTER-of-partitioned-table.patch:

+        * We have to build the list in a different memory context so it
will
+        * survive the cross-transaction processing
+        */
+       old_context = MemoryContextSwitchTo(cluster_context);

cluster_context is not modified within the loop. Can the memory context
switching code be moved outside the loop ?

Cheers

On Sat, Feb 6, 2021 at 6:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Show quoted text

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which

preserves

indisclustered on children of clustered, partitioned indexes,

and invalidates

indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new

while we

discuss it.

This fixes some omissions in the previous patch and error in its

test cases.

CLUSTER ON recurses to children, since I think a clustered parent

index means

that all its child indexes are clustered. "SET WITHOUT CLUSTER"

doesn't have

to recurse to children, but I did it like that for consistency and

it avoids

the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered,

parent index,

since it's marked INVALID until indexes have been built on all child

tables, so

CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow

attaching

the child index (thereby making the parent "valid") to happen before

SET

CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

--
Justin

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#6)
Re: CLUSTER on partitioned index

On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote:

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

It occured to me that progress reporting should expose this.

I did this in the style of pg_stat_progress_create_index, adding columns
partitions_total and partitions_done showing the overall progress. The progress
of individual partitions is also visible in {blocks,tuples}_{done,total}.
This seems odd, but that's how the index view behaves.

--
Justin

Attachments:

v8-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+82-21
v8-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+209-54
v8-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload+47-29
v8-0004-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+125-44
v8-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v8-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v8-0007-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+17-2
v8-0008-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload+11-4
#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#8)
Re: CLUSTER on partitioned index

On Wed, Feb 10, 2021 at 02:04:58PM -0600, Justin Pryzby wrote:

On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote:

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

It occured to me that progress reporting should expose this.

I did this in the style of pg_stat_progress_create_index, adding columns
partitions_total and partitions_done showing the overall progress. The progress
of individual partitions is also visible in {blocks,tuples}_{done,total}.
This seems odd, but that's how the index view behaves.

Rebased on 8a8f4d8ede288c2a29105f4708e22ce7f3526149.

This also resolves an issue in the last patch which would've broken progress
reporting of vacuum full.

And take the suggestion to move memory context switching outside the loop.

--
Justin

Attachments:

v9-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+82-21
v9-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+209-54
v9-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload+67-26
v9-0004-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload+126-44
v9-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v9-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v9-0007-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload+18-2
v9-0008-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload+11-4
#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#9)
Re: CLUSTER on partitioned index

@cfbot: rebased

Attachments:

v10-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload+82-21
v10-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+209-54
v10-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload+67-26
v10-0004-Propagate-changes-to-indisclustered-to-child-par.patchtext/x-diff; charset=us-asciiDownload+126-44
v10-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload+55-1
v10-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload+40-1
v10-0007-Preserve-indisclustered-on-children-of-clustered.patchtext/x-diff; charset=us-asciiDownload+18-2
v10-0008-pg_dump-partitioned-index-depend-on-its-partitio.patchtext/x-diff; charset=us-asciiDownload+11-4
#11Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#10)
Re: CLUSTER on partitioned index

Hi,
For v10-0002-Implement-CLUSTER-of-partitioned-table.patch :

or that an partitioned index was previously set clustered.

'an partitioned index' -> a partitioned index

+ * Return a List of tables and associated index, where each index is a

associated index -> associated indices

For cluster():
-       rel = table_open(tableOid, NoLock);
+       rel = table_open(tableOid, ShareUpdateExclusiveLock);

Considering the comment preceding cluster() (forced to acquire exclusive
locks on all the tables), maybe add a comment explaining why it is safe to
take ShareUpdateExclusiveLock.

+cluster_multiple_rels(List *rvs, int options)

I think the multiple in the method name is not needed since the relation is
in plural.

Cheers

On Fri, Apr 2, 2021 at 1:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Show quoted text

@cfbot: rebased

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#10)
Re: CLUSTER on partitioned index

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)

#13Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#12)
Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

Yeah, I am not getting really excited about doing anything here
either. I thought for some time about the interactions with
indisclustered and partitioned tables, but anything I could come up
with felt clunky.
--
Michael

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Alvaro Herrera (#12)
Re: CLUSTER on partitioned index

On Tue, 2021-07-20 at 20:27 -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

I see is used in the field occasionally, as it can really drastically
improve performance. But I admit is is not frequently used.

In a data warehouse, which is updated only occasionally, running
CLUSTER after an update can make a lot of sense.

I personally think that it is enough to be able to cluster the table
partiton by partition.

Yours,
Laurenz Albe

#15Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#12)
Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

The patch series would be simpler if partitioned indexes weren't allowed to be
marked CLUSTERED ON. Then, "USING <index>" would be required, which is a step
forward from not supporting cluster on partitioned index at all. As attached.
It's arguably true that the follow-up patches supporting indisclustered on
partitioned indexes aren't worth the trouble.

For sure CLUSTER is useful, see eg.
https://github.com/bucardo/check_postgres/issues/29

It's sometimes important that the table is clustered to allow index scan to
work well (or be chosen at all).

If a table is scanned by an index, and isn't well-clustered, then a larger
fraction (multiple) of the table will be read than what's optimal. That
requires more IO, and more cache space.

A year ago, I partitioned one of our previously-unpartitioned tables, and ended
up clustering the partitions on their partition key (and indexed column) using
\gexec. This was preferable to doing INSERT .. SELECT .. ORDER BY, which
would've made the initial process slower - maybe unjustifiably slower for some
customers. Cluster (using \gexec) was something I was able to do afterward,
for completeness, since I expect the partitions to be mostly-clustered
automatically, so it was bothering me that the existing data was unordered, and
that it might behave differently in the future.

What justifies spending so much time on this implementation?

Actually, I don't use partitioned indexes at all here, so this is not for us..
I worked on this after Adger asked about CIC on partitioned tables (for which I
have a patch in the queue). Isn't it worth supporting that (or should we
include an example about how to use format() with %I and \gexec) ?

VACUUM [FULL] has recursed into partitions since v10 (f0e44751d).
REINDEX supports partitioned tables in v14 (a6642b3ae).
Partitioned indexes exist since v11 (as you well know), so it's somewhat odd
that CLUSTER isn't supported, and seems increasingly weird as decreasing number
of DDL commands are not supported. Supporting DDL on partitioned tables
supports the idea that the physical partitions can be seen as an implementation
detail by the DBA, which I understand was the intent since v10.

You're right that I wouldn't plan to *routinely* re-cluster a partitioned
table. Rather, I'd cluster only its "recent" *partitions*, and leave the old
ones alone. Or cluster the partitions, a single time, once they're no longer
recent. I don't think the feature is marginal just because I don't use it
routinely.

My impression is that CLUSTER is pretty much a fringe command nowadays,
because of the access exclusive lock required.

A step forward would be to integrate something like pg_repack/reorg/squeeze.
I used pg_repack --index until v12 got REINDEX CONCURRENTLY. The goal there
was to improve index scans on some large, append-only partitions where the
planner gave an index scan, but performance was poor (now, we use BRIN so it
works well without reindex). I tested that this would still be an issue by
creating a non-brin index for a single day's table (even with v13 deduplication
and v12 TID tiebreak).

As I see it, support for partitioned cluster is orthogonal to an
online/concurrent cluster, which is a job for another patch.

--
Justin

Attachments:

v11-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+197-48
#16Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Justin Pryzby (#15)
Re: CLUSTER on partitioned index

On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

[ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
...
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
ERROR: cannot mark index clustered in partitioned table

A "HINT: use the CLUSTER command to cluster partitioned tables" (or
equivalent) should be added if we decide to keep the clustering APIs
of ALTER TABLE disabled for partitioned tables, as CLUSTER is now
implemented for partitioned tables.

-DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

Kind regards,

Matthias van de Meent

#17Justin Pryzby
pryzby@telsasoft.com
In reply to: Matthias van de Meent (#16)
Re: CLUSTER on partitioned index

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

The parallel code in vacuum is expand_vacuum_rel(), which is where the
corresponding things happens for vacuum full. This patch is to make cluster()
do all the same stuff before calling cluster_rel().

What VACUUM tries to do is to avoid erroring if a partition is dropped while
cluster is running. cluster_rel() does the same thing by calling
cluster_multiple_rels() ,which uses CLUOPT_RECHECK.

If the OIDs wrapped around, I think existing vacuum could accidentally process
a new table with the same OID as a dropped partition. I think cluster would
*normally* catch that case and error in check_index_is_clusterable():
| Check that index is in fact an index on the given relation

Arguably VACUUM FULL could call cluster() (not cluster_rel()) and pass the
partitioned table rather than first expanding it. But non-full vacuum needs to
expand partitioned tables anyway.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

Yea, but this is already true for VACUUM FULL (which uses CLUSTER and supports
partitioned tables since v10) and REINDEX.
See also /messages/by-id/20210216064214.GI28165@telsasoft.com

My goal is to present a minimal patch and avoid any nonessential complexity.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

You said it's less complete, but it's is due to deliberate reduction in scope.
cluster.sgml says:
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index (which must be specified).

The ALTER restriction hasn't changed, so I didn't touch the documentation.

I am still curious myself to know if this is the direction the patch should
move.

[ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
...
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

This is the pre-existing behavior.

-DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

You're right - this was from when the patchset handled CLUSTER ON.
Leaving the index allows testing in pg_dump - a large part of the complexity of
the elided patches is to handle restoring a partitioned index, without
violating the rule that partitions of an clustered index must also be
clustered. I adjusted this in my local branch.

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

--
Justin

#18Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#17)
Re: CLUSTER on partitioned index

On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote:

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

My goal is to present a minimal patch and avoid any nonessential complexity.

FWIW, my opinion on the matter is similar to Alvaro's, and an extra
read of the patch gives me the same impression. Let's see if others
have an opinion on the matter.

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

It doesn't feel right to just discard the patch at this stage, and it
needs an update, so I have moved it to the next CF for now, waiting on
author. If this does not really move on, my suggestion is to discard
the patch at the end of next CF, aka 2022-01.
--
Michael

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#18)
Re: CLUSTER on partitioned index

On Fri, Dec 03, 2021 at 10:16:24AM +0900, Michael Paquier wrote:

On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote:

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

My goal is to present a minimal patch and avoid any nonessential complexity.

FWIW, my opinion on the matter is similar to Alvaro's, and an extra
read of the patch gives me the same impression. Let's see if others
have an opinion on the matter.

You and Alvaro thought the patch was too complicated for its value, so I
reduced the scope to its essential form.

CLUSTER was claimed to be of marginal utility, since the table is locked.
But locking one partition at a time would be less disruptive than locking an
equivalent non-partitioned table.

There's only about a dozen, other remaining restrictions/limitations on
partitioned tables, (AMs, triggers, identity, generated, exclusion, CIC,
FREEZE).

Since it's supported to VACUUM (including VACUUM FULL) and REINDEX a
partitioned table, I'm still suprised there's much hesitation to support
CLUSTER (which is used by vacuum full).

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

It doesn't feel right to just discard the patch at this stage, and it
needs an update, so I have moved it to the next CF for now, waiting on
author. If this does not really move on, my suggestion is to discard
the patch at the end of next CF, aka 2022-01.

This includes minor updates based on Mathias review (commit message and test
case).

--
Justin

Attachments:

v11-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload+204-46
#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#12)
Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

I hope that Alvaro will comment on the simplified patches. If multiple people
think the patch isn't worth it, feel free to close it. But I don't see how
complexity could be the reason.

--
Justin

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#22)
#24Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#21)
#25Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#22)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#24)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#26)
#28Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#26)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Alvaro Herrera (#27)
#30Zhihong Yu
zyu@yugabyte.com
In reply to: Justin Pryzby (#29)
#31Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#29)
#32Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#32)
#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#29)
#35Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#35)
#37Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#36)