BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Started by PG Bug reporting form11 months ago19 messageshackersbugs

noreply@postgresql.org

11 months ago

hackersbugs

The following bug has been logged on the website:

Bug reference: 18959
Logged by: Maximilian Chrzan
Email address: maximilian.chrzan@here.com
PostgreSQL version: 17.4
Operating system: x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.
Description:

Dear PostgreSQL team,
We encountered a reproducible issue when creating expression indexes on a
partitioned table using:
CREATE INDEX IF NOT EXISTS ... ON partitioned_table ((expression));
When such statements are executed in parallel (e.g., via separate
transactions or threads), the PostgreSQL engine attempts to propagate the
index to each child partition using internally generated names like:
partition_name_expr_idx
partition_name_expr_idx1
partition_name_expr_idx2
...
These internal names are not derived from the index expression or parent
index name, but instead appear to be based on a counter of existing
expression indexes.
The Issue:
When multiple expressions are indexed in parallel on the same partitioned
table, even with distinct expressions and parent index names, the system may
generate the same internal name for partition-level indexes, causing:
(Postgres <17): ERROR: duplicate key value violates unique constraint
"pg_class_relname_nsp_index" 23505
(Postgres 17): relation "{index_name}" already exists 42P07
This occurs even though the parent-level index names are unique and
expressions differ.
Reproducer (simplified):
-- In separate sessions concurrently:
CREATE INDEX IF NOT EXISTS idx_expr1 ON parent_table (((jsondata -> 'a' ->
'b')));
CREATE INDEX IF NOT EXISTS idx_expr2 ON parent_table (((jsondata -> 'x' ->
'y')));
Internally, PostgreSQL attempts to create something like:
CREATE INDEX parent_table_partition1_expr_idx ON ...
CREATE INDEX parent_table_partition1_expr_idx ON ... -- collision
Expected behavior:
If expressions or parent index names differ, partition-level index names
should be derived deterministically from:
* Parent index name (preferred) eg.: parent_idx_name_partition1
* Or a hash of the expression (as fallback)
This would avoid internal naming collisions and allow safe concurrent
execution of CREATE INDEX IF NOT EXISTS on partitioned tables.
This issue limits scalability when programmatically creating multiple
JSON-path expression indexes on partitioned tables, and complicates use of
parallelism. While advisory locking is a possible workaround, it is not
ideal.
Thanks in advance for looking into it.
Best regards,
Max Chrzan

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: PG Bug reporting form (#1)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Sat, Jun 14, 2025 at 3:15 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18959
Logged by: Maximilian Chrzan
Email address: maximilian.chrzan@here.com
PostgreSQL version: 17.4
Operating system: x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.
Description:

Dear PostgreSQL team,
We encountered a reproducible issue when creating expression indexes on a
partitioned table using:
CREATE INDEX IF NOT EXISTS ... ON partitioned_table ((expression));
When such statements are executed in parallel (e.g., via separate
transactions or threads), the PostgreSQL engine attempts to propagate the
index to each child partition using internally generated names like:
partition_name_expr_idx
partition_name_expr_idx1
partition_name_expr_idx2
...
These internal names are not derived from the index expression or parent
index name, but instead appear to be based on a counter of existing
expression indexes.
The Issue:
When multiple expressions are indexed in parallel on the same partitioned
table, even with distinct expressions and parent index names, the system may
generate the same internal name for partition-level indexes, causing:
(Postgres <17): ERROR: duplicate key value violates unique constraint
"pg_class_relname_nsp_index" 23505
(Postgres 17): relation "{index_name}" already exists 42P07
This occurs even though the parent-level index names are unique and
expressions differ.
Reproducer (simplified):
-- In separate sessions concurrently:
CREATE INDEX IF NOT EXISTS idx_expr1 ON parent_table (((jsondata -> 'a' ->
'b')));
CREATE INDEX IF NOT EXISTS idx_expr2 ON parent_table (((jsondata -> 'x' ->
'y')));
Internally, PostgreSQL attempts to create something like:
CREATE INDEX parent_table_partition1_expr_idx ON ...
CREATE INDEX parent_table_partition1_expr_idx ON ... -- collision
Expected behavior:
If expressions or parent index names differ, partition-level index names
should be derived deterministically from:
* Parent index name (preferred) eg.: parent_idx_name_partition1
* Or a hash of the expression (as fallback)
This would avoid internal naming collisions and allow safe concurrent
execution of CREATE INDEX IF NOT EXISTS on partitioned tables.
This issue limits scalability when programmatically creating multiple
JSON-path expression indexes on partitioned tables, and complicates use of
parallelism. While advisory locking is a possible workaround, it is not
ideal.

It seems beneficial to embed the parent index name within the names of
its partitioned child indexes, although it would become tricky when
building an index for a multi level partition hierarchy but we could
simplify this by only referencing the top-level user-provided index
name. This is my perspective, and I'm open to other ideas.

--
Regards,
Dilip Kumar
Google

Phineas Jensen

phin@zayda.net

10 months ago

In reply to: Dilip Kumar (#2)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Jun 18, 2025, at 4:29 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

It seems beneficial to embed the parent index name within the names of
its partitioned child indexes, although it would become tricky when
building an index for a multi level partition hierarchy but we could
simplify this by only referencing the top-level user-provided index
name. This is my perspective, and I'm open to other ideas.

I agree that embedding the parent index name would be the simplest solution for this case, but a similar bug would still happen if no index name was specified for the parent at all (e.g. CREATE INDEX ON parent_table ((jsondata->’a’->’b’)) ), although in that case, the conflict is on the parent table, not the child tables.

Would it be worth making CREATE INDEX add a short hash or some other unique key when no name is specified? Or does it make more sense to just say (maybe in the documentation) that if you are running CREATE INDEX multiple times concurrently that you should specify a name to avoid conflicts?

I created SQL and Bash scripts to reproduce the problem, which I’ve attached.

Phin Jensen

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Dilip Kumar (#2)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Dilip Kumar <dilipbalaut@gmail.com> writes:

On Sat, Jun 14, 2025 at 3:15 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

If expressions or parent index names differ, partition-level index names
should be derived deterministically from:
* Parent index name (preferred) eg.: parent_idx_name_partition1
* Or a hash of the expression (as fallback)
This would avoid internal naming collisions and allow safe concurrent
execution of CREATE INDEX IF NOT EXISTS on partitioned tables.

It seems beneficial to embed the parent index name within the names of
its partitioned child indexes, although it would become tricky when
building an index for a multi level partition hierarchy but we could
simplify this by only referencing the top-level user-provided index
name. This is my perspective, and I'm open to other ideas.

This seems very closely related to commit 3db61db48 [1]https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3db61db48, which fixed
a similar behavior for child foreign key constraints. Per that commit
message, it's a good idea for the child objects to have names related
to the parent objects, so we ought to change this behavior regardless
of any concurrent-failure considerations.

Having said that, I do not think that the OP's idea of fully
deterministic index name choice is workable. We don't constrain
partitions to be exactly like their parents; that means that an index
name that works fine at an upper level might conflict with some
pre-existing index on a child. So unless you prefer failure to
selecting a different name at the child level, it's necessary to
allow the child index names to sometimes be different.

But ... the code *does* have the ability to dodge conflicting
index names already; this is why you get
partition_name_expr_idx
partition_name_expr_idx1
partition_name_expr_idx2
and not immediate failure. If this isn't working reliably in
concurrent situations, that must mean that we are not obtaining
an exclusive lock before looking for pre-existing index names.
I'm not sure if that's a bug or intentional. My vague recollection
is that we intend to allow multiple CREATE INDEX in parallel, so it
may be that obtaining a lock would be a cure worse than the disease.

In any case, deriving the child index name(s) from the parent name
would reduce the scope of this problem, so I agree we ought to
make it do that.

regards, tom lane

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3db61db48

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Tom Lane (#4)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

I wrote:

This seems very closely related to commit 3db61db48 [1], which fixed
a similar behavior for child foreign key constraints. Per that commit
message, it's a good idea for the child objects to have names related
to the parent objects, so we ought to change this behavior regardless
of any concurrent-failure considerations.

I experimented with the attached, which borrows a couple of ideas
from 3db61db48 to produce names like "parent_index_2" when cloning
indexes. While it should help with the immediate problem, I'm not
sure if this is acceptable, because there are a *lot* of ensuing
changes in the regression tests, many more than 3db61db48 caused.
(Note that I didn't bother to fix places where the tests rely on
a generated name that has changed; the delta in the test outputs
is merely meant to give an idea of how much churn there is.
I didn't check non-core test suites, either.)

Also, looking at the error message changes, I'm less sure that
this is a UX improvement than I was about 3db61db48. Do people
care which partition a uniqueness constraint failed in? In
the current behavior, the index name will reflect that, but
with this behavior, not so much.

Anyway, maybe this is a good idea or maybe it isn't. Thoughts?

regards, tom lane

Chrzan, Maximilian

maximilian.chrzan@here.com

10 months ago

In reply to: Tom Lane (#5)

hackersbugs

AW: [EXTERNAL] Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

We are working with very large partitioned tables (500M+ rows, >1 TB of data) and need to create multiple expression indexes on them.

To avoid the issues with parallel index creation, we switched to sequential execution: as soon as one index finishes (usually after 1–2 hours), we immediately start the next (typically within a second). In this setup, there is no actual parallelism — yet we occasionally still hit this error:

ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index"
Detail: Key (relname, relnamespace) = (…) already exists.

This suggests that the issue is not limited to concurrent execution. It can also occur when index creation happens in quick succession.

Additionally, we noticed that two parallel index creations on a partitioned table will block each other — even if they target different expressions. Here's a simplified example:

CREATE TABLE test (
jsondata JSONB,
version BIGINT NOT NULL DEFAULT 9223372036854775807
) PARTITION BY RANGE (version);

CREATE TABLE test_p0 PARTITION OF test FOR VALUES FROM (0) TO (100000);

Transaction 1:

DO $$
BEGIN
CREATE INDEX IF NOT EXISTS idx_1 ON test
(((jsondata -> 'properties') -> 'foo1') ASC NULLS LAST);
PERFORM pg_sleep(10);
END;
$$;

Transaction 2 (started in parallel):

DO $$
BEGIN
CREATE INDEX IF NOT EXISTS idx_2 ON test
(((jsondata -> 'properties') -> 'foo2') ASC NULLS LAST);
END;
$$;

Transaction 2 will block until Transaction 1 completes — and then fail with:

ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index"
Detail: Key (relname, relnamespace) = (test_p1_expr_idx, 2200) already exists.

If the same indexes are created directly on the partition "test_p0", the second index is created immediately — without blocking or error.
________________________________
Von: Tom Lane <tgl@sss.pgh.pa.us>
Gesendet: Mittwoch, 18. Juni 2025 18:46
An: Dilip Kumar <dilipbalaut@gmail.com>
Cc: Chrzan, Maximilian <maximilian.chrzan@here.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Betreff: [EXTERNAL] Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

[Sie erhalten nicht h?ufig E-Mails von tgl@sss.pgh.pa.us. Weitere Informationen, warum dies wichtig ist, finden Sie unter https://aka.ms/LearnAboutSenderIdentification ]

LEARN FAST: This email originated outside of HERE.
Please do not click on links or open attachments unless you recognize the sender and know the content is safe. Thank you.

I wrote:

This seems very closely related to commit 3db61db48 [1], which fixed
a similar behavior for child foreign key constraints. Per that commit
message, it's a good idea for the child objects to have names related
to the parent objects, so we ought to change this behavior regardless
of any concurrent-failure considerations.

Anyway, maybe this is a good idea or maybe it isn't. Thoughts?

regards, tom lane

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: Chrzan, Maximilian (#6)

hackersbugs

Re: [EXTERNAL] Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Thu, Jun 19, 2025 at 7:38 PM Chrzan, Maximilian
<maximilian.chrzan@here.com> wrote:

We are working with very large partitioned tables (500M+ rows, >1 TB of data) and need to create multiple expression indexes on them.

To avoid the issues with parallel index creation, we switched to sequential execution: as soon as one index finishes (usually after 1–2 hours), we immediately start the next (typically within a second). In this setup, there is no actual parallelism — yet we occasionally still hit this error:

ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index"
Detail: Key (relname, relnamespace) = (…) already exists.

This suggests that the issue is not limited to concurrent execution. It can also occur when index creation happens in quick succession.

Additionally, we noticed that two parallel index creations on a partitioned table will block each other — even if they target different expressions. Here's a simplified example:

CREATE TABLE test (
jsondata JSONB,
version BIGINT NOT NULL DEFAULT 9223372036854775807
) PARTITION BY RANGE (version);

CREATE TABLE test_p0 PARTITION OF test FOR VALUES FROM (0) TO (100000);

Transaction 1:

DO $$
BEGIN
CREATE INDEX IF NOT EXISTS idx_1 ON test
(((jsondata -> 'properties') -> 'foo1') ASC NULLS LAST);
PERFORM pg_sleep(10);
END;
$$;

Transaction 2 (started in parallel):

DO $$
BEGIN
CREATE INDEX IF NOT EXISTS idx_2 ON test
(((jsondata -> 'properties') -> 'foo2') ASC NULLS LAST);
END;
$$;

Transaction 2 will block until Transaction 1 completes — and then fail with:

I believe this is fundamentally the same issue we're addressing here.
We're observing duplicate index name creation on child tables. If the
first transaction remains open, the second transaction waits for it to
commit or roll back because it's attempting to insert the same index
name key into the catalog. Once the first transaction commits, the
second will roll back due to a unique key violation. Conversely, if
the first transaction rolls back, the second will succeed.

--
Regards,
Dilip Kumar
Google

Junwang Zhao

zhjwpku@gmail.com

10 months ago

In reply to: Tom Lane (#5)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Hi Tom,

On Thu, Jun 19, 2025 at 12:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

This seems very closely related to commit 3db61db48 [1], which fixed
a similar behavior for child foreign key constraints. Per that commit
message, it's a good idea for the child objects to have names related
to the parent objects, so we ought to change this behavior regardless
of any concurrent-failure considerations.

I experimented with the attached, which borrows a couple of ideas
from 3db61db48 to produce names like "parent_index_2" when cloning
indexes. While it should help with the immediate problem, I'm not
sure if this is acceptable, because there are a *lot* of ensuing
changes in the regression tests, many more than 3db61db48 caused.
(Note that I didn't bother to fix places where the tests rely on
a generated name that has changed; the delta in the test outputs
is merely meant to give an idea of how much churn there is.
I didn't check non-core test suites, either.)

I think this approach is better because each child index inherits its
parent's index name with an extra number, creating a more
intuitive hierarchy. This naming convention makes it easier to
understand the partition levels directly from the index name.
So I'm +1 for this idea.

Also, looking at the error message changes, I'm less sure that
this is a UX improvement than I was about 3db61db48. Do people
care which partition a uniqueness constraint failed in? In
the current behavior, the index name will reflect that, but
with this behavior, not so much.

I can see the benefit of being able to identify the associated
partition directly by checking the index name. Can we prepend
the partition rel name to the index name, this will make the
index longer, not sure if it's acceptable.

Anyway, maybe this is a good idea or maybe it isn't. Thoughts?

regards, tom lane

--
Regards
Junwang Zhao

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: Tom Lane (#5)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Wed, Jun 18, 2025 at 10:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

This seems very closely related to commit 3db61db48 [1], which fixed
a similar behavior for child foreign key constraints. Per that commit
message, it's a good idea for the child objects to have names related
to the parent objects, so we ought to change this behavior regardless
of any concurrent-failure considerations.

I experimented with the attached, which borrows a couple of ideas
from 3db61db48 to produce names like "parent_index_2" when cloning
indexes. While it should help with the immediate problem, I'm not
sure if this is acceptable, because there are a *lot* of ensuing
changes in the regression tests, many more than 3db61db48 caused.
(Note that I didn't bother to fix places where the tests rely on
a generated name that has changed; the delta in the test outputs
is merely meant to give an idea of how much churn there is.
I didn't check non-core test suites, either.)

Also, looking at the error message changes, I'm less sure that
this is a UX improvement than I was about 3db61db48. Do people
care which partition a uniqueness constraint failed in? In
the current behavior, the index name will reflect that, but
with this behavior, not so much.

Anyway, maybe this is a good idea or maybe it isn't. Thoughts?

I haven't reviewed the patch itself, but I like the idea. We're now
consistently using the parent index name for partitioned indexes,
whether they're named or unnamed indexes. That looks like a great
improvement. And I think including the partition number of each level
in the index name significantly enhances its clarity, especially
within a multi-level partition hierarchy.

--
Regards,
Dilip Kumar
Google

#10

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Dilip Kumar (#9)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Dilip Kumar <dilipbalaut@gmail.com> writes:

On Wed, Jun 18, 2025 at 10:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I experimented with the attached, which borrows a couple of ideas
from 3db61db48 to produce names like "parent_index_2" when cloning
indexes. While it should help with the immediate problem, I'm not
sure if this is acceptable, because there are a *lot* of ensuing
changes in the regression tests, many more than 3db61db48 caused.

I haven't reviewed the patch itself, but I like the idea. We're now
consistently using the parent index name for partitioned indexes,
whether they're named or unnamed indexes. That looks like a great
improvement. And I think including the partition number of each level
in the index name significantly enhances its clarity, especially
within a multi-level partition hierarchy.

A different approach that we could take --- possibly alongside doing
the above --- is to try to remove the race condition between two
sessions choosing the same index name. It doesn't look practical
to close the race window completely, but it's quite simple to make
it a whole lot shorter. If we check for a conflicting relation
name using SnapshotDirty instead of only looking for committed
pg_class rows, then the window is little more than the time needed
to insert the index's pg_class row, rather than being the whole
time needed to build the index. (The fact that the OP is working
with terabyte-sized tables is what's making this so bad for him.)

In the attached draft I only bothered to change the initial
probe for a conflicting pg_class entry. We could go further and
apply the same idea in ConstraintNameExists(), but I'm not sure
it's worth the trouble.

regards, tom lane

#11

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Dilip Kumar (#9)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Dilip Kumar <dilipbalaut@gmail.com> writes:

I haven't reviewed the patch itself, but I like the idea. We're now
consistently using the parent index name for partitioned indexes,
whether they're named or unnamed indexes. That looks like a great
improvement. And I think including the partition number of each level
in the index name significantly enhances its clarity, especially
within a multi-level partition hierarchy.

Since people seem to think this might be a good way to proceed,
I spent some effort on cleaning up the regression test changes.

While doing that, I decided that applying this behavioral change to
CREATE TABLE LIKE (the original user of generateClonedIndexStmt)
might not be such a hot idea: the regression test changes that
that induced felt less natural than the ones involving partitioned
indexes. Another practical reason is that all the calls for
partitioned indexes will call DefineIndex immediately, so the
race-condition window for some other session to claim the same
index name is barely wider than it was before. But in CREATE TABLE
LIKE, there's considerably more delay, and I think it might even
be possible to construct counterexamples where our own process
could try to create two identically-named indexes if we try to
nail down the index name in generateClonedIndexStmt.

So that leads me to the attached. Excluding CREATE TABLE LIKE
reduces the number of regression-test changes a little, but
there's still a lot of them, implying this is a nontrivial
behavioral change for users. So I feel like this is not
something to squeeze into v18 post-beta-1. I'm thinking it'd
be appropriate for v19 instead. (We could perhaps back-patch
the other SnapshotDirty patch to ameliorate the problem in the
back branches.)

regards, tom lane

#12

Dilip Kumar

dilipbalaut@gmail.com

10 months ago

In reply to: Tom Lane (#11)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Fri, Jun 20, 2025 at 2:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dilip Kumar <dilipbalaut@gmail.com> writes:

I haven't reviewed the patch itself, but I like the idea. We're now
consistently using the parent index name for partitioned indexes,
whether they're named or unnamed indexes. That looks like a great
improvement. And I think including the partition number of each level
in the index name significantly enhances its clarity, especially
within a multi-level partition hierarchy.

Since people seem to think this might be a good way to proceed,
I spent some effort on cleaning up the regression test changes.

While doing that, I decided that applying this behavioral change to
CREATE TABLE LIKE (the original user of generateClonedIndexStmt)
might not be such a hot idea: the regression test changes that
that induced felt less natural than the ones involving partitioned
indexes. Another practical reason is that all the calls for
partitioned indexes will call DefineIndex immediately, so the
race-condition window for some other session to claim the same
index name is barely wider than it was before. But in CREATE TABLE
LIKE, there's considerably more delay, and I think it might even
be possible to construct counterexamples where our own process
could try to create two identically-named indexes if we try to
nail down the index name in generateClonedIndexStmt.

So that leads me to the attached.

The patch LGTM

Excluding CREATE TABLE LIKE

reduces the number of regression-test changes a little, but
there's still a lot of them, implying this is a nontrivial
behavioral change for users. So I feel like this is not
something to squeeze into v18 post-beta-1. I'm thinking it'd
be appropriate for v19 instead. (We could perhaps back-patch
the other SnapshotDirty patch to ameliorate the problem in the
back branches.)

Yes, that makes sense to apply in v19 because of user visible behavior
changes in index names. I agree the SnapshotDirty patch can give
relief for this case for back branches.

--
Regards,
Dilip Kumar
Google

#13

Tom Lane

tgl@sss.pgh.pa.us

10 months ago

In reply to: Dilip Kumar (#12)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Dilip Kumar <dilipbalaut@gmail.com> writes:

Yes, that makes sense to apply in v19 because of user visible behavior
changes in index names. I agree the SnapshotDirty patch can give
relief for this case for back branches.

OK, I pushed the SnapshotDirty patch. The other patch still seems
to apply over it, so I won't repost that unless the cfbot thinks
differently.

regards, tom lane

#14

Tom Lane

tgl@sss.pgh.pa.us

8 months ago

In reply to: Dilip Kumar (#12)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

[ moving thread to -hackers for more visibility ]

Dilip Kumar <dilipbalaut@gmail.com> writes:

On Fri, Jun 20, 2025 at 2:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dilip Kumar <dilipbalaut@gmail.com> writes:

I haven't reviewed the patch itself, but I like the idea. We're now
consistently using the parent index name for partitioned indexes,
whether they're named or unnamed indexes. That looks like a great
improvement. And I think including the partition number of each level
in the index name significantly enhances its clarity, especially
within a multi-level partition hierarchy.

Since people seem to think this might be a good way to proceed,
I spent some effort on cleaning up the regression test changes.

The patch LGTM

Attached find a rebase that copes with some recent test changes;
there's no substantive difference.

I'm surprised that this didn't break sooner, TBH, since it touches
so many test cases. I'd kind of like to either get it pushed or
write it off as a bad idea. Does anyone else care to comment?

regards, tom lane

#15

Robert Haas

robertmhaas@gmail.com

7 months ago

In reply to: Tom Lane (#14)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Fri, Aug 29, 2025 at 11:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm surprised that this didn't break sooner, TBH, since it touches
so many test cases. I'd kind of like to either get it pushed or
write it off as a bad idea. Does anyone else care to comment?

I'll take the risk of expressing an opinion: I think we should do
something about this problem, but I'm not all that convinced we should
do this particular thing. For example, consider this change from your
patch:

- idxpart1_a_idx   | CREATE INDEX idxpart1_a_idx ON public.idxpart1
USING btree (a)
- idxpart1_c_b_idx | CREATE INDEX idxpart1_c_b_idx ON public.idxpart1
USING btree (c, b)
  idxpart2_a_idx   | CREATE INDEX idxpart2_a_idx ON public.idxpart2
USING btree (a)
  idxpart2_c_b_idx | CREATE INDEX idxpart2_c_b_idx ON public.idxpart2
USING btree (c, b)
  idxparti         | CREATE INDEX idxparti ON ONLY public.idxpart
USING btree (a)
  idxparti2        | CREATE INDEX idxparti2 ON ONLY public.idxpart
USING btree (c, b)
+ idxparti2_1      | CREATE INDEX idxparti2_1 ON public.idxpart1 USING
btree (c, b)
+ idxparti_1       | CREATE INDEX idxparti_1 ON public.idxpart1 USING btree (a)

There are two things about this that don't seem great to me. First,
the index names for idxpart1 are no longer consistent with the index
names for idxpart2. Second, IMHO, the names are worse. Let's talk
about each of those problems separately. The reason the names aren't
consistent any more is because idxpart2 is created as a standalone
table and then attached as a partition, whereas idxpart1 is a
partition from the first moment of its existence. It is not essential
that the index names not vary based on which way the user does it, but
I think it is a nicer user experience if they don't. Now, what about
the absolute quality of the names? The change makes the index names
more consistent with the name of the index on the parent, which is
nice, but we also lose something: the index names are now less
consistent with the names of the child tables, and they don't mention
the affected columns any more. I think those are pretty major
drawbacks. In fact, these kinds of examples can understand the degree
to which we've lost commonality with the child table names, because
here the child tables and the indexes are named with increasing
integer counters, but in general the partitions might have names like
2025_09, 2025_10, etc. and the indexes are just going to be counting
up 1, 2, ... and I feel like that's worse than what we do now. The
pg_overexplain output changes make this point somewhat dramatically,
perhaps over-dramatically:

-         ->  Index Scan using brassica_id_idx on brassica v1_1
(actual rows=N.NN loops=1)
+         ->  Index Scan using vegetables_id_idx_1 on brassica v1_1
(actual rows=N.NN loops=1)
-         ->  Index Scan using daucus_id_idx on daucus v1_2 (actual
rows=N.NN loops=1)
+         ->  Index Scan using vegetables_id_idx_2 on daucus v1_2
(actual rows=N.NN loops=1)

Surely, it's not as nice for the indexes on the brassica and daucus
tables to be named vegetables_id_idx_1 and vegetables_id_idx_2 rather
than brasica_id_ix and daucus_id_idx. That's just gotta be worse. The
fact that you could still get it the other way if you created the
partitions standalone and then attached them makes it even worse.

IMHO, the real problem here is that when an index is created on a
column, we have this idea (with which I agree) that it would be nice
to include the column name in the index, but when we have an
expression we go "oh, rats, there's no column name, I guess we'll just
use 'expr'", which doesn't scale very well beyond a single expression
index. For example:

-    "pt12_expr_idx" btree ((mydouble(category) + 1))
-    "pt12_sdata_idx" btree (sdata)
-    "pt12_tdata_idx" btree (tdata COLLATE mycollation)
+    "pti1_1" btree ((mydouble(category) + 1))
+    "pti2_1" btree (sdata)
+    "pti3_1" btree (tdata COLLATE mycollation)

Hypothetically, what if the second and third indexes ended up being
named just as they are, but the first index ended up being called
pt12_mydouble_idx? That would be similar to the way that a function
name is used as a column alias if none is provided. Granted, that
wouldn't help the OP, whose expressions look like this:

jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'

It's asking a lot for an algorithm to produce distinct, human-readable
names for these indexes, especially given that there could be other
expression indexes on jsondata ---> 'a'----> 'b' i.e. differing only
in the operator name. Nonetheless, my intuition here is that the root
of the problem here is that we throw up our hands and just say "expr".
One way forward could be to do some sort of hash for the plan tree and
instead of saying "expr", say "exprXXXX" where each X is an integer or
a hex digit or something. Then it's very likely that all of the
indexes would get different autogenerated names regardless of how the
expression differs. This is not without problems: it's complicated,
and it might create buildfarm instability. But from a theoretical
perspective I like it better, because it seems to me that it's getting
at the root cause of the problem, which IMHO is that "expr" is not a
great descriptor for an arbitrary expression.

Now all that being said, I'm not volunteering to do the work here and
I don't think that what you propose is the most horribly idea anyone's
ever had, or anything remotely close to that. It could be that you
don't agree (perhaps rightly) or it could be that you agree in a
theoretical world but don't want to do the work to get there. So don't
understand this as a vigorous attempt to block the patch, just as me
giving my view of how I see it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Robert Haas (#15)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Robert Haas <robertmhaas@gmail.com> writes:

I'll take the risk of expressing an opinion: I think we should do
something about this problem, but I'm not all that convinced we should
do this particular thing.

Thanks for weighing in! Opinions are good, and I'm by no means set
on doing what is in my submitted patch. However, I'm failing to
extract a clearly-better alternative from your message.

There are two things about this that don't seem great to me. First,
the index names for idxpart1 are no longer consistent with the index
names for idxpart2. Second, IMHO, the names are worse. Let's talk
about each of those problems separately. The reason the names aren't
consistent any more is because idxpart2 is created as a standalone
table and then attached as a partition, whereas idxpart1 is a
partition from the first moment of its existence. It is not essential
that the index names not vary based on which way the user does it, but
I think it is a nicer user experience if they don't.

I think I'm not getting something here. Isn't that inconsistency
directly traceable to the user having supplied inconsistent names
to begin with? Surely we're not going to get into the business of
overruling the user's choice of names, so it seems to me that some
cases like this are inevitable. The patch does change which cases
those are, but I don't see how we avoid all such changes except by
sitting on the current rules forever. Maybe what your complaint
really points to is that this regression test case is designed around
the old naming conventions, and we ought to do more-extensive surgery
on it so that the names that are test-script-determined are consistent
with the new approach.

Now, what about
the absolute quality of the names? The change makes the index names
more consistent with the name of the index on the parent, which is
nice, but we also lose something: the index names are now less
consistent with the names of the child tables, and they don't mention
the affected columns any more.

Well, again, that's a user decision.

Surely, it's not as nice for the indexes on the brassica and daucus
tables to be named vegetables_id_idx_1 and vegetables_id_idx_2 rather
than brasica_id_ix and daucus_id_idx. That's just gotta be worse.

I don't really agree. The only thing the overexplain test script
does to create these indexes is

CREATE INDEX ON vegetables (id);

I don't see why it's even slightly surprising that the resulting
child indexes should have names involving "vegetables" and "id".
If anything, I'd argue that the current behavior is more surprising,
even if we've grown used to it.

The fact that you could still get it the other way if you created the
partitions standalone and then attached them makes it even worse.

True, but I think people who didn't like that would soon adapt their
choices of index names.

IMHO, the real problem here is that when an index is created on a
column, we have this idea (with which I agree) that it would be nice
to include the column name in the index, but when we have an
expression we go "oh, rats, there's no column name, I guess we'll just
use 'expr'", which doesn't scale very well beyond a single expression
index.

Fair complaint, but I'm not hearing a workable proposal for
something better to do with expression indexes. This:

One way forward could be to do some sort of hash for the plan tree and
instead of saying "expr", say "exprXXXX" where each X is an integer or
a hex digit or something.

would yield names that are neither intelligible nor readily
distinguishable from each other. Trying to make them stable across
system changes seems like a doomed project as well.

I do like the idea of pulling out function and variable names from
the index's expression. That won't get us all the way to unique
names, but we have to have a rule for fixing duplicate names anyway
(since you can make more than one plain index on the same column).
So we could do whatever seems sensible for deriving an
expression-index name and then deal with remaining duplications
the same way as for non-expression indexes.

In any case, the issues around expression-index names feel like
an orthogonal problem to me; I don't agree that a fix for that
would remove the need to do the sorts of things I'm suggesting.

So, how do we move forward? I'm perfectly willing to look into
the derive-a-name-from-the-expression idea, but I think that'd
best be done in a separate patch.

regards, tom lane

#17

Robert Haas

robertmhaas@gmail.com

7 months ago

In reply to: Tom Lane (#16)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

On Mon, Sep 15, 2025 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thanks for weighing in! Opinions are good, and I'm by no means set
on doing what is in my submitted patch. However, I'm failing to
extract a clearly-better alternative from your message.

Oh, you wanted an *actionable* opinion? :-)

I think I'm not getting something here. Isn't that inconsistency
directly traceable to the user having supplied inconsistent names
to begin with?

I don't think so. The user's create index commands in this example are:

create index idxparti on idxpart (a);
create index idxparti2 on idxpart (c, b);
create index on idxpart2 (a);
create index on idxpart2 (c, b);

I don't see anything inconsistent there. Granted, they could have
spelled out the index names for idxpart2, but the point from my POV is
that the index names here are fine without the patch, and yet the
patch changes them. At least AFAICS, the only issue we currently have
is when there are expression indexes involved, and more than one of
them.

Maybe what your complaint
really points to is that this regression test case is designed around
the old naming conventions, and we ought to do more-extensive surgery
on it so that the names that are test-script-determined are consistent
with the new approach.

There's arguably some of this, but I don't see it as the main issue.

Now, what about
the absolute quality of the names? The change makes the index names
more consistent with the name of the index on the parent, which is
nice, but we also lose something: the index names are now less
consistent with the names of the child tables, and they don't mention
the affected columns any more.

Well, again, that's a user decision.

Not when the names are system-generated -- we of course must adhere to
the user's choice of names, but we can't blame the user if we start
generating inferior names.

Surely, it's not as nice for the indexes on the brassica and daucus
tables to be named vegetables_id_idx_1 and vegetables_id_idx_2 rather
than brasica_id_ix and daucus_id_idx. That's just gotta be worse.

I don't really agree. The only thing the overexplain test script
does to create these indexes is

CREATE INDEX ON vegetables (id);

I don't see why it's even slightly surprising that the resulting
child indexes should have names involving "vegetables" and "id".
If anything, I'd argue that the current behavior is more surprising,
even if we've grown used to it.

OK. So that's a difference of opinion on what looks best then, which
is fair enough.

So, how do we move forward? I'm perfectly willing to look into
the derive-a-name-from-the-expression idea, but I think that'd
best be done in a separate patch.

I'm not sure. To me, the munging of everything together under the name
"expr" is the root of the problem here, and since this patch isn't
really addressing that problem, it's kind of to one side of what I see
as the main point. However, that's a judgement call, and if you or
others see it differently, then so be it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Robert Haas (#17)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Sep 15, 2025 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, how do we move forward? I'm perfectly willing to look into
the derive-a-name-from-the-expression idea, but I think that'd
best be done in a separate patch.

I'm not sure. To me, the munging of everything together under the name
"expr" is the root of the problem here, and since this patch isn't
really addressing that problem, it's kind of to one side of what I see
as the main point. However, that's a judgement call, and if you or
others see it differently, then so be it.

Well, let's leave it as a difference of opinion for the moment.
I do agree that improving the names generated for expression indexes
would be useful independently of this. I propose setting this patch
aside for the time being, and I will go look into that, and then
if that gets accepted we can come back here and discuss how much
of a problem remains.

regards, tom lane

#19

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Tom Lane (#18)

hackersbugs

Re: BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

I wrote:

I do agree that improving the names generated for expression indexes
would be useful independently of this. I propose setting this patch
aside for the time being, and I will go look into that, and then
if that gets accepted we can come back here and discuss how much
of a problem remains.

I started a separate thread for that, mostly so the cfbot doesn't
see it as the same patch:

/messages/by-id/876799.1757987810@sss.pgh.pa.us

regards, tom lane

BUG #18959: Name collisions of expression indexes during parallel Index creations on a pratitioned table.

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: