Improving the names generated for indexes on expressions

Started by Tom Lane9 months ago17 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

In the wake of the discussion around bug #18959 [1]/messages/by-id/18959-f63b53b864bb1417@postgresql.org, here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:

postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))

Formerly you got:

"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))

There's plenty of room for differing opinions about how to do this,
so have at it.

regards, tom lane

[1]: /messages/by-id/18959-f63b53b864bb1417@postgresql.org

Attachments:

v1-0001-Improve-the-names-generated-for-indexes-on-expres.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Improve-the-names-generated-for-indexes-on-expres.p; name*1=atchDownload+295-148
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#1)
Re: Improving the names generated for indexes on expressions

Hi

út 16. 9. 2025 v 3:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

In the wake of the discussion around bug #18959 [1], here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:

postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))

Formerly you got:

"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))

There's plenty of room for differing opinions about how to do this,
so have at it.

-1

I don't like the introduction of the necessity to use double quotes. If
somebody needs a better name, then he can use an explicit name.

Regards

Pavel

Show quoted text

regards, tom lane

[1]
/messages/by-id/18959-f63b53b864bb1417@postgresql.org

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#1)
Re: Improving the names generated for indexes on expressions

On Monday, September 15, 2025, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the wake of the discussion around bug #18959 [1], here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:

postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))

Formerly you got:

"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))

There's plenty of room for differing opinions about how to do this,
so have at it.

If there are no function names present, output “expr” in lieu of a function
name. Then just output any columns that are present. No operators, no
constants. If multiple functions, exist output just the first one
encountered. I’d make an exception for a boolean constant and include
true/false as well.

mytab_abs_f1_idx
mytab_expr_f2_f3_idx

I fear consistently exceeding 63 bytes of identifier length if we choose to
display the entire expression in the name. And I find it unpleasant to
read, which is generally not good for a name - though index names are not
as visible so it’s not as strong a dislike. This seems like a reasonable
compromise that is likely to communicate the most salient aspects of an
expression. It does detract from the emphasis on operators we tend to
have, but it exactly those that make the name unpleasant.

David J.

#4Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#3)
Re: Improving the names generated for indexes on expressions

On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

If there are no function names present, output “expr” in lieu of a function name. Then just output any columns that are present. No operators, no constants.

In the previous discussion, the user's expression indexes were on
these expressions:

jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'

So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.

I am not really sure we want to do what Tom proposes here because, as
Pavel says, it would result in a lot of indexes containing special
characters in the name. But I do want us to try to find some way of
giving indexes on different expressions different names.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: Improving the names generated for indexes on expressions

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

If there are no function names present, output “expr” in lieu of a function name. Then just output any columns that are present. No operators, no constants.

In the previous discussion, the user's expression indexes were on
these expressions:

jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'

So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.

Precisely. It's exactly expression indexes on OpExprs that
FigureColumn is completely useless for; if we don't do something
for those then we've not moved the needle much. I initially
tried leaving out Consts, but that didn't work well on the
indexes in the regression tests, let alone the field example
Robert cites.

One thing I was thinking about after putting up the initial draft
was to suppress the underscores around operator names, so that
instead of

mytab_jsondata_->_a_->_b_idx

the above would net you

mytab_jsondata->a->b_idx

It's less consistent but looks less busy too.

I am not really sure we want to do what Tom proposes here because, as
Pavel says, it would result in a lot of indexes containing special
characters in the name.

Question is, why should we care about that?

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#3)
Re: Improving the names generated for indexes on expressions

"David G. Johnston" <david.g.johnston@gmail.com> writes:

I fear consistently exceeding 63 bytes of identifier length if we choose to
display the entire expression in the name.

I was worried about that too, but at least among our regression test
cases, there are none that come even close to 63 bytes under this
proposal. So I think the concern is overblown.

regards, tom lane

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#4)
Re: Improving the names generated for indexes on expressions

On Tuesday, September 16, 2025, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

If there are no function names present, output “expr” in lieu of a

function name. Then just output any columns that are present. No
operators, no constants.

In the previous discussion, the user's expression indexes were on
these expressions:

jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'

So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.

Wouldn’t mind special-casing json/jsonb in the algorithm:
tbl_col.const.const_idx

More generally, maybe map all the various common accessor operators to “.”
instead of using them directly and capture the constants when chained from
a column.

This forces the need for double-quoting but that doesn’t seem like
something we can really avoid.

Keeps the name short by collapsing the operator to a single symbol, and
removing single quotes from the constants.

David J.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#7)
Re: Improving the names generated for indexes on expressions

"David G. Johnston" <david.g.johnston@gmail.com> writes:

More generally, maybe map all the various common accessor operators to “.”
instead of using them directly and capture the constants when chained from
a column.

That seems fairly useless. You still have a name that requires
double quotes, and you can't tell one operator from another, and
you haven't even saved much space because few operator names are
longer than two or three characters.

(I have thought a little about truncating the contents of Consts
to maybe a dozen bytes for this purpose. Those seem much more
likely to be long...)

More generally, though, I absolutely object to giving the JSON
operators some kind of special privilege in this context.
That's totally not per Postgres style, and besides it's not
solving the problem as a whole, but just this one example.

regards, tom lane

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: Improving the names generated for indexes on expressions

On Tue, Sep 16, 2025 at 8:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Question is, why should we care about that?

It's a fair question, and I can't think of any hard-and-fast reason.
However, I suspect that some users may not like it; the quick -1 from
Pavel lends credence to that theory, IMHO. People do use DDL commands
to operate on indexes, so it's not crazy that quoting could be a
hassle. AFAICT, there's no precedent for the exact thing you've done
here: e.g. the column alias for generate_series(1,10) ends up as the
function name, without the arguments. On the other hand, I confess I'm
not sure we've made the right decision there: limiting it to just the
function name can help to keep the length reasonable, but it can also
often make the column name pretty meaningless, as when the expression
was something like round(interesting_calculation(whatever)) and you
end up with "round" as the column name.

Do you think there's any way of doing this usefully while not
autogenerating names that require quoting, or is that a hopeless
endeavor? If it's hopeless, is it better to accept autogenerated names
that require quoting, or is it better to solve the problem on the
other thread with something more like what you proposed there?

Maybe those are questions that you were hoping I was going to have an
answer to, so I'll tell you my bias. I would be inclined to smush any
series of quote-requiring characters that appear in the expression
down to a single underscore. If that results in name collisions, then
the user should consider specifying names for each partition
themselves, or writing the index expressions so they're less similar,
or not concurrently creating indexes on near-identical expressions.
Maybe that's too harsh a position, but this patch with that change
would still solve the complained-of case (because the alphanumeric
constants differ) and would I believe considerably improve the
intelligibility of index names derived from expression indexes. As
previously stated, I like distinguishing different index expressions
more than I like revising the naming convention for partition-child
indexes vs. indexes created directly on a child table. But of course,
these are all arguable positions.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: Improving the names generated for indexes on expressions

[ Nobody has had any better ideas in 2 months, so back to this ... ]

Robert Haas <robertmhaas@gmail.com> writes:

Do you think there's any way of doing this usefully while not
autogenerating names that require quoting, or is that a hopeless
endeavor? If it's hopeless, is it better to accept autogenerated names
that require quoting, or is it better to solve the problem on the
other thread with something more like what you proposed there?

If people are dead set against double-quoting, I think the only way
forward is to not include operator names in the generated index names.
I do not think that's a better way personally, but I seem to be
outvoted. Here's a v2 that does it like that.

regards, tom lane

Attachments:

v2-0001-Improve-the-names-generated-for-indexes-on-expres.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Improve-the-names-generated-for-indexes-on-expres.p; name*1=atchDownload+265-144
#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: Improving the names generated for indexes on expressions

On Sat, Nov 1, 2025 at 6:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If people are dead set against double-quoting, I think the only way
forward is to not include operator names in the generated index names.
I do not think that's a better way personally, but I seem to be
outvoted. Here's a v2 that does it like that.

Is this better or worse than deparsing the whole expression to a
string and then smushing sequences of quote-requiring characters down
to a single underscore, as I suggested before?

I'm not saying what you've done is worse, but it does have the
disadvantage that you can only cover a few node types, whereas that
approach would cover every node type more or less automatically. Maybe
there are implementation difficulties, but I wouldn't think so... or
maybe you've got a scenario in mind where it would produce silly
results.

Nonwithstanding the above, I looked at the regression test cases and I
think all the changes are improvements. But, for example:

-ERROR:  cannot alter type "test_type1" because column
"test_tbl1_idx.row" uses it
+ERROR:  cannot alter type "test_type1" because column
"test_tbl1_idx.x_y" uses it

This would come out complaining about test_tbl1_idx_row_x_y_test_type1
with the above proposal. I like the inclusion of row. The inclusion of
test_type1 I like less well, but I'd accept it in the name of
uniformity. However, I don't feel strongly about it -- if you like
what you've done here, I'd say go for it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
Re: Improving the names generated for indexes on expressions

[ I was hoping for opinions from more people, but so far it's crickets ]

Robert Haas <robertmhaas@gmail.com> writes:

Is this better or worse than deparsing the whole expression to a
string and then smushing sequences of quote-requiring characters down
to a single underscore, as I suggested before?

I spent some time trying to parse out why I don't like that
suggestion, and eventually realized that it's this: I don't want
the behavior of CREATE INDEX to be dependent on every last detail
of ruleutils.c. I'm afraid that that'd introduce undesirable
cross-version changes in the names selected for indexes. Now,
considering this sort of change at all requires an assumption that
we can get away with breaking any applications that are sensitive
to that. But we only have to assume that we can get away with that
once. If we get ruleutils.c involved then I foresee a steady drip
drip drip of edge-case naming changes, and I think that will annoy
people.

In any event, the cfbot has been nagging me that this patch needs
a rebase, so here's v3. The only change from v2 is that there are
some new test cases in indexing.sql that need adjustment.

regards, tom lane

Attachments:

v3-0001-Improve-the-names-generated-for-indexes-on-expres.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Improve-the-names-generated-for-indexes-on-expres.p; name*1=atchDownload+316-189
#13solai v
solai.cdac@gmail.com
In reply to: Tom Lane (#10)
Re: Improving the names generated for indexes on expressions

Hi,

If people are dead set against double-quoting, I think the only way
forward is to not include operator names in the generated index names.
I do not think that's a better way personally, but I seem to be
outvoted. Here's a v2 that does it like that.

regards, tom lane

I applied the v2 patch and tested it.
The patch applied and built successfully.I tested several expression
indexes(function,operator,JSON,and nested-function expressions).The
generated index names are now more descriptive and make it easier to
identify the indexed expression.
I also ran make check.One regression test (indexing) failed.From the
diff,the failure appears to be due to the regression test still
referencing the old autogenerated index names such as
idxpart_expr_idx,while the patched code now generates names such as
idxpart_a_b_idx.
The manual tests behaved as expected,and the new naming scheme
successfully produces more meaningful and distinguishable index names.
The regression failure appears to be related to outdated test
expectations rather than an issue with the patch functionality itself.

Regards,
Solai

#14solai v
solai.cdac@gmail.com
In reply to: Tom Lane (#12)
Re: Improving the names generated for indexes on expressions

Hi,

In any event, the cfbot has been nagging me that this patch needs
a rebase, so here's v3. The only change from v2 is that there are
some new test cases in indexing.sql that need adjustment.

I applied the v3 patch and tested it.
The patch applied and built successfully.I reran the expression index
test cases i had used previously,including function,operator,JSON
,nested function ,and partitioned expression indexes.The generated
index names remained descriptive and behaved as expected.
I also run make check ,and all regression tests passed successfully.
The indexing regression test failure that i reported while testing v2
appears to be resolved in this version.

Regards,
Solai

#15Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: Improving the names generated for indexes on expressions

On Thu, Jun 4, 2026 at 6:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I spent some time trying to parse out why I don't like that
suggestion, and eventually realized that it's this: I don't want
the behavior of CREATE INDEX to be dependent on every last detail
of ruleutils.c. I'm afraid that that'd introduce undesirable
cross-version changes in the names selected for indexes. Now,
considering this sort of change at all requires an assumption that
we can get away with breaking any applications that are sensitive
to that. But we only have to assume that we can get away with that
once. If we get ruleutils.c involved then I foresee a steady drip
drip drip of edge-case naming changes, and I think that will annoy
people.

Hmm, I don't know. I suppose you could go back and see whether making
these changes in the back-branches would produce different regression
test changes than making them against the master branch.

In any event, the cfbot has been nagging me that this patch needs
a rebase, so here's v3. The only change from v2 is that there are
some new test cases in indexing.sql that need adjustment.

...but whether you do that research or not, I still think this is a
very significant improvement. For those not wanting to open up the
patch:

- An index on ((a + 0)) now gets the name idxpart1_a_0_idx instead of
idxpart1_expr_idx.
- An index on ((a + b)) now gets the name idxpart_a_b_idx instead of
idxpart_expr_idx.
- An index on (abs(b)) now gets the name idxpart1_abs_b_idx instead of
idxpart1_abs_idx.

Personal taste certainly enters into the calculus here, but IMHO
calling everything BLAH_expr_idx because the topmost thing is some
kind of operator invocation is a really poor user experience, and even
for cases where the top-level thing is a function invocation,
including more than just the name of the top-level function in the
index name seems like a really significant improvement.

I think it would be good to press ahead with this. Admittedly, you
have gotten feedback from a limited number of people, but if you
commit this soon-ish, there will be lots of time to hear from people
who dislike it before we run out of time to rethink for v20. If the
consensus is that we revert it or change the algorithm, so be it, but
it doesn't seem like you will learn much more about what people think
without committing something.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: Improving the names generated for indexes on expressions

Robert Haas <robertmhaas@gmail.com> writes:

I think it would be good to press ahead with this. Admittedly, you
have gotten feedback from a limited number of people, but if you
commit this soon-ish, there will be lots of time to hear from people
who dislike it before we run out of time to rethink for v20. If the
consensus is that we revert it or change the algorithm, so be it, but
it doesn't seem like you will learn much more about what people think
without committing something.

Thanks for looking at it! I agree that pushing this shortly after
we branch would be ideal timing, so I'll plan on doing that.

regards, tom lane

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#15)
Re: Improving the names generated for indexes on expressions

On Tue, Jun 16, 2026 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 4, 2026 at 6:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I spent some time trying to parse out why I don't like that
suggestion, and eventually realized that it's this: I don't want
the behavior of CREATE INDEX to be dependent on every last detail
of ruleutils.c. I'm afraid that that'd introduce undesirable
cross-version changes in the names selected for indexes. Now,
considering this sort of change at all requires an assumption that
we can get away with breaking any applications that are sensitive
to that. But we only have to assume that we can get away with that
once. If we get ruleutils.c involved then I foresee a steady drip
drip drip of edge-case naming changes, and I think that will annoy
people.

Hmm, I don't know. I suppose you could go back and see whether making
these changes in the back-branches would produce different regression
test changes than making them against the master branch.

In any event, the cfbot has been nagging me that this patch needs
a rebase, so here's v3. The only change from v2 is that there are
some new test cases in indexing.sql that need adjustment.

...but whether you do that research or not, I still think this is a
very significant improvement. For those not wanting to open up the
patch:

- An index on ((a + 0)) now gets the name idxpart1_a_0_idx instead of
idxpart1_expr_idx.
- An index on ((a + b)) now gets the name idxpart_a_b_idx instead of
idxpart_expr_idx.
- An index on (abs(b)) now gets the name idxpart1_abs_b_idx instead of
idxpart1_abs_idx.

Personal taste certainly enters into the calculus here, but IMHO
calling everything BLAH_expr_idx because the topmost thing is some
kind of operator invocation is a really poor user experience, and even
for cases where the top-level thing is a function invocation,
including more than just the name of the top-level function in the
index name seems like a really significant improvement.

+1. I haven't followed this thread but the examples along with your
explanation for the same clearly sounds like an improvement to me.

--
With Regards,
Amit Kapila.