Improving the names generated for indexes on expressions
In the wake of the discussion around bug #18959 [1]/messages/by-id/18959-f63b53b864bb1417@postgresql.org, here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:
postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))
Formerly you got:
"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))
There's plenty of room for differing opinions about how to do this,
so have at it.
regards, tom lane
Attachments:
v1-0001-Improve-the-names-generated-for-indexes-on-expres.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Improve-the-names-generated-for-indexes-on-expres.p; name*1=atchDownload+295-148
Hi
út 16. 9. 2025 v 3:57 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
In the wake of the discussion around bug #18959 [1], here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))Formerly you got:
"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))There's plenty of room for differing opinions about how to do this,
so have at it.
-1
I don't like the introduction of the necessity to use double quotes. If
somebody needs a better name, then he can use an explicit name.
Regards
Pavel
Show quoted text
regards, tom lane
On Monday, September 15, 2025, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the wake of the discussion around bug #18959 [1], here is
a modest proposal for improving the names we pick for expression
indexes. The commit message explains the details, but this
example should give the flavor:postgres=# create table mytab (f1 int, f2 text, f3 text);
CREATE TABLE
postgres=# create index on mytab(abs(f1 + 1));
CREATE INDEX
postgres=# create index on mytab((f2 || f3));
CREATE INDEX
postgres=# \d mytab
Table "public.mytab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
f1 | integer | | |
f2 | text | | |
f3 | text | | |
Indexes:
"mytab_abs_f1_+_1_idx" btree (abs(f1 + 1))
"mytab_f2_||_f3_idx" btree ((f2 || f3))Formerly you got:
"mytab_abs_idx" btree (abs(f1 + 1))
"mytab_expr_idx" btree ((f2 || f3))There's plenty of room for differing opinions about how to do this,
so have at it.
If there are no function names present, output “expr” in lieu of a function
name. Then just output any columns that are present. No operators, no
constants. If multiple functions, exist output just the first one
encountered. I’d make an exception for a boolean constant and include
true/false as well.
mytab_abs_f1_idx
mytab_expr_f2_f3_idx
I fear consistently exceeding 63 bytes of identifier length if we choose to
display the entire expression in the name. And I find it unpleasant to
read, which is generally not good for a name - though index names are not
as visible so it’s not as strong a dislike. This seems like a reasonable
compromise that is likely to communicate the most salient aspects of an
expression. It does detract from the emphasis on operators we tend to
have, but it exactly those that make the name unpleasant.
David J.
On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
If there are no function names present, output “expr” in lieu of a function name. Then just output any columns that are present. No operators, no constants.
In the previous discussion, the user's expression indexes were on
these expressions:
jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'
So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.
I am not really sure we want to do what Tom proposes here because, as
Pavel says, it would result in a lot of indexes containing special
characters in the name. But I do want us to try to find some way of
giving indexes on different expressions different names.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:If there are no function names present, output “expr” in lieu of a function name. Then just output any columns that are present. No operators, no constants.
In the previous discussion, the user's expression indexes were on
these expressions:
jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'
So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.
Precisely. It's exactly expression indexes on OpExprs that
FigureColumn is completely useless for; if we don't do something
for those then we've not moved the needle much. I initially
tried leaving out Consts, but that didn't work well on the
indexes in the regression tests, let alone the field example
Robert cites.
One thing I was thinking about after putting up the initial draft
was to suppress the underscores around operator names, so that
instead of
mytab_jsondata_->_a_->_b_idx
the above would net you
mytab_jsondata->a->b_idx
It's less consistent but looks less busy too.
I am not really sure we want to do what Tom proposes here because, as
Pavel says, it would result in a lot of indexes containing special
characters in the name.
Question is, why should we care about that?
regards, tom lane
"David G. Johnston" <david.g.johnston@gmail.com> writes:
I fear consistently exceeding 63 bytes of identifier length if we choose to
display the entire expression in the name.
I was worried about that too, but at least among our regression test
cases, there are none that come even close to 63 bytes under this
proposal. So I think the concern is overblown.
regards, tom lane
On Tuesday, September 16, 2025, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 16, 2025 at 12:56 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:If there are no function names present, output “expr” in lieu of a
function name. Then just output any columns that are present. No
operators, no constants.In the previous discussion, the user's expression indexes were on
these expressions:jsondata -> 'a' -> 'b'
jsondata -> 'x' -> 'y'So "no operators, no constants" wouldn't really allow us to make any
useful progress, inasmuch as it would throw away everything that
matters.
Wouldn’t mind special-casing json/jsonb in the algorithm:
tbl_col.const.const_idx
More generally, maybe map all the various common accessor operators to “.”
instead of using them directly and capture the constants when chained from
a column.
This forces the need for double-quoting but that doesn’t seem like
something we can really avoid.
Keeps the name short by collapsing the operator to a single symbol, and
removing single quotes from the constants.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
More generally, maybe map all the various common accessor operators to “.”
instead of using them directly and capture the constants when chained from
a column.
That seems fairly useless. You still have a name that requires
double quotes, and you can't tell one operator from another, and
you haven't even saved much space because few operator names are
longer than two or three characters.
(I have thought a little about truncating the contents of Consts
to maybe a dozen bytes for this purpose. Those seem much more
likely to be long...)
More generally, though, I absolutely object to giving the JSON
operators some kind of special privilege in this context.
That's totally not per Postgres style, and besides it's not
solving the problem as a whole, but just this one example.
regards, tom lane
On Tue, Sep 16, 2025 at 8:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Question is, why should we care about that?
It's a fair question, and I can't think of any hard-and-fast reason.
However, I suspect that some users may not like it; the quick -1 from
Pavel lends credence to that theory, IMHO. People do use DDL commands
to operate on indexes, so it's not crazy that quoting could be a
hassle. AFAICT, there's no precedent for the exact thing you've done
here: e.g. the column alias for generate_series(1,10) ends up as the
function name, without the arguments. On the other hand, I confess I'm
not sure we've made the right decision there: limiting it to just the
function name can help to keep the length reasonable, but it can also
often make the column name pretty meaningless, as when the expression
was something like round(interesting_calculation(whatever)) and you
end up with "round" as the column name.
Do you think there's any way of doing this usefully while not
autogenerating names that require quoting, or is that a hopeless
endeavor? If it's hopeless, is it better to accept autogenerated names
that require quoting, or is it better to solve the problem on the
other thread with something more like what you proposed there?
Maybe those are questions that you were hoping I was going to have an
answer to, so I'll tell you my bias. I would be inclined to smush any
series of quote-requiring characters that appear in the expression
down to a single underscore. If that results in name collisions, then
the user should consider specifying names for each partition
themselves, or writing the index expressions so they're less similar,
or not concurrently creating indexes on near-identical expressions.
Maybe that's too harsh a position, but this patch with that change
would still solve the complained-of case (because the alphanumeric
constants differ) and would I believe considerably improve the
intelligibility of index names derived from expression indexes. As
previously stated, I like distinguishing different index expressions
more than I like revising the naming convention for partition-child
indexes vs. indexes created directly on a child table. But of course,
these are all arguable positions.
--
Robert Haas
EDB: http://www.enterprisedb.com
[ Nobody has had any better ideas in 2 months, so back to this ... ]
Robert Haas <robertmhaas@gmail.com> writes:
Do you think there's any way of doing this usefully while not
autogenerating names that require quoting, or is that a hopeless
endeavor? If it's hopeless, is it better to accept autogenerated names
that require quoting, or is it better to solve the problem on the
other thread with something more like what you proposed there?
If people are dead set against double-quoting, I think the only way
forward is to not include operator names in the generated index names.
I do not think that's a better way personally, but I seem to be
outvoted. Here's a v2 that does it like that.
regards, tom lane
Attachments:
v2-0001-Improve-the-names-generated-for-indexes-on-expres.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Improve-the-names-generated-for-indexes-on-expres.p; name*1=atchDownload+265-144
On Sat, Nov 1, 2025 at 6:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If people are dead set against double-quoting, I think the only way
forward is to not include operator names in the generated index names.
I do not think that's a better way personally, but I seem to be
outvoted. Here's a v2 that does it like that.
Is this better or worse than deparsing the whole expression to a
string and then smushing sequences of quote-requiring characters down
to a single underscore, as I suggested before?
I'm not saying what you've done is worse, but it does have the
disadvantage that you can only cover a few node types, whereas that
approach would cover every node type more or less automatically. Maybe
there are implementation difficulties, but I wouldn't think so... or
maybe you've got a scenario in mind where it would produce silly
results.
Nonwithstanding the above, I looked at the regression test cases and I
think all the changes are improvements. But, for example:
-ERROR: cannot alter type "test_type1" because column
"test_tbl1_idx.row" uses it
+ERROR: cannot alter type "test_type1" because column
"test_tbl1_idx.x_y" uses it
This would come out complaining about test_tbl1_idx_row_x_y_test_type1
with the above proposal. I like the inclusion of row. The inclusion of
test_type1 I like less well, but I'd accept it in the name of
uniformity. However, I don't feel strongly about it -- if you like
what you've done here, I'd say go for it.
--
Robert Haas
EDB: http://www.enterprisedb.com