Dump statistic issue with index on expressions

Started by Maksim.Melnikov4 months ago4 messageshackers
Jump to latest
#1Maksim.Melnikov
m.melnikov@postgrespro.ru

Hi hackers.
There is an issue on new feature dump statistics related to index
processing.
In case when table has more then one index and if one of them is index
on expressions
we can get error like this:

pg_dump --verbose --statistics-only > /dev/null
...
pg_dump: reading subscriptions
pg_dump: reading subscription membership of tables
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving "standard_conforming_strings = on"
pg_dump: saving "search_path = "
pg_dump: error: could not find index attname "source_system"

For clarity, schema ddl attached

CREATE TABLE test_table_stats (
    id uuid NOT NULL,
    body jsonb,
    source_system character varying,
    source_id character varying,
    model_name character varying NOT NULL
);

CREATE INDEX test_table_stats_source_system_text ON test_table_stats
USING btree (upper((source_system)::text));
CREATE UNIQUE INDEX test_table_stats_json_system_un ON test_table_stats
USING btree (source_system, source_id, model_name);

When pg_dump sequentially process indexes in case when index is
processed after index on expression,
it can use index attrs names of previously processed index. I've
attached simple patch to fix it.

Best regards
Melnikov Maksim

Attachments:

0001-Dump-statistic-issue-with-index-on-expressions.patchtext/x-patch; charset=UTF-8; name=0001-Dump-statistic-issue-with-index-on-expressions.patchDownload+2-3
#2Alexander Korotkov
aekorotkov@gmail.com
In reply to: Maksim.Melnikov (#1)
Re: Dump statistic issue with index on expressions

Hi, Maksim!

On Wed, Feb 25, 2026 at 6:37 PM Maksim.Melnikov
<m.melnikov@postgrespro.ru> wrote:

There is an issue on new feature dump statistics related to index
processing.
In case when table has more then one index and if one of them is index
on expressions
we can get error like this:

pg_dump --verbose --statistics-only > /dev/null
...
pg_dump: reading subscriptions
pg_dump: reading subscription membership of tables
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving "standard_conforming_strings = on"
pg_dump: saving "search_path = "
pg_dump: error: could not find index attname "source_system"

For clarity, schema ddl attached

CREATE TABLE test_table_stats (
id uuid NOT NULL,
body jsonb,
source_system character varying,
source_id character varying,
model_name character varying NOT NULL
);

CREATE INDEX test_table_stats_source_system_text ON test_table_stats
USING btree (upper((source_system)::text));
CREATE UNIQUE INDEX test_table_stats_json_system_un ON test_table_stats
USING btree (source_system, source_id, model_name);

When pg_dump sequentially process indexes in case when index is
processed after index on expression,
it can use index attrs names of previously processed index. I've
attached simple patch to fix it.

I see this is a bug indeed: an index with no expression can get its
indAttNames and nindAttNames from the previous index. But I didn't
manage to reproduce your case. dumpRelationStats_dumper() only
iterates indexes with pg_stats entry, and those are indexes with
expressions. Could you give more details on how did you reproduce
user-facing error? Which particular git commit did you use? How did
you fill the database step by step?

------
Regards,
Alexander Korotkov
Supabase

#3Maksim.Melnikov
m.melnikov@postgrespro.ru
In reply to: Alexander Korotkov (#2)
Re: Dump statistic issue with index on expressions

Alexander, hello!

On 5/11/26 02:42, Alexander Korotkov wrote:

Hi, Maksim!

On Wed, Feb 25, 2026 at 6:37 PM Maksim.Melnikov
<m.melnikov@postgrespro.ru> wrote:

There is an issue on new feature dump statistics related to index
processing.
In case when table has more then one index and if one of them is index
on expressions
we can get error like this:

pg_dump --verbose --statistics-only > /dev/null
...
pg_dump: reading subscriptions
pg_dump: reading subscription membership of tables
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving "standard_conforming_strings = on"
pg_dump: saving "search_path = "
pg_dump: error: could not find index attname "source_system"

For clarity, schema ddl attached

CREATE TABLE test_table_stats (
id uuid NOT NULL,
body jsonb,
source_system character varying,
source_id character varying,
model_name character varying NOT NULL
);

CREATE INDEX test_table_stats_source_system_text ON test_table_stats
USING btree (upper((source_system)::text));
CREATE UNIQUE INDEX test_table_stats_json_system_un ON test_table_stats
USING btree (source_system, source_id, model_name);

When pg_dump sequentially process indexes in case when index is
processed after index on expression,
it can use index attrs names of previously processed index. I've
attached simple patch to fix it.

I see this is a bug indeed: an index with no expression can get its
indAttNames and nindAttNames from the previous index. But I didn't
manage to reproduce your case. dumpRelationStats_dumper() only
iterates indexes with pg_stats entry, and those are indexes with
expressions. Could you give more details on how did you reproduce
user-facing error? Which particular git commit did you use? How did
you fill the database step by step?

------
Regards,
Alexander Korotkov
Supabase

Sorry for delay, I've tried to reproduce this issue for some time and
detect that it isn't reproducable on vanilla postgresql.
It seems, that our fork generate more records for pg_statistics unlike
vanilla, that generate records only for indexes with expressions,
more details can be found here in src/backend/commands/analyze.c

static void
do_analyze_rel(Relation onerel, VacuumParams *params,
               List *va_cols, AcquireSampleRowsFunc acquirefunc,
               BlockNumber relpages, bool inh, bool in_outer_xact,
               int elevel)
{
......
            thisdata->tupleFract = 1.0; /* fix later if partial */
            if (indexInfo->ii_Expressions != NIL && va_cols == NIL)
            {
                ListCell   *indexpr_item =
list_head(indexInfo->ii_Expressions);

                thisdata->vacattrstats = (VacAttrStats **)
                    palloc(indexInfo->ii_NumIndexAttrs *
sizeof(VacAttrStats *));
.....
                thisdata->attr_cnt = tcnt;

Anyway, it seems code, reported before, isn't ideal and prone to bugs.
In my opinion better fix it.
Hope it will be helpful.

Best regards,
Maksim Melnikov

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Maksim.Melnikov (#3)
Re: Dump statistic issue with index on expressions

Hi, Maksim!

On Wed, May 20, 2026 at 12:52 PM Maksim.Melnikov
<m.melnikov@postgrespro.ru> wrote:

On 5/11/26 02:42, Alexander Korotkov wrote:

On Wed, Feb 25, 2026 at 6:37 PM Maksim.Melnikov
<m.melnikov@postgrespro.ru> wrote:

There is an issue on new feature dump statistics related to index
processing.
In case when table has more then one index and if one of them is index
on expressions
we can get error like this:

pg_dump --verbose --statistics-only > /dev/null
...
pg_dump: reading subscriptions
pg_dump: reading subscription membership of tables
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving "standard_conforming_strings = on"
pg_dump: saving "search_path = "
pg_dump: error: could not find index attname "source_system"

For clarity, schema ddl attached

CREATE TABLE test_table_stats (
id uuid NOT NULL,
body jsonb,
source_system character varying,
source_id character varying,
model_name character varying NOT NULL
);

CREATE INDEX test_table_stats_source_system_text ON test_table_stats
USING btree (upper((source_system)::text));
CREATE UNIQUE INDEX test_table_stats_json_system_un ON test_table_stats
USING btree (source_system, source_id, model_name);

When pg_dump sequentially process indexes in case when index is
processed after index on expression,
it can use index attrs names of previously processed index. I've
attached simple patch to fix it.

I see this is a bug indeed: an index with no expression can get its
indAttNames and nindAttNames from the previous index. But I didn't
manage to reproduce your case. dumpRelationStats_dumper() only
iterates indexes with pg_stats entry, and those are indexes with
expressions. Could you give more details on how did you reproduce
user-facing error? Which particular git commit did you use? How did
you fill the database step by step?

------
Regards,
Alexander Korotkov
Supabase

Sorry for delay, I've tried to reproduce this issue for some time and
detect that it isn't reproducable on vanilla postgresql.
It seems, that our fork generate more records for pg_statistics unlike
vanilla, that generate records only for indexes with expressions,
more details can be found here in src/backend/commands/analyze.c

static void
do_analyze_rel(Relation onerel, VacuumParams *params,
List *va_cols, AcquireSampleRowsFunc acquirefunc,
BlockNumber relpages, bool inh, bool in_outer_xact,
int elevel)
{
......
thisdata->tupleFract = 1.0; /* fix later if partial */
if (indexInfo->ii_Expressions != NIL && va_cols == NIL)
{
ListCell *indexpr_item =
list_head(indexInfo->ii_Expressions);

thisdata->vacattrstats = (VacAttrStats **)
palloc(indexInfo->ii_NumIndexAttrs *
sizeof(VacAttrStats *));
.....
thisdata->attr_cnt = tcnt;

Anyway, it seems code, reported before, isn't ideal and prone to bugs.
In my opinion better fix it.
Hope it will be helpful.

Thank you. Now this is clear. No user-facing error, but an internal
inconsistency. I'm going to push (and backpatch) this after release
freeze is lifted.

------
Regards,
Alexander Korotkov
Supabase