Add important info about ANALYZE after create Functional Index

Started by Fabrízio de Royes Melloover 5 years ago40 messageshackers
Jump to latest
#1Fabrízio de Royes Mello
fabriziomello@gmail.com

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1]https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499 about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Fabrízio de Royes Mello (#1)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziomello@gmail.com> wrote:

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1] about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

David J.

#3Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: David G. Johnston (#2)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

+1 to consider it as a major problem of CREATE INDEX [CONCURRENTLY] for
indexes on expressions, it's very easy to forget what I've observed many
times.

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

And another question: how ANALYZE needs to be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

Thanks,
Nik

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Nikolay Samokhvalov (#3)
Re: Add important info about ANALYZE after create Functional Index

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

Independent concern.

And another question: how ANALYZE needs to be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

David J.

#5Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: David G. Johnston (#4)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack
statistics and willing to automate its collection. If the approach is
automated in one case, it should be automated in the others, for
consistency.

And another question: how ANALYZE needs to be run? If it's under the

user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

I'm not sure I follow.

Thanks,
Nik

#6Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Fabrízio de Royes Mello (#1)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for
12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

I still think that automating is the right thing to do but of course, it's
a much bigger topic that a quick fix dor the docs.

Attachments:

func-index-analyze.pre12.patchapplication/x-patch; name=func-index-analyze.pre12.patchDownload+12-4
func-index-analyze.master.patchapplication/x-patch; name=func-index-analyze.master.patchDownload+9-0
#7Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: David G. Johnston (#2)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these

operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
REINDEX because running it will lose the statistics. Have a look the
example below:

fabrizio=# SELECT version();
version

---------------------------------------------------------------------------------------------------------
PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

fabrizio=# CREATE TABLE t(f1 BIGSERIAL PRIMARY KEY, f2 TEXT) WITH
(autovacuum_enabled = false);
CREATE TABLE
fabrizio=# INSERT INTO t(f2) SELECT repeat(chr(65+(random()*26)::int),
(random()*300)::int) FROM generate_series(1, 10000);
INSERT 0 10000
fabrizio=# CREATE INDEX t_idx2 ON t(lower(f2));
CREATE INDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
0
(1 row)

fabrizio=# ANALYZE t;
ANALYZE
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
1
(1 row)

fabrizio=# REINDEX INDEX t_idx2;
REINDEX
fabrizio=# REINDEX INDEX t_pkey;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
1
(1 row)
^^^^^^^^
-- A regular REINDEX don't lose the statistics.

fabrizio=# REINDEX INDEX CONCURRENTLY t_idx2;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
0
(1 row)

^^^^^^^^
-- But the REINDEX CONCURRENTLY loses.

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

#8Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Nikolay Samokhvalov (#6)
Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 4:12 AM Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <

fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not

sure where. I'm thinking about:

- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for

12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

Actually the REINDEX CONCURRENTLY suffers with the lack of ANALYZE. See my
previous message on this thread.

So just adding the note on the ANALYZE docs is enough.

I still think that automating is the right thing to do but of course,

it's a much bigger topic that a quick fix dor the docs.

So what we need to do is see how to fix REINDEX CONCURRENTLY.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

#9Michael Paquier
michael@paquier.xyz
In reply to: Fabrízio de Royes Mello (#7)
Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
REINDEX because running it will lose the statistics. Have a look the
example below:

[...]

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

I agree that this is not user-friendly, and I suspect that we will
need to do something within index_concurrently_swap() to fill in the
stats of the new index from the data of the old one (not looked at
that in details yet).
--
Michael

#10Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Michael Paquier (#9)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:

When we create a new table or index they will not have statistics until

an

ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of

CONCURRENTLY on

REINDEX because running it will lose the statistics. Have a look the
example below:

[...]

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

I agree that this is not user-friendly, and I suspect that we will
need to do something within index_concurrently_swap() to fill in the
stats of the new index from the data of the old one (not looked at
that in details yet).

We already do a similar thing for PgStats [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693 so maybe we should also copy
pg_statistics from old to new index during the swap.

But I'm not sure if it's totally safe anyway and would be better to create
a new phase to issue ANALYZE if necessary (exists statistics for old index).

Regards,

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Fabrízio de Royes Mello (#7)
Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabr�zio de Royes Mello wrote:

On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these

operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

I think the problem is we notice when a table has not been analyzed yet
(and trigger an analyze), but we won't notice that for an index. So if
the table does not change very often, it may take ages before we build
stats for the index - not great.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David G. Johnston (#2)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabr�zio de Royes Mello <
fabriziomello@gmail.com> wrote:

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1] about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13David G. Johnston
david.g.johnston@gmail.com
In reply to: Tomas Vondra (#12)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

My guess is that it wouldn't be too difficult to write a patch that could
be safely back-patched and it's worth doing so even if ultimately the
decision is not to. But then again the patch writer isn't going to be me.

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

David J.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#12)
Re: Add important info about ANALYZE after create Functional Index

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you. I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work. As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

I don't see anything in the CREATE STATISTICS man page pointing
that out, either. But there's probably room for "Notes" entries
about it in both places.

regards, tom lane

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Nikolay Samokhvalov (#5)
Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in,
say, pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack
statistics and willing to automate its collection. If the approach is
automated in one case, it should be automated in the others, for
consistency.

I don't see a need to force consistency between something that will affect,
at most, one table, and something that will affect an entire database or
cluster. The other material difference is that the previous state of a
restore is "nothing" while in the create/reindex cases we are going from
live, populated, state to another.

I do observe that while the create/reindex analyze would run automatically
during the restore on object creation there would be no data present so it
would be close to a no-op in practice.

And another question: how ANALYZE needs to be run? If it's under the

user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

I'm not sure I follow.

Describe how parallelism within the session that is auto-analyzing is
supposed to work. vaccuumdb opens up multiple connections which shouldn't
happen here.

I suppose having the auto-analyze run three times with different targets
would work but I'm doubting that is a win. I may just be underestimating
how long an analyze on an extremely large table with high statistics takes.

David J.

#16Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David G. Johnston (#13)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

My guess is that it wouldn't be too difficult to write a patch that could
be safely back-patched and it's worth doing so even if ultimately the
decision is not to. But then again the patch writer isn't going to be me.

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#14)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 03:05:39PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you. I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work. As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

True.

I don't see anything in the CREATE STATISTICS man page pointing
that out, either. But there's probably room for "Notes" entries
about it in both places.

I agree. I'll add it to my TODO list for the next CF.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#14)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 12:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you.

I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work.

The definition of auto-analyze is just fine; the issue is with the user
unfriendly position that the only times analyze is ever run is when it is
run manually or heuristically in a separate process. I agree that this
isn't a bug in the traditional sense - the current behavior is intentional
- but it is a POLA violation.

The fundamental question here is do we want to change our policy in this
regard and make our system more user-friendly? If so, let's do so for v14
in honor of the problem the lack of documentation and POLA violation has
recently caused.

Then, as a separate concern, should we admit the oversight and back-patch
our policy change or just move forward and add documentation to older
versions?

As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

At least we have been consistent...

David J.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#16)
Re: Add important info about ANALYZE after create Functional Index

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

Just to be clear, I'm entirely *not* on board with that. IMV it's
intentional that we do not force auto-analyze activity after CREATE
INDEX or CREATE STATISTICS. If we change that, people will want a
way to opt out of it, and then your "simple" patch isn't so simple
anymore. (Not that it was simple anyway. What if the CREATE is
inside a transaction block, for instance? There's no use in
kicking autovacuum before commit.)

regards, tom lane

#20Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 03:18:52PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

Just to be clear, I'm entirely *not* on board with that. IMV it's
intentional that we do not force auto-analyze activity after CREATE
INDEX or CREATE STATISTICS. If we change that, people will want a
way to opt out of it, and then your "simple" patch isn't so simple
anymore.

True. Some users may have reasons to not want the analyze, I guess.

(Not that it was simple anyway. What if the CREATE is
inside a transaction block, for instance? There's no use in
kicking autovacuum before commit.)

I don't think anyone proposed to do this through autovacuum. There was a
reference to auto-analyze but I think that was meant as 'run analyze
automatically.' Which would work in transactions just fine, I think.

But I agree it'd likely be a more complicated patch than it might seem
at first glance.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Tomas Vondra (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Fabrízio de Royes Mello (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#23)
#25Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#24)
#26Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Michael Paquier (#24)
#27Michael Paquier
michael@paquier.xyz
In reply to: Fabrízio de Royes Mello (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#25)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#27)
#31Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Michael Paquier (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Nikolay Samokhvalov (#6)
#33Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Bruce Momjian (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Fabrízio de Royes Mello (#33)
#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Bruce Momjian (#32)
#36Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#35)
#37Justin Pryzby
pryzby@telsasoft.com
In reply to: Bruce Momjian (#36)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#36)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Justin Pryzby (#37)
#40Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#38)