When manual analyze is needed
Hi,
In one of our RDS PostgreSQL instances we can see from pg_stat_user_tables
that autovacuum and autoanalyze are running on the tables without any
manual intervention.
So is autovacuum/autoanalyze sufficient to ensure optimal stats and an
unbloated table structure, or should we also run VACUUM/ANALYZE manually
depending on the type of table? For example:
- a table that only receives inserts, with a similar amount of data
  arriving throughout the day, 24*7;
- a table that gets one bulk data load per day;
- a volatile table with truncate-and-load style operations;
- a table with heavy updates/deletes as well as inserts throughout the
  day, 24*7.
Will autovacuum/autoanalyze take care of all such tables, or do we need to
run them manually in certain scenarios?
And if the optimizer suddenly picks a suboptimal plan for a specific query
because of missing stats, how do we catch that? Basically, how can we get
the old plan hash and compare it with the current plan hash?
Regards
Veem
On Mon, 2024-03-04 at 01:33 +0530, veem v wrote:
In one of our RDS PostgreSQL instances we can see from pg_stat_user_tables
that autovacuum and autoanalyze are running on the tables without any
manual intervention.
That's exactly the idea behind autovacuum.
So is autovacuum/autoanalyze sufficient to ensure optimal stats
and an unbloated table structure
Yes, it is sufficient. If you have a busy database, you may have to
tune autovacuum to keep up.
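If autovacuum cannot keep up, one common approach is to lower the
per-table thresholds so it triggers more often. A minimal sketch (the
table name and values are illustrative, not recommendations):

```sql
-- Vacuum after ~1% of rows change instead of the default 20%,
-- and analyze after ~0.5% instead of the default 10%.
ALTER TABLE busy_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```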
The only things that require manual ANALYZE are
1. partitioned tables (autoanalyze will collect statistics on the
partitions, but not the partitioned table itself)
2. after you create an index on an expression (otherwise you have to
wait until autoanalyze runs to get statistics on the indexed
expression)
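Both cases can be handled with a plain ANALYZE, sketched here with
made-up table and column names:

```sql
-- 1. partitioned table: ANALYZE on the parent collects the
--    parent-level statistics
ANALYZE measurements;

-- 2. expression index: statistics on the indexed expression only
--    exist after the next ANALYZE of the table
CREATE INDEX ON users (lower(email));
ANALYZE users;
```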
Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
The only things that require manual ANALYZE are
...
You missed one important exception: autovacuum/autoanalyze cannot
process temporary tables, because those are not accessible outside
the owning session. So you need to do those manually if it's
important for performance.
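A short sketch of the temporary-table case (names are hypothetical):

```sql
-- temporary tables are invisible to autovacuum, so analyze them
-- yourself once they are loaded
CREATE TEMP TABLE tmp_orders AS
    SELECT * FROM orders WHERE order_date = current_date;
ANALYZE tmp_orders;
```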
regards, tom lane
On Mon, 4 Mar 2024 at 09:42, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
The only things that require manual ANALYZE are
1. partitioned tables (autoanalyze will collect statistics on the
partitions, but not the partitioned table itself)
So the partitioned table's stats are essentially the rolled-up stats of
all its partitions. As you mentioned, autoanalyze only works on the child
partitions, not on the partitioned table itself, so does that mean we have
to schedule jobs manually (through some scheduler like pg_cron) to analyze
these partitioned tables at certain intervals to keep their stats up to
date? And won't that scan all the child partitions again to aggregate/roll
up the stats from the partition level to the table level?
Additionally, if a query was working fine but suddenly takes a suboptimal
plan because of missing stats, do we have any hash value column on a
performance view, associated with the queryid, that we could use to see
the difference between past and current plans and identify such issues
quickly and fix them?
I am not seeing any such column in pg_stat_activity or pg_stat_statements
that holds a hash value of the plan, and the query column shows
"<insufficient privilege>" for many of the entries; why is that?
On Mon, 2024-03-04 at 10:16 +0530, veem v wrote:
So the partitioned table's stats are essentially the rolled-up stats of all
its partitions. As you mentioned, autoanalyze only works on the child
partitions, not on the partitioned table itself, so does that mean we have to
schedule jobs manually (through some scheduler like pg_cron) to analyze these
partitioned tables at certain intervals to keep their stats up to date?
Something like that, yes.
And won't that scan all the child partitions again to aggregate/roll up
the stats from the partition level to the table level?
Yes.
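One way to set up such a job with pg_cron might look like this (the job
name, schedule, and table are illustrative; on RDS the pg_cron extension
must first be enabled via the DB parameter group):

```sql
-- run ANALYZE on the partitioned (parent) table every night at 03:00
SELECT cron.schedule(
    'analyze_measurements',
    '0 3 * * *',
    'ANALYZE public.measurements;'
);
```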
Additionally, if a query was working fine but suddenly takes a suboptimal
plan because of missing stats, do we have any hash value column on a
performance view, associated with the queryid, that we could use to see the
difference between past and current plans and identify such issues quickly
and fix them?
Not that I know of.
Yours,
Laurenz Albe
On Mon, Mar 4, 2024 at 12:23 AM veem v <veema0000@gmail.com> wrote:
Additionally, if a query was working fine but suddenly takes a suboptimal
plan because of missing stats, do we have any hash value column on a
performance view, associated with the queryid, that we could use to see
the difference between past and current plans and identify such issues
quickly and fix them?
You can use auto_explain; nothing else tracks things at that fine a level.
You can use pg_stat_statements to track the average and max time for each
query. Save and reset periodically to make it more useful.
https://www.postgresql.org/docs/current/auto-explain.html
https://www.postgresql.org/docs/current/pgstatstatements.html
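One possible way to implement the save-and-reset cycle (a sketch; the
history table name is made up, and the view's columns vary across
PostgreSQL versions):

```sql
-- create an empty table with the same shape as pg_stat_statements,
-- plus a capture timestamp
CREATE TABLE IF NOT EXISTS pgss_history AS
    SELECT now() AS captured_at, * FROM pg_stat_statements WHERE false;

-- snapshot the current counters, then reset them
INSERT INTO pgss_history
    SELECT now(), * FROM pg_stat_statements;

SELECT pg_stat_statements_reset();
```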
I am not seeing any such column in pg_stat_activity or pg_stat_statements
that holds a hash value of the plan, and the query column shows
"<insufficient privilege>" for many of the entries; why is that?
Ordinary users are not allowed to see what other people are running. You
can add a user to the pg_read_all_stats role to allow this:
GRANT pg_read_all_stats TO alice;
Oftentimes someone needing access to the stats also needs a little more
access, so consider the pg_monitor role as well. Both are documented here:
https://www.postgresql.org/docs/current/predefined-roles.html
Cheers,
Greg
On Mon, 4 Mar 2024 at 21:46, Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Mon, Mar 4, 2024 at 12:23 AM veem v <veema0000@gmail.com> wrote:
Additionally, if a query was working fine but suddenly takes a suboptimal
plan because of missing stats, do we have any hash value column on a
performance view, associated with the queryid, that we could use to see
the difference between past and current plans and identify such issues
quickly and fix them?
You can use auto_explain; nothing else tracks things at that fine a level.
You can use pg_stat_statements to track the average and max time for each
query. Save and reset periodically to make it more useful.
https://www.postgresql.org/docs/current/auto-explain.html
https://www.postgresql.org/docs/current/pgstatstatements.html
Thank you so much, Greg. That helps.
We are planning to add the auto_explain extension, set log_min_duration to
~5 seconds, and set log_analyze to true, so that all queries running longer
than that get logged with detailed information on the exact bottleneck.
Would it be a good idea to enable this on a production DB that is highly
active? Or should we just have the extension installed, and only set the
parameters while debugging a performance issue, resetting them afterwards?
We are planning to add the auto_explain extension, set log_min_duration to
~5 seconds, and set log_analyze to true, so that all queries running longer
than that get logged with detailed information on the exact bottleneck.
Would it be a good idea to enable this on a production DB that is highly
active? Or should we just have the extension installed, and only set the
parameters while debugging a performance issue, resetting them afterwards?
I would not use log_analyze on a highly active production db. Even on a dev
system, use it carefully as it has some downsides. The log_min_duration of
5s should be fine everywhere, however.
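For reference, the settings under discussion might look roughly like this
in postgresql.conf (a sketch; on RDS these are set through the DB parameter
group rather than the file):

```ini
# load the module at server start
shared_preload_libraries = 'auto_explain'

# log plans of statements that run longer than 5 seconds
auto_explain.log_min_duration = '5s'

# leave off on busy production systems: it adds timing overhead
# to every qualifying query
auto_explain.log_analyze = off
```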
Cheers,
Greg