mat views stats

Started by Jim Mlodgenskiabout 9 years ago12 messageshackers
Jump to latest
#1Jim Mlodgenski
jimmy76@gmail.com

I've come across a number of times where the statistics on materialized
views become stale producing bad plans. It turns out that autovacuum only
touches a materialized view when it is first created and ignores it on a
refresh. When you have a materialized view like yesterdays_sales the data
in the materialized view turns over every day.

Attached is a patch to trigger autovacuum based on a matview refresh along
with a system view pg_stat_all_matviews to show information more meaningful
for materialized views.

-- Jim

Attachments:

stat_matview.patchtext/x-patch; charset=US-ASCII; name=stat_matview.patchDownload+288-3
#2Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Mlodgenski (#1)
Re: mat views stats

On 2/20/17 10:06, Jim Mlodgenski wrote:

I've come across a number of times where the statistics on materialized
views become stale producing bad plans. It turns out that autovacuum
only touches a materialized view when it is first created and ignores it
on a refresh. When you have a materialized view like yesterdays_sales
the data in the materialized view turns over every day.

That sounds like a bug.

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

I think these should be two separate patches. We might want to
backpatch the first one.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#2)
Re: mat views stats

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really
aren't the same thing as tables. Off-hand (without reviewing the patch),
update and delete counts certainly wouldn't make any sense. "Insert"
counts might, in as much as it's how many rows have been added by
refreshes. You'd want a refresh count too.

I think these should be two separate patches. We might want to
backpatch the first one.

+1; definitely sounds like a bug to me.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jim Mlodgenski
jimmy76@gmail.com
In reply to: Jim Nasby (#3)
Re: mat views stats

On Wed, Feb 22, 2017 at 12:43 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh

along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really aren't
the same thing as tables. Off-hand (without reviewing the patch), update
and delete counts certainly wouldn't make any sense. "Insert" counts might,
in as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Matviews already show up in the pg_stat_*_tables and the patch does
leverage the existing pg_stat_*_tables underlying structure, but it creates
more meaningful pg_stat_*_matviews leaving out things like insert and
update counts.

I think these should be two separate patches. We might want to

backpatch the first one.

I was originally thinking 2 patches, but I couldn't think of a way to
trigger the analyze reliably without adding a refresh count or sending
bogus stats. We can certainly send a stats message containing the number of
rows inserted by the refresh, but are we going to also send the number of
deletes as well? Consider a matview that has month to date data. At the end
of the month, there will be about 30n live tuples. The next day on the new
month, there will be n inserts with the stats thinking there are 30n live
tuples which is below the analyze scale factor. We want to analyze the
matview on the first of the day of the new month, but it wouldn't be
triggered for a few days. We can have REFRESH also track live tuples, but
it was quickly becoming a slippery slope of changing behavior for a back
patch. Maybe that's OK and we can go down that road.

We can back patch some documentation about the existing refresh behavior
with autovacuum.

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Mlodgenski (#4)
Re: mat views stats

On 2/22/17 06:31, Jim Mlodgenski wrote:

Matviews already show up in the pg_stat_*_tables and the patch does
leverage the existing pg_stat_*_tables underlying structure, but it
creates more meaningful pg_stat_*_matviews leaving out things like
insert and update counts.

But fields like seq_scans and last_analyze are then redundant between
the *_tables view and the *_matviews view. Maybe it would make more
sense to introduce a new view like you propose and not show them in
*_tables anymore?

I was originally thinking 2 patches, but I couldn't think of a way to
trigger the analyze reliably without adding a refresh count or sending
bogus stats. We can certainly send a stats message containing the number
of rows inserted by the refresh, but are we going to also send the
number of deletes as well? Consider a matview that has month to date
data. At the end of the month, there will be about 30n live tuples. The
next day on the new month, there will be n inserts with the stats
thinking there are 30n live tuples which is below the analyze scale
factor. We want to analyze the matview on the first of the day of the
new month, but it wouldn't be triggered for a few days. We can have
REFRESH also track live tuples, but it was quickly becoming a slippery
slope of changing behavior for a back patch. Maybe that's OK and we can
go down that road.

For those not reading the patch, it introduces a new reloption
autovacuum_analyze_refresh_threshold that determines when to autoanalyze
a materialized view.

What behavior would we like by default? Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#5)
Re: mat views stats

On 2/22/17 7:56 AM, Peter Eisentraut wrote:

What behavior would we like by default? Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

+1 on both counts. And if sane analyze behavior does depend on the stats 
changes then there's no real advantage to a separate patch.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#3)
Re: mat views stats

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Certainly easier, but I don't think it'd be better. Matviews really aren't
the same thing as tables. Off-hand (without reviewing the patch), update and
delete counts certainly wouldn't make any sense. "Insert" counts might, in
as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents. So I think all the same counters that make sense for
regular tables are also sensible here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jim Mlodgenski
jimmy76@gmail.com
In reply to: Robert Haas (#7)
Re: mat views stats

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really

aren't

the same thing as tables. Off-hand (without reviewing the patch), update

and

delete counts certainly wouldn't make any sense. "Insert" counts might,

in

as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents. So I think all the same counters that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats for
what is it basically doing simplifies and solves it and it is something we
can back patch if that the consensus. See the attached patch.

Attachments:

refresh_matview_stats.patchtext/x-patch; charset=US-ASCII; name=refresh_matview_stats.patchDownload+19-3
#9Michael Paquier
michael@paquier.xyz
In reply to: Jim Mlodgenski (#8)
Re: mat views stats

On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really
aren't
the same thing as tables. Off-hand (without reviewing the patch), update
and
delete counts certainly wouldn't make any sense. "Insert" counts might,
in
as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contrs that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats for
what is it basically doing simplifies and solves it and it is something we
can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+ uint64 processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is used.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jim Mlodgenski
jimmy76@gmail.com
In reply to: Michael Paquier (#9)
Re: mat views stats

On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really
aren't
the same thing as tables. Off-hand (without reviewing the patch),

update

and
delete counts certainly wouldn't make any sense. "Insert" counts

might,

in
as much as it's how many rows have been added by refreshes. You'd

want a

refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contrs that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats

for

what is it basically doing simplifies and solves it and it is something

we

can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+ uint64 processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is
used.

Here is the cleaned up patch

Attachments:

refresh_matview_stats_v2.patchtext/x-patch; charset=US-ASCII; name=refresh_matview_stats_v2.patchDownload+20-3
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Mlodgenski (#10)
Re: mat views stats

Jim Mlodgenski <jimmy76@gmail.com> writes:

After digging into things further, just making refresh report the stats
for what is it basically doing simplifies and solves it and it is
something we can back patch if that the consensus. See the attached
patch.

I've pushed this into HEAD with one non-cosmetic change: the patch tried
to pass a uint64 tuple count to pgstat_count_heap_insert(), whose argument
was only declared as "int". This would go seriously wrong with matviews
having more than INT_MAX rows, which hardly seems out of the question,
so I changed pgstat_count_heap_insert() to take int64 instead.

I don't think we can make that change in the back branches though.
It seems too likely that third-party code might be calling
pgstat_count_heap_insert().

We could possibly kluge around this to produce a safe-to-back-patch
fix by doing something like

pgstat_count_heap_insert(matviewRel, (int) Min(processed, INT_MAX));

But that seems pretty ugly. Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jim Mlodgenski
jimmy76@gmail.com
In reply to: Tom Lane (#11)
Re: mat views stats

But that seems pretty ugly. Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

Most instances of this I've seen out in the field have worked around this

by just running analyze in the scheduled jobs after the refresh so we're
probably good not back patching.