Materialized views

Started by Kevin Grittnerabout 14 years ago17 messages
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

This is the time of year when the Wisconsin Courts formalize their
annual plan for where people will be spending the bulk of their time
in the coming year. Two years ago at this time, managers decided
that serializable transactions were a big enough issue to justify
assigning about half of my 2011 time to working on PostgreSQL
enhancements for that. This year our big database issue is
materialized views.

As we strive to create our next generation of software we find
ourselves wanting to provide "dashboard" type windows with graphs of
statistics which are insanely expensive to calculate on the fly.
We've been creating ad hoc materialized views to deal with the
performance issues, but that is labor intensive. I'm considering
submitting a proposal to management that I be assigned to work on
a declarative implementation in PostgreSQL to allow speedier
application development of software needing materialized views.

I'm posting to make sure that nobody else is already in the midst of
working on this, and to check regarding something on the Wiki page
for this topic:

http://wiki.postgresql.org/wiki/Materialized_Views

That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases. I have high confidence that if time is
approved I could do the first two for the 9.3, but that last one
seems insanely complicated and not necessarily a good idea. (That's
particularly true with some of the lazier strategies for maintaining
the data in the materialized view.) I don't think we want to use
that 3rd component in our shop, anyway. So the question is, would a
patch which does the first two without the third be accepted by the
community?

I'm not at the point of proposing specifics yet; the first phase
would be a close review of prior threads and work on the topic
(including the GSoC work). Then I would discuss implementation
details here before coding.

The hope on our end, of course, is that the time spent on
implementing this would be more than compensated by application
programmer time savings as we work on our next generation of
application software, which seems like a pretty safe bet to me.

-Kevin

#2Thom Brown
thom@linux.com
In reply to: Kevin Grittner (#1)
Re: Materialized views

On 8 November 2011 21:23, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

This is the time of year when the Wisconsin Courts formalize their
annual plan for where people will be spending the bulk of their time
in the coming year.  Two years ago at this time, managers decided
that serializable transactions were a big enough issue to justify
assigning about half of my 2011 time to working on PostgreSQL
enhancements for that.  This year our big database issue is
materialized views.

As we strive to create our next generation of software we find
ourselves wanting to provide "dashboard" type windows with graphs of
statistics which are insanely expensive to calculate on the fly.
We've been creating ad hoc materialized views to deal with the
performance issues, but that is labor intensive.  I'm considering
submitting a proposal to management that I be assigned to work on
a declarative implementation in PostgreSQL to allow speedier
application development of software needing materialized views.

I'm posting to make sure that nobody else is already in the midst of
working on this, and to check regarding something on the Wiki page
for this topic:

http://wiki.postgresql.org/wiki/Materialized_Views

That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases.  I have high confidence that if time is
approved I could do the first two for the 9.3, but that last one
seems insanely complicated and not necessarily a good idea.  (That's
particularly true with some of the lazier strategies for maintaining
the data in the materialized view.)  I don't think we want to use
that 3rd component in our shop, anyway.  So the question is, would a
patch which does the first two without the third be accepted by the
community?

I'm not at the point of proposing specifics yet; the first phase
would be a close review of prior threads and work on the topic
(including the GSoC work).  Then I would discuss implementation
details here before coding.

The hope on our end, of course, is that the time spent on
implementing this would be more than compensated by application
programmer time savings as we work on our next generation of
application software, which seems like a pretty safe bet to me.

+1

I was pleased to see the subject of this thread. I definitely think
it's worth it, especially if you're able to make it also work for
foreign tables (saving expense of seeking external data so can also
act as a local cache, but that's me getting carried away). And I
agree regarding the planner detection. If that ever were desired, it
certainly would't need implementing in the first phase.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Josh Berkus
josh@agliodbs.com
In reply to: Kevin Grittner (#1)
Re: Materialized views

On 11/8/11 1:23 PM, Kevin Grittner wrote:

So the question is, would a
patch which does the first two without the third be accepted by the
community?

AFAIC, yes.

For that matter, Part 3 is useful without parts 1 and 2, if someone
wanted to work on that. I recall some academic work on automated
materialized view matching back in the 7.2 days.

And I can help test whatever you come up with ... I do a lot of matviews.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#4Greg Jaskiewicz
gj@pointblue.com.pl
In reply to: Kevin Grittner (#1)
Re: Materialized views

On 8 Nov 2011, at 21:23, Kevin Grittner wrote:

This is the time of year when the Wisconsin Courts formalize their
annual plan for where people will be spending the bulk of their time
in the coming year. Two years ago at this time, managers decided
that serializable transactions were a big enough issue to justify
assigning about half of my 2011 time to working on PostgreSQL
enhancements for that. This year our big database issue is
materialized views.

+1 for that too. Too many hacked together MVs here, and more coming.

#5Stephen Frost
sfrost@snowman.net
In reply to: Kevin Grittner (#1)
Re: Materialized views

* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote:

So the question is, would a
patch which does the first two without the third be accepted by the
community?

I'm about 99% sure the answer to that is 'yes'. Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..? Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)

Thanks!

Stephen

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Stephen Frost (#5)
Re: Materialized views

Stephen Frost <sfrost@snowman.net> wrote:

Are you thinking of having a background scheduler which handles
the updating of schedule-driven (instead of trigger-driven) MVs..?

I'm trying not to get to far into discussing design in advance of
reviewing previous work and any papers on the topic. That said, it
seems clear that the above is required but not sufficient.

Not to try to feature-creep this on you, but you might consider
how a new backend process which handles scheduled tasks could be
generalized to go beyond handling just MV updates.. :)

I'll keep that in mind. :-)

-Kevin

#7David E. Wheeler
david@kineticode.com
In reply to: Stephen Frost (#5)
Re: Materialized views

On Nov 8, 2011, at 2:54 PM, Stephen Frost wrote:

So the question is, would a
patch which does the first two without the third be accepted by the
community?

+1 Definitely.

I'm about 99% sure the answer to that is 'yes'. Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..? Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)

+1 That sure would be nice. Might be some useful stuff in pgAgent to pull into this (license permitting).

Best,

David

#8Josh Berkus
josh@agliodbs.com
In reply to: Stephen Frost (#5)
Re: Materialized views

I'm about 99% sure the answer to that is 'yes'. Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..? Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)

No feature-creeping; the two features described will be hard enough.
Besides, we have pg_agent for that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#9David E. Wheeler
david@kineticode.com
In reply to: Josh Berkus (#8)
Re: Materialized views

On Nov 8, 2011, at 3:25 PM, Josh Berkus wrote:

I'm about 99% sure the answer to that is 'yes'. Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..? Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)

No feature-creeping; the two features described will be hard enough.
Besides, we have pg_agent for that.

Well, there has to be some way to refresh an MV at regular intervals, no?

Best,

David

#10Josh Berkus
josh@agliodbs.com
In reply to: David E. Wheeler (#9)
Re: Materialized views

No feature-creeping; the two features described will be hard enough.
Besides, we have pg_agent for that.

Well, there has to be some way to refresh an MV at regular intervals, no?

For periodic update matviews, yes. However, Kevin needs only produce
the command/function call for updating a generic periodic matview. He
doesn't need to provide a scheduling utility. For simple cases, cron works.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#11David E. Wheeler
david@kineticode.com
In reply to: Josh Berkus (#10)
Re: Materialized views

On Nov 8, 2011, at 4:23 PM, Josh Berkus wrote:

Well, there has to be some way to refresh an MV at regular intervals, no?

For periodic update matviews, yes. However, Kevin needs only produce
the command/function call for updating a generic periodic matview. He
doesn't need to provide a scheduling utility. For simple cases, cron works.

Oh, I see. I was assuming it would be automatic-ish, with the declaration of the matview including update interval information. Could certainly come later, though.

Best,

David

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#8)
Re: Materialized views

On 11/08/2011 06:25 PM, Josh Berkus wrote:

I'm about 99% sure the answer to that is 'yes'. Are you thinking of
having a background scheduler which handles the updating of
schedule-driven (instead of trigger-driven) MVs..? Not to try to
feature-creep this on you, but you might consider how a new backend
process which handles scheduled tasks could be generalized to go beyond
handling just MV updates.. :)

No feature-creeping; the two features described will be hard enough.
Besides, we have pg_agent for that.

Yeah. I'd be plenty happy with greedy MVs plus lazy MVs that need a
manual refresh. These are both patterns I use a lot (the buildfarm
dashboard is in effect a greedy MV although it's not updated by a trigger).

cheers

andrew

#13Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#10)
Re: Materialized views

* Josh Berkus (josh@agliodbs.com) wrote:

Well, there has to be some way to refresh an MV at regular intervals, no?

For periodic update matviews, yes. However, Kevin needs only produce
the command/function call for updating a generic periodic matview. He
doesn't need to provide a scheduling utility. For simple cases, cron works.

Perhaps that would be an acceptable initial version, but I'd be pretty
disappointed and I certainly don't think we should stop there.

I'm quite aware of cron and as aware of the difficulties that many DBAs
are going to have getting cronjobs implemented. There's permission
issues (we see this already with the requests to provide an in-PG DBA
interface for pg_hba.conf..), locking issues (writing decent scripts
that don't destroy the box if they take a bit too long, like >5m on a
5m scheduled job), authentication issues (we don't really want these
running as superuser if we can avoid it..), and probably other things
I'm not thinking of.

Thanks,

Stephen

#14Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#13)
Re: Materialized views

On Tue, Nov 8, 2011 at 8:31 PM, Stephen Frost <sfrost@snowman.net> wrote:

Perhaps that would be an acceptable initial version, but I'd be pretty
disappointed and I certainly don't think we should stop there.

I agree that a built-in job scheduler would be pretty awesome, but I
think it's a completely separate project from materialized views.
Each of the two is a major project all by itself; making one into a
dependency of the other is a recipe for failure.

In view of Kevin's success with SSI, I'm very pleased to see him
picking this out as his next target. If and when it gets done, this
will be a great, great feature. And we shouldn't be afraid to start
small and move incrementally toward what we really want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Greg Smith
greg@2ndQuadrant.com
In reply to: Kevin Grittner (#1)
Re: Materialized views

On 11/08/2011 04:23 PM, Kevin Grittner wrote:

http://wiki.postgresql.org/wiki/Materialized_Views
That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases.

So far that page is just my initial notes on this subject based on some
research I did, don't read too much into it. Ignoring MV substitution
is certainly the right plan for an initial development target. An
implementation that didn't update automatically at all is also still a
useful step. It's very easy to pick too big of a chunk of this project
to chew on at once.

When I wrote that, it wasn't completely clear to me yet that doing the
updates would involve so many of the same tricky bits that stalled
progress on the MERGE command. Nowadays I think of working out the
concurrency issues in both this and MERGE, in the wake of true
serialization, as another step in this implementation plan. It's not
strictly necessary, but there's some shared issues that might be tackled
easier in the MERGE context instead.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#1)
Re: Materialized views

On Tue, Nov 8, 2011 at 9:23 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

That page describes three components: creating MVs, updating MVs, and
having the planner automatically detect when an MV matches some
portion of a regular query and using the MV instead of the specified
tables in such cases.  I have high confidence that if time is
approved I could do the first two for the 9.3, but that last one
seems insanely complicated and not necessarily a good idea.  (That's
particularly true with some of the lazier strategies for maintaining
the data in the materialized view.)  I don't think we want to use
that 3rd component in our shop, anyway.  So the question is, would a
patch which does the first two without the third be accepted by the
community?

For me, yes. I support and encourage your work. It's a big topic and
we must approach it incrementally.

Having said that, we should assume that #3 will be implemented and
that we need to collect appropriate metadata and anything else
required. So the design should foresee #3 and not in any way optimise
for the case where #3 doesn't happen. It may occur that #3 is added
during next cycle concurrently with this development.

I would also caution that all other databases currently provide #3 as
a matter of course. That is the "sauce" as far as many people are
concerned. Everything else is already achievable using external
application code. So I would not want people to start saying "we have
MVs" when in fact all we did was add declarative syntax to support
what was already possible - we could easily publicise that incorrectly
at release time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#17Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)
Re: Materialized views

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

I'm considering submitting a proposal to management that I be
assigned to work on a declarative implementation in PostgreSQL to
allow speedier application development of software needing
materialized views.

Thanks to all who provided feedback and support in response to my
post.

Based on the feedback here and off-list, I did submit a proposal.
It was just approved by the appropriate steering committee
(consisting of our CIO, the Director of State Courts, District Court
Administrators, Judges, Clerks of Court, and other stake-holders) as
a low-priority project. That means that I expect I'll have the time
to get a patch together in time for 9.3, but the times at which the
decks will be clear of other assignments to allow work on this will
not be very predictable. I'll probably be on-again, off-again
throughout the year. I apologize in advance for the fact that the
times when I will be able to work on it might not fit well with the
release cycle or CFs, but I kinda have to take what I can get in
that regard.

-Kevin