Query Rewrite for Materialized Views (FDW Extension)

Started by Dent Johnalmost 8 years ago10 messages

denty@QQdd.eu

almost 8 years ago

Hi,

I wanted to share the project I've been working on which dynamically rewrites queries to target materialized views when views are available that can satisfy part of a query with lower cost plans.

I embarked upon as an interesting side project. It took me a bit more time than I anticipated, but the result works for my use case. Because of that, I thought it worth sharing. However I would caution that my use case is not exactly of a commercial scale... so please heed the following obligatory warning:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and it will almost certainly fail on complex materialized views composed of deeply nested queries.

The extension does not have extensive (actually: any) documentation, but the few test cases should make obvious to the inclined reader how it works. This is deliberate at this early a stage: I don't want to encourage uninformed adoption because of the possibility of data loss or incorrect query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.

John Dent

denty@QQdd.eu

over 7 years ago

In reply to: Dent John (#1)

Query Rewrite for Materialized Views (Postgres Extension)

Hi folks,

I thought I’d share an update to my pet project, which dynamically rewrites queries to target materialized views when they are available and can satisfy a query (or part of it) with a lower cost plan.

The extension is now a regular EXTENSION and no longer tied in to the FDW mechanism. As a result, it may now be more generally usable, and less complicated to integrate into an existing system. (The FDW approach was an easy way for me to get started, but it ultimately added complexity and was rather limiting.)

Same caution as before applies:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

Built, and has rudimentary testing against Postgres 10.1..10.3.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite
README: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite/blob/master/README.md

Hope it is useful or interesting for someone! Questions or comments are very welcome.

denty.

Show quoted text

Begin original message:

From: Dent John <denty@QQdd.eu>
Subject: Query Rewrite for Materialized Views (FDW Extension)
Date: 5 April 2018 at 14:41:15 BST
To: pgsql-hackers@lists.postgresql.org

Hi,

I wanted to share the project I've been working on which dynamically rewrites queries to target materialized views when views are available that can satisfy part of a query with lower cost plans.

I embarked upon as an interesting side project. It took me a bit more time than I anticipated, but the result works for my use case. Because of that, I thought it worth sharing. However I would caution that my use case is not exactly of a commercial scale... so please heed the following obligatory warning:

**NOTE: this is not "production ready" code — if it works for you, then great, but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and it will almost certainly fail on complex materialized views composed of deeply nested queries.

The extension does not have extensive (actually: any) documentation, but the few test cases should make obvious to the inclined reader how it works. This is deliberate at this early a stage: I don't want to encourage uninformed adoption because of the possibility of data loss or incorrect query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.

Pavel Stehule

pavel.stehule@gmail.com

over 7 years ago

In reply to: John Dent (#2)

Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-16 16:21 GMT+02:00 John Dent <denty@qqdd.eu>:

Hi folks,

I thought I’d share an update to my pet project, which dynamically
rewrites queries to target materialized views when they are available and
can satisfy a query (or part of it) with a lower cost plan.

The extension is now a regular EXTENSION and no longer tied in to the FDW
mechanism. As a result, it may now be more generally usable, and less
complicated to integrate into an existing system. (The FDW approach was an
easy way for me to get started, but it ultimately added complexity and was
rather limiting.)

Same caution as before applies:

**NOTE: this is not "production ready" code — if it works for you, then
great, but use it after thorough testing, and with appropriate caution.**

Built, and has rudimentary testing against Postgres 10.1..10.3.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite
README: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite/
blob/master/README.md

Hope it is useful or interesting for someone! Questions or comments are
very welcome.

good idea.

Regards

Pavel

Show quoted text

denty.

Begin original message:

*From: *Dent John <denty@QQdd.eu>
*Subject: **Query Rewrite for Materialized Views (FDW Extension)*
*Date: *5 April 2018 at 14:41:15 BST
*To: *pgsql-hackers@lists.postgresql.org

Hi,

I wanted to share the project I've been working on which dynamically
rewrites queries to target materialized views when views are available that
can satisfy part of a query with lower cost plans.

I embarked upon as an interesting side project. It took me a bit more time
than I anticipated, but the result works for my use case. Because of that,
I thought it worth sharing. However I would caution that my use case is not
exactly of a commercial scale... so please heed the following obligatory
warning:

**NOTE: this is not "production ready" code — if it works for you, then
great, but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and
it will almost certainly fail on complex materialized views composed of
deeply nested queries.

The extension does not have extensive (actually: any) documentation, but
the few test cases should make obvious to the inclined reader how it works.
This is deliberate at this early a stage: I don't want to encourage
uninformed adoption because of the possibility of data loss or incorrect
query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.

Corey Huinker

corey.huinker@gmail.com

over 7 years ago

In reply to: Pavel Stehule (#3)

Re: Query Rewrite for Materialized Views (Postgres Extension)

Hope it is useful or interesting for someone! Questions or comments are

very welcome.

good idea.

Regards

Pavel

In a recent PgConf NYC presentation [1]https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147 I was talking about the technical
hurdles to implementing materialized views that could be kept up to date at
all times, and the benefits of having such a thing.

Some use cases can be addressed with eventually-consistent derivative table
structures (Vertica's projections, PipelineDB's continuous views, etc), but
those methods rely on the source data never having deletes or updates, or
confining those updates to the "hot" part of the source tables, so it
generally works for time-series data, but not for other cases.

It has occurred to me that Dave Fetter's work on ASSERTIONS [2]http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html has common
underpinnings with true continuous materialized views. In both cases, the
creation of a system object causes the creations of insert/update/delete
triggers on one or more existing tables. In the case of assertions, those
triggers are run with the goal of raising an error if rows are returned
from a query. In the case of a materialized view, those same triggers would
be used to delete rows from a CMV and insert replacements rows.

If we can get always-up-to-date materialized views, then Denty's work on
query rewrite would have greatly enhanced utility.

[1]: https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147
https://postgresconf.org/conferences/2018/program/proposals/a-roadmap-to-continuous-materialized-views-b4644661-8d5a-4186-8c17-4fb82600e147
[2]: http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html
http://databasedoings.blogspot.com/2018/06/ive-posted-my-slides-for-my-asssertions.html

Dent John

denty@QQdd.eu

over 7 years ago

In reply to: Corey Huinker (#4)

Re: Query Rewrite for Materialized Views (Postgres Extension)

I commented to Corey (privately) that, while my rewrite extension has gotten me a server that responds quickly to aggregate queries, the constant need to refresh the supporting MVs means the system’s load average is constant and much higher than before. I’m happy with the tradeoff for now, but it’s a huge waste of energy, and I’m sure it must thrash my disk.

I’m very interested in what other people think of Corey’s idea.

Jim Finnerty

jfinnert@amazon.com

over 7 years ago

In reply to: Dent John (#1)

Re: Query Rewrite for Materialized Views (FDW Extension)

Hi John,

Can you characterize the class of queries that the MVs eligible for
query rewrite may include, and the class of statements that may be rewritten
using those mvs, in terms of joins and join types, aggregation, constraints,
types of aggregate operators, and allowed aggregate expressions?

For example:
- the mv may contain any number of joins, which may be either inner
or left outer, semi-, or anti joins
- aggregates are optional, but if present, all columns in aggregate
expressions must refer to the same table
- MIN, MAX, SUM, COUNT, AVG aggregate operations are allowed
- to be eligible for query rewrite a user must ... <fill in the
details>
- local predicates are / not allowed in the MV

- the statement being rewritten may have one or more joins.
- the joins need not be identical to the corresponding joins in the
MV, but <fill in the details>
- there may be joins in the mv that are not in the statement
provided that ... <fill in the details>
- any restrictions or assumed dependence on fdw's, since the name
'fdw extension' implies as much
- MV rewrite will / not be cost-based
- The statement may / not have local predicates that are not
identical to the local predicates in the mv

If you're going to do MV rewrite using MVs that aren't transactionally
consistent with the underlying tables, then you're going to need a different
sort of isolation mode, or some sort of permissions model that enables users
to opt-in to permit the optimizer to give back "wrong results" from
potentially stale MVs.

I didn't attend the Ottawa conference this year, so I'd be interested to
hear a summary of what the community thinks about MV incremental maintenance
as well. There are several important subtypes of MV incremental
maintenance:

1) incremental refresh on-commit, for individual row DML (enables MVs
to be transactionally consistent, but incurs overhead at COMMIT time)
2) bulk incremental refresh on commit or on demand, after a COPY
operation (more for a DW use case, but much more efficient for bulk-insert
scenarios, and permits some optimizations that are not possible for the
first case)
3) full refresh by partition (not an automatic solution, and there are
some subtleties here about rows that move from one partition to another, but
it's an easy first step toward incremental mv maintenance without most of
the complexity)

there is also the possibility of doing mv refresh from other mv's that have
already been refreshed, so mv rewrite and mv refresh sometimes interact.

thank you,

/Jim F

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Nico Williams

nico@cryptonector.com

over 7 years ago

In reply to: Dent John (#5)

Re: Query Rewrite for Materialized Views (Postgres Extension)

On Mon, Jun 18, 2018 at 07:38:13PM +0100, Dent John wrote:

I commented to Corey (privately) that, while my rewrite extension has
gotten me a server that responds quickly to aggregate queries, the
constant need to refresh the supporting MVs means the system’s load
average is constant and much higher than before. I’m happy with the
tradeoff for now, but it’s a huge waste of energy, and I’m sure it
must thrash my disk.

I’m very interested in what other people think of Corey’s idea.

I've written an alternative materialization extension (entirely as
PlPgSQL) based on PG's internals, but my version has a few big wins that
might help here. I'm thinking of properly integrating it with PG. Some
of the features include:

- you can write triggers that update the materialization

This is because the materialization is just a regular table in my
implementation.

- you can mark a view as needing a refresh (e.g., in a trigger)

- you can declare a PK, other constraints, and indexes on a
materialization

The DMLs used to refresh a view concurrently can take advantage of
the PK and/or other indexes to go fast.

- you get a history table which records updates to the materialization

This is useful for generating incremental updates to external
systems.

Keeping track of refresh times should help decide whether to use or not
use a materialization in some query, or whether to refresh it first, or
not use it at all.

One of the things I'd eventually like to do is analyze the view query
AST to automatically generate triggers to update materializations or
mark them as needing refreshes. A first, very very rough sketch of such
an analysis looks like this:

- if the view query has CTEs
-> create triggers on all its table sources to mark the
materialization as needing a refresh

- else if a table appears more than once as a table source in the view
query
-> create triggers on that table that mark the materialization as
needing a refresh

- else if a table appears anywhere other than the top-level
-> create triggers .. mark as needing refresh

- else if a table is a right-side of a left join
-> create triggers .. mark as needing refresh

- else if a table has no PK
-> create triggers .. mark as needing refresh

- else if the query has no GROUP BY, or only does a GROUP BY on this
table and a list of columns prefixed by the table's PK
-> rewrite the query to have WHERE eq conditions on values for the
table's PK columns

analyze this query

if the result shows this table source as the first table in the
plan
-> create triggers on this table to update the materialization
directly from querying the source view

- else
-> create triggers .. mark as needing refresh

Nico
--

Dent John

denty@qqdd.eu

over 7 years ago

In reply to: Nico Williams (#7)

Re: Query Rewrite for Materialized Views (Postgres Extension)

Hi Nico,

I’m pretty impressed anything in this space can be written entirely in PlPGQSL!

If you did integrate your implementation, it would be easy for my Extension to read from a table other than the one which it gets the MV definition from... Although having said that, if you went down the route you suggest, would not you make that “regular table” into a first class scheme object very much like Corey’s CONTINUOUS MATERIALIZED VIEW object concept?

It is interesting that you can put triggers onto the table though, as that leads well in to use cases where it is desirable to “stack” MVs upon each other. (I’m not immediately sure whether such a use case is still needed in face of an always-up-to-date MV feature such as is described, but I’ve seen it elsewhere.)

You say you’d like to base it off a VIEW’s AST (rather than, I presume, you must parse the reconstructed VIEW source text as SQL?), and I do agree — that’s probably the right direction... it does seem to me there is scope to leverage the “bottom half” of the ASSERTION stuff from Dave Fetter that Corey linked to — i.e., the part of it that manages triggers. Still leaves the AST crawling deciding what to actually do once a change is caught.

Really good to hear about progress in this area.

Nico Williams

nico@cryptonector.com

over 7 years ago

In reply to: Dent John (#8)

Re: Query Rewrite for Materialized Views (Postgres Extension)

On Tue, Jun 19, 2018 at 08:46:06AM +0100, Dent John wrote:

I’m pretty impressed anything in this space can be written entirely in
PlPGQSL!

https://github.com/twosigma/postgresql-contrib

PG is quite powerful!

I have even implemented a COMMIT TRIGGER in pure PlPgSQL.

You'll notice I make extensive use of record/table types.

If you did integrate your implementation, it would be easy for my
Extension to read from a table other than the one which it gets the MV
definition from... Although having said that, if you went down the
route you suggest, would not you make that “regular table” into a
first class scheme object very much like Corey’s CONTINUOUS
MATERIALIZED VIEW object concept?

I know nothing about the CONTINUOUS MATERIALIZED VIEW concept. What
that would imply to me seems... difficult to achieve. There will be
view queries that are difficult or impossible to automatically write
triggers for that update an MV synchronously.

It is interesting that you can put triggers onto the table though, as
that leads well in to use cases where it is desirable to “stack” MVs
upon each other. (I’m not immediately sure whether such a use case is
still needed in face of an always-up-to-date MV feature such as is
described, but I’ve seen it elsewhere.)

I have done exactly this sort of MV chaining.

In my use case I had an MV of a nesting group transitive closure and
then another of a join between that and user group memberships to get a
complete user group transitive closure.

The transitive closure of nesting groups being computed via a RECURSIVE
CTE... In principle one can understand such a query and automatically
write DMLs to update the MV on the fly (I've done this _manually_), but
the moment you do any bulk updates out of sequence you can't, and then
you have to refresh the view, so you see, I don't quite believe we can
have a true continuously materialized view :(

For me the key requirement is the ability to generate incremental
updates to an external system, but also the whole thing has to be fast.

You say you’d like to base it off a VIEW’s AST (rather than, I
presume, you must parse the reconstructed VIEW source text as SQL?),

PG already stores the AST. There's no need to write a new parser when
PG already has one. At the end of the day you need to analyze an AST
for the MV's source query in order to automatically write the triggers
to keep it updated (or mark it as needing a refresh).

and I do agree — that’s probably the right direction... it does seem
to me there is scope to leverage the “bottom half” of the ASSERTION
stuff from Dave Fetter that Corey linked to — i.e., the part of it
that manages triggers. Still leaves the AST crawling deciding what to
actually do once a change is caught.

I'll search for this.

Really good to hear about progress in this area.

Eh, I've not actually implemented any automatic generation of triggers
to update MVs. I've written enough such triggers manually to believe
that *some* of them could be written by software. If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment -- that's precisely because those make for the sorts of slow
queries that make you reach for materialization in the first place :(

But even so, automatically-generated triggers that mark an MV as needing
a refresh are always possible, and that is a huge improvement anyways,
especially if concurrent view refreshes can be made to go faster (by
having PKs on the MVs). The idea is to have some sort of adaptive
automatic background, concurrent MV refresh running on a frequency based
in part of the amount of time it takes to refresh the VIEW.

BTW, MERGE would be a significant optimization for concurrent MV
refresh. Think of MERGE as a statement that can scan a source, FULL
OUTER JOIN it to a target table, and for each row do an INSERT, UPDATE,
or DELETE -- this is 3x faster than the three INSERT/UPDATE/DELETE
statements you need to do the same work without a MERGE!

Nico
--

#10

Dent John

denty@QQdd.eu

over 7 years ago

In reply to: Nico Williams (#9)

Re: Query Rewrite for Materialized Views (Postgres Extension)

Hi Nico,

By the way, I do agree with your point about MERGE — if we can factor MV updates in that fashion, it will certainly save.

I didn’t reply immediately because your point caught me off guard:

[…] If you look at my
sketch for how to do it, you'll notice that many of the sorts of queries
that one would choose to materialize... are not really amenable to this
treatment […]

I’d rather presumed that there would be many examples of DML on base relations that could trigger a direct (incremental) update to the MV. I had presumed it, but not actually done any research.

So I did a bit of a trawl. There’s actually quite a lot of academic research out there, including [1]http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps <http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps> and [2]https://arxiv.org/pdf/1806.07344.pdf <https://arxiv.org/pdf/1806.07344.pdf>. [2]https://arxiv.org/pdf/1806.07344.pdf <https://arxiv.org/pdf/1806.07344.pdf> references a bunch of methods for incremental MV refresh, and ties them into a graph query context. I’m not sure if the graph query context itself is relevant for Postgres, but it’s certainly interesting and perhaps suggests that incremental refresh of at least some RECURSIVE MVs may not be entirely impossible. I also found [3]https://www.cs.indiana.edu/pub/techreports/TR280.pdf <https://www.cs.indiana.edu/pub/techreports/TR280.pdf>, which is /very/ dated, but it strongly supports that MVs are a performant path to executing certain types of query.

So I definitely agree with you in the general case, but it seems there is scope to provide an incremental MV refresh capability that is broadly useful.

Almost certainly, any initial implementation would quickly fall back to a “full refresh”. But the refresh planner’s capability develops, I wonder if it could not embody an intelligent strategy that might even recognise common recursive patterns such as the transitive closure you mention, and refresh on an incremental basis — that would be really quite a cool capability to have.

denty.

[1]: http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps <http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps>
[2]: https://arxiv.org/pdf/1806.07344.pdf <https://arxiv.org/pdf/1806.07344.pdf>
[3]: https://www.cs.indiana.edu/pub/techreports/TR280.pdf <https://www.cs.indiana.edu/pub/techreports/TR280.pdf>