Incrementally refreshed materialized view

Started by Adam Brusselbackover 9 years ago16 messagesgeneral
Jump to latest
#1Adam Brusselback
adambrusselback@gmail.com

Hello all,
I am working on a plan to implement incrementally refreshed materialized
"views" with the existing functionality in Postgres.

Below is the plan for doing that:

Trigger based eagerly updated materialized tables for Postgres 9.5

High level plan:

Have a view definition stored in the database which we can use for
reference. Create functions which will read that view definition, and
create a materialized table with all the same columns as the reference
view, create triggers on all tables the view depends on to keep the
materialized table fresh within a transaction. All queries would hit the
materialized table, the view is just there so we know what dependencies to
track, and have an easy way to update the materialized table.

How do we actually do the refresh?

1. A refresh key is defined for the materialized table.

2. Every dependent table must roll up to that refresh key so we
know what rows to refresh.

3. That key should be able to be referenced in the views where
clause performantly so we can refresh just the rows that match the refresh
key using the view.

4. The refresh will be done by deleting any existing rows with the
key, and inserting new ones with the key from the view.

How do we know what to refresh?

1. A before statement trigger to create a temp table to log all
changes.

2. A for each row trigger to log the rows modified by DML.

a. This should be done at the refresh key level.

i. We
need to figure out a way to generate queries to roll up things multiple
levels on the dependency chain until we get to the refresh key. Not sure
at all how to do that.

3. An after statement trigger to run a refresh on the materialized
table, looking at only the rows touched by the DML.

I am however stuck on: How do we know what to refresh? -> Step 2
Pretty much, I need to figure out how to follow the joins in the view back
to whatever key was defined as the "refresh key" for each dependent table.
I know about the information_schema.view_column_usage, but I don't think
that'll get me everything I need.

I'd really appreciate any help with this, as i'd love a better way to get
eagerly refreshed materialized views in Postgres rather than doing
everything manually as I have to now.

If I can provide any more info please let me know.
Thanks,
-Adam

#2Melvin Davidson
melvin6925@gmail.com
In reply to: Adam Brusselback (#1)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 1:56 PM, Adam Brusselback <adambrusselback@gmail.com

wrote:

Hello all,
I am working on a plan to implement incrementally refreshed materialized
"views" with the existing functionality in Postgres.

Below is the plan for doing that:

Trigger based eagerly updated materialized tables for Postgres 9.5

High level plan:

Have a view definition stored in the database which we can use for
reference. Create functions which will read that view definition, and
create a materialized table with all the same columns as the reference
view, create triggers on all tables the view depends on to keep the
materialized table fresh within a transaction. All queries would hit the
materialized table, the view is just there so we know what dependencies to
track, and have an easy way to update the materialized table.

How do we actually do the refresh?

1. A refresh key is defined for the materialized table.

2. Every dependent table must roll up to that refresh key so we
know what rows to refresh.

3. That key should be able to be referenced in the views where
clause performantly so we can refresh just the rows that match the refresh
key using the view.

4. The refresh will be done by deleting any existing rows with the
key, and inserting new ones with the key from the view.

How do we know what to refresh?

1. A before statement trigger to create a temp table to log all
changes.

2. A for each row trigger to log the rows modified by DML.

a. This should be done at the refresh key level.

i. We
need to figure out a way to generate queries to roll up things multiple
levels on the dependency chain until we get to the refresh key. Not sure
at all how to do that.

3. An after statement trigger to run a refresh on the materialized
table, looking at only the rows touched by the DML.

I am however stuck on: How do we know what to refresh? -> Step 2
Pretty much, I need to figure out how to follow the joins in the view back
to whatever key was defined as the "refresh key" for each dependent table.
I know about the information_schema.view_column_usage, but I don't think
that'll get me everything I need.

I'd really appreciate any help with this, as i'd love a better way to get
eagerly refreshed materialized views in Postgres rather than doing
everything manually as I have to now.

If I can provide any more info please let me know.
Thanks,
-Adam

*I am a bit curious. Why are you reinventing the wheel?*

*What is wrong with:*

*REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name
[ WITH [ NO ] DATA ]*

*https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html
<https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html&gt;*

*Can't you do that in a cron job?*
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#3Rakesh Kumar
rakeshkumar464@outlook.com
In reply to: Melvin Davidson (#2)
Re: Incrementally refreshed materialized view

Does PG support INCREMENTAL MV ? Looks like not (until 9.5)

#4Rob Sargent
robjsargent@gmail.com
In reply to: Rakesh Kumar (#3)
Re: Incrementally refreshed materialized view

Of course 9.5 is the current release so the answer is Yes, since 9.5

Show quoted text

On 09/26/2016 12:29 PM, Rakesh Kumar wrote:

*Does PG support INCREMENTAL MV ? Looks like not (until 9.5)*

#5Adam Brusselback
adambrusselback@gmail.com
In reply to: Rakesh Kumar (#3)
Re: Incrementally refreshed materialized view

I require eagerly refreshed materialized views for my use case, which is
something Postgres does not currently support. I need my updates to a
table the view refers to visible within the same transaction, and often it
is a single change to one row which will only effect a single row in the
view. If I used materialized views as they're currently implemented, that
would run the entire query and replace what needs to change, but it has to
run that entire query to do so. For my use case, that is totally out of
the question to do for every insert / update / delete that could effect the
view.

For example, if I had a account balance materialized view that pretty much
summed transactions for a specific user from two different tables and
displayed a "balance" for every user, I would want that to only run the
query for the refresh for the specific user(s) that just had data
inserted/updated/deleted. Not every user in the system after every
statement.

I've pretty much implemented this manually for some specific views which
performed horribly in Postgres (but would work fine in SQL Server for
example). I am looking to do this in a generic way so it's easier to
implement when necessary, and can be packaged as an extension for others
who may need to use it.

Obviously if we had better support for statement level triggers (so we
could reference all the NEW / OLD values touched by a statement) this would
be lower overhead, but that is one of the things holding up incrementally
refreshed materialized views from being implemented in the first place. I
just thought up a way to do it which gets around not having better
statement level triggers and wanted to see where I could get with things as
they are.

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Melvin Davidson (#2)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 1:22 PM, Melvin Davidson <melvin6925@gmail.com> wrote:

On Mon, Sep 26, 2016 at 1:56 PM, Adam Brusselback <adambrusselback@gmail.com> wrote:

I am working on a plan to implement incrementally refreshed
materialized "views" with the existing functionality in
Postgres.

Below is the plan for doing that:

[design from scratch, incomplete]

I am however stuck on: How do we know what to refresh?

Pretty much, I need to figure out how to follow the joins in the
view back to whatever key was defined as the "refresh key" for
each dependent table. I know about the
information_schema.view_column_usage, but I don't think that'll
get me everything I need.

I'd really appreciate any help with this, as i'd love a better
way to get eagerly refreshed materialized views in Postgres
rather than doing everything manually as I have to now.

If I can provide any more info please let me know.

I am a bit curious. Why are you reinventing the wheel?
What is wrong with:

REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name
[ WITH [ NO ] DATA ]

https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html

Can't you do that in a cron job?

Well, that is not *incremental* maintenance -- the entire query is
executed, with the resulting relation either replacing the previous
contents of the matview or "diffed" against the previous contents
(so that the difference can be applied with transactional
semantics), depending on whether CONCURRENTLY was specified.

The OP is still reinventing the wheel though. A summary of
available techniques as of the mid-90s can be found here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&amp;rep=rep1&amp;type=pdf

With some detail for what to me look like the two most promising
techniques here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208&amp;rep=rep1&amp;type=pdf

The first step in using either of those techniques (counting or
DRed) is to capture a delta relation to feed into the relational
algebra used by these techniques. As a first step in that
direction I have been floating a patch to implement the
SQL-standard "transition tables" feature for AFTER triggers.

https://commitfest.postgresql.org/10/778/

If you want to help that effort, reading the thread and reviewing
the patch would be good.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Adam Brusselback
adambrusselback@gmail.com
In reply to: Rob Sargent (#4)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 2:35 PM, Rob Sargent <robjsargent@gmail.com> wrote:

Of course 9.5 is the current release so the answer is Yes, since 9.5

It seems like there is some confusion about what we're talking about. I am

talking about incremental updates to a sort of "fake" materialized view
(implemented as a table). This is totally different than what we currently
have implemented for actual materialized views (REFRESH CONCURRENTLY).
Concurrent refresh just minimizes the time the view is locked by building a
second table in the background and merging the changes between them.

#8Rakesh Kumar
rakeshkumar464@outlook.com
In reply to: Rob Sargent (#4)
Re: Incrementally refreshed materialized view

Of course 9.5 is the current release so the answer is Yes, since 9.5

https://www.postgresql.org/docs/current/static/sql-creatematerializedview.html

What am I missing. I don't see any support for incremental refresh.

Just in case we are not speaking the same thing: When a MV is created for the first time on a base table, all further updates on the base table, will be handled by only applying the delta changes to the base table. That is called incremental refresh. Check how DB2 does it:

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000977.html

Or does " REFRESH MATERIALIZED VIEW." does it only incremental refresh.

Does PG have a concept of MV log, from where it can detect the delta changes and apply incremental changes quickly.

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Rakesh Kumar (#8)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 2:04 PM, Rakesh Kumar
<rakeshkumar464@outlook.com> wrote:

Does PG have a concept of MV log, from where it can detect the
delta changes and apply incremental changes quickly.

That is what I am trying to work toward with the patch I cited in
an earlier post. Once some variation of that is in, the actual
incremental maintenance can be build on top of it. To see an
example of what would be done with such a delta relation for a
simple MV, using the count algorithm, see below:

--------------------------------------------------------------------------
--
-- This file demonstrates how incremental maintenance of materialized views
-- is intended to work using the counting algorithm. TEMP tables are used
-- to mock up both snapshots of relations and delta relations which will
-- be created automatically "behind the scenes" as part of the algorithm.
--
-- Delta relations for both base tables and materialized views will need a
-- "count(t)" column to track the number of alternative derivations of the
-- tuple. The column is actually named that in this example. Where the
-- count is needed by the algorithm and a relation doesn't have it
-- (because, for example, it is a base table), 1 is used. This is kept in
-- the example for purposes of illustration. While it is explicitly added
-- to the materialized views for this example, the idea would be that
-- specifying these materialized views using DISTINCT would implicitly add
-- the "count(t)" column when incremental maintenance was specified.
--
-- The logic for a JOIN is that the resulting row should have the product
-- of multiplying the two input row counts. Calculating a delta for that
-- requires two such joins which are then processed by a UNION with special
-- semantics. Those semantics are achieved in this example by feeding
-- UNION ALL results to an outer SELECT which uses GROUP BY and HAVING
-- clauses.
--
-- This example dodges some complex and performance-sapping problems that
-- occur when duplicate rows may be present. It does so with a combination
-- of a PRIMARY KEY declaration in the base table and GROUP BY clauses in
-- the materialized views.
--
-- The temporary relation names in the example are chosen for clarity of
-- the example. If temporary relation names are actually needed in the
-- implementation, they would need to be different, probably based on the
-- related permanent relation OID, for length.
--
--------------------------------------------------------------------------

-- Set up the base table used for these materialized views.
CREATE TABLE link
(
src text not null,
dst text not null,
primary key (src, dst)
);
INSERT INTO link
VALUES ('a','b'),('b','c'),('b','e'),('a','d'),('d','c');

-- Display and capture "before" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link1 AS SELECT * FROM link;

-- Create and display the initial state of the "hop" MV.
CREATE MATERIALIZED VIEW hop AS
SELECT t1.src, t2.dst, count(*) AS "count(t)"
FROM link t1
JOIN link t2 ON (t2.src = t1.dst)
GROUP BY t1.src, t2.dst;
SELECT * FROM hop;

-- Execute a transaction that modifies the base table.
-- The assumption is that the changes will be applied shortly after commit.
BEGIN;
DELETE FROM link WHERE src = 'b' AND dst = 'e';
INSERT INTO link VALUES ('c','h'),('f','g');
COMMIT;

-- Display and capture "after" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link2 AS SELECT * FROM link;

-- Mock up the delta(link) table which will get created "behind the scenes".
CREATE TEMP TABLE "Δ(link)" (src text NOT NULL, dst text NOT NULL,
"count(t)" int NOT NULL);
INSERT INTO "Δ(link)" VALUES ('b','e',-1),('c','h',1),('f','g',1);
SELECT * FROM "Δ(link)" ORDER BY 1, 2;

-- Given link before and after images, and delta(link), calculate delta(hop).
-- This must happen while the "before" and "after" snapshots are still
registered,
-- but not necessarily in the same transaction as the modifications to
the base table.
CREATE TEMP TABLE "Δ(hop)"
(src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(hop)"
SELECT src, dst, sum("count(t)") AS "count(t)"
FROM (
SELECT delta.src, before.dst, delta."count(t)" * 1
FROM "Δ(link)" delta
JOIN link1 before ON (before.src = delta.dst)
UNION ALL
SELECT after.src, delta.dst, 1 * delta."count(t)"
FROM link2 after
JOIN "Δ(link)" delta ON (delta.src = after.dst)
) x(src, dst, "count(t)")
GROUP BY src, dst
HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(hop)" ORDER BY 1, 2;

-- Once the MV delta has been generated, the snapshots can be released.
-- We're using temp tables for demonstration purposes, so drop those,
-- and the base table's delta.
DROP TABLE link1, link2, "Δ(link)";

-- At some later time the MV delta is processed "behind the scenes".
-- We can't do the demonstration maintenance against the MV, so copy it.
CREATE TEMP TABLE hop2 AS SELECT * FROM hop;
BEGIN;
DELETE FROM hop2 t1
USING "Δ(hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst
AND t1."count(t)" + t2."count(t)" = 0;
UPDATE hop2 t1
SET "count(t)" = t1."count(t)" + t2."count(t)"
FROM "Δ(hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst;
INSERT INTO hop2
SELECT * FROM "Δ(hop)" t1
WHERE "count(t)" > 0
AND NOT EXISTS (SELECT * FROM hop2 t2 WHERE t2.src = t1.src AND
t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
with REFRESH.
REFRESH MATERIALIZED VIEW hop;
SELECT * FROM hop ORDER BY 1, 2;
SELECT * FROM hop2 ORDER BY 1, 2;

-- Now we're done with the MV delta.
DROP TABLE hop2, "Δ(hop)";

--------------------------------------------------------------------------

-- Let's show how it works when an MV uses another MV.
-- Create the 2nd-level MV.
CREATE MATERIALIZED VIEW tri_hop AS
SELECT t1.src, t2.dst, sum(t1."count(t)" * 1) AS "count(t)"
FROM hop t1
JOIN link t2 ON (t2.src = t1.dst)
GROUP BY t1.src, t2.dst;
SELECT * FROM tri_hop ORDER BY 1, 2;

-- Display and capture "before" image.
-- Include hop now, because it is referenced by tri_hop.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link1 AS SELECT * FROM link;
SELECT * FROM hop ORDER BY 1, 2;
CREATE TEMP TABLE hop1 AS SELECT * FROM hop;

-- Execute a transaction that modifies the base table.
-- The assumption is that the changes will be applied shortly after commit.
BEGIN;
DELETE FROM link WHERE src = 'a' AND dst = 'b';
INSERT INTO link VALUES ('d','f'),('b','h');
COMMIT;

-- Display and capture "after" image.
SELECT * FROM link ORDER BY 1, 2;
CREATE TEMP TABLE link2 AS SELECT * FROM link;

-- Mock up the delta(link) table which will get created "behind the scenes".
CREATE TEMP TABLE "Δ(link)" (src text NOT NULL, dst text NOT NULL,
"count(t)" int NOT NULL);
INSERT INTO "Δ(link)" VALUES ('a','b',-1),('d','f',1),('b','h',1);
SELECT * FROM "Δ(link)" ORDER BY 1, 2;

-- Given link before and after images, and delta(link), calculate delta(hop).
-- This must happen while the "before" and "after" snapshots are still
registered,
-- but not necessarily in the same transaction as the modifications to
the base table.
-- For a "first level" MV, this is calculated just the same as before,
but it will be used
-- to calculate the 2nd level MV before we discard the snapshots.
CREATE TEMP TABLE "Δ(hop)"
(src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(hop)"
SELECT src, dst, sum("count(t)") AS "count(t)"
FROM (
SELECT delta.src, before.dst, delta."count(t)" * 1
FROM "Δ(link)" delta
JOIN link1 before ON (before.src = delta.dst)
UNION ALL
SELECT after.src, delta.dst, 1 * delta."count(t)"
FROM link2 after
JOIN "Δ(link)" delta ON (delta.src = after.dst)
) x(src, dst, "count(t)")
GROUP BY src, dst
HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(hop)" ORDER BY 1, 2;

-- Since the counting algorithm requires the link table "before"
image, as well as
-- the hop MV "after" image and delta(hop) to calculate delta(tri_hop), we must
-- maintain hop before releasing the snapshots used to update link.
CREATE TEMP TABLE hop2 AS SELECT * FROM hop;
BEGIN;
DELETE FROM hop2 t1
USING "Δ(hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst
AND t1."count(t)" + t2."count(t)" = 0;
UPDATE hop2 t1
SET "count(t)" = t1."count(t)" + t2."count(t)"
FROM "Δ(hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst;
INSERT INTO hop2
SELECT * FROM "Δ(hop)" t1
WHERE "count(t)" > 0
AND NOT EXISTS (SELECT * FROM hop2 t2 WHERE t2.src = t1.src AND
t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
with REFRESH.
REFRESH MATERIALIZED VIEW hop;
SELECT * FROM hop ORDER BY 1, 2;
SELECT * FROM hop2 ORDER BY 1, 2;

-- After the 1st level MV is run, we can calculate the delta for the 2nd level.
CREATE TEMP TABLE "Δ(tri_hop)"
(src text NOT NULL, dst text NOT NULL, "count(t)" int NOT NULL);
INSERT INTO "Δ(tri_hop)"
SELECT src, dst, sum("count(t)") AS "count(t)"
FROM (
SELECT delta.src, before.dst, delta."count(t)" * 1
FROM "Δ(hop)" delta
JOIN link1 before ON (before.src = delta.dst)
UNION ALL
SELECT after.src, delta.dst, 1 * delta."count(t)"
FROM hop2 after
JOIN "Δ(link)" delta ON (delta.src = after.dst)
) x(src, dst, "count(t)")
GROUP BY src, dst
HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(tri_hop)" ORDER BY 1, 2;

-- Now we're done with snapshots and all but the highest-level delta.
DROP TABLE link1, link2, "Δ(link)";
DROP TABLE hop1, hop2, "Δ(hop)";

-- At some later time the MV delta is processed "behind the scenes".
-- We can't do the demonstration maintenance against the MV, so copy it.
CREATE TEMP TABLE tri_hop2 AS SELECT * FROM tri_hop;
BEGIN;
DELETE FROM tri_hop2 t1
USING "Δ(tri_hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst
AND t1."count(t)" + t2."count(t)" = 0;
UPDATE tri_hop2 t1
SET "count(t)" = t1."count(t)" + t2."count(t)"
FROM "Δ(tri_hop)" t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst;
INSERT INTO tri_hop2
SELECT * FROM "Δ(tri_hop)" t1
WHERE "count(t)" > 0
AND NOT EXISTS (SELECT * FROM tri_hop2 t2 WHERE t2.src = t1.src
AND t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
with REFRESH.
REFRESH MATERIALIZED VIEW tri_hop;
SELECT * FROM tri_hop ORDER BY 1, 2;
SELECT * FROM tri_hop2 ORDER BY 1, 2;

-- Now we're done with the MV delta.
DROP TABLE tri_hop2, "Δ(tri_hop)";

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Adam Brusselback
adambrusselback@gmail.com
In reply to: Kevin Grittner (#9)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 3:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Mon, Sep 26, 2016 at 2:04 PM, Rakesh Kumar
<rakeshkumar464@outlook.com> wrote:

Does PG have a concept of MV log, from where it can detect the
delta changes and apply incremental changes quickly.

That is what I am trying to work toward with the patch I cited in
an earlier post. Once some variation of that is in, the actual
incremental maintenance can be build on top of it. To see an
example of what would be done with such a delta relation for a
simple MV, using the count algorithm, see below:

Well I feel like I've learned a ton already reading through the links you
provided earlier and that example above.

I'm very interested in getting this into core. I'll look into what I need
to do to review. Not crazy familiar with C, as I mainly do Java
development. I'll see if I can help in any way though.

The main reason I was working on an alternative is because I need something
now rather than in a couple years, but I've been dealing with manually
creating the few I do need for my database. What I proposed above was just
me thinking about what could be done with things as they are. Obviously
it's junk compared to a real solution in-core. Would you consider my
approach even worth trying, or should I just suck it up and do things
manually for now and put that effort into getting incremental refresh into
core?

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Adam Brusselback (#10)
Re: Incrementally refreshed materialized view

On Mon, Sep 26, 2016 at 3:16 PM, Adam Brusselback
<adambrusselback@gmail.com> wrote:

Well I feel like I've learned a ton already reading through the links you
provided earlier and that example above.

Yeah, I know that example can really help show what will happen
"under the covers", and make it more concrete. The theory that
it's based on seems brilliant to me. That fact that it's based on
relational algebra (RA) means that it is much more likely to be
implemented in a complete and correct manner than something ad hoc.
I know I started at this with a try (similar to yours) at
analyzing from scratch, then went and searched the literature.
When I came back to my ad hoc notes, the RA pointed out some holes
in the logic where corner cases would have been wrong because of
missed details. RA is well thought out and solid; it seems to me
to be the perfect thing to underlie a solution to this problem.

I'm very interested in getting this into core. I'll look into what I need to
do to review. Not crazy familiar with C, as I mainly do Java development.
I'll see if I can help in any way though.

Just testing it and reviewing the documentation for errors,
omissions, or just plain hard-to-follow language would be a big
help. Please post back about any problems getting things patched
and build.

The main reason I was working on an alternative is because I need something
now rather than in a couple years, but I've been dealing with manually
creating the few I do need for my database. What I proposed above was just
me thinking about what could be done with things as they are. Obviously it's
junk compared to a real solution in-core. Would you consider my approach
even worth trying, or should I just suck it up and do things manually for
now and put that effort into getting incremental refresh into core?

Oh, I've used plain tables and triggers many times myself. If you
need something now, you kind of have to go that route. The
algorithms I cited do provide an interesting potential alternative
for how to go about that, although operating a row at a time you
probably won't approach the speed of statement-level set logic for
statements that affect very many rows. :-(

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12hariprasath nallasamy
hariprasathnallasamy@gmail.com
In reply to: Kevin Grittner (#11)
Re: Incrementally refreshed materialized view

We also tried to achieve incremental refresh of materialized view and our
solution doesn't solve all of the use cases.

Players:
1) WAL
2) Logical decoding
3) replication slots
4) custom background worker

Two kinds of approaches :
1. Deferred refresh (oracle type of creating log table for each base tables
with its PK and agg's columns old and new values)
a) Log table for each base table has to be created and this log table
will keep track of delta changes.
b) UDF is called to refresh the view incrementally - this will
run original materialized view query with the tracked delta PK's in their
where clause. so only rows that are modified/inserted will be touched.
c) Log table will keep track of changed rows from the data given by
replication slot which uses logical decoding to decode from WAL.
d) Shared memory is used to maintain the relationship between the
view and its base table. In case of restart they are pushed to maintenance
table.

2. RealTime refresh (update the view whenever we get any change-sets
related to that base tables)
a) Delta data from the replication slot will be applied to view by
checking the relationship between our delta data and the view definiton.
Here also shared memory and maintenance table are used.
b) Work completed only for materialized views having single table.

Main disadvantage :
1) Data inconsistency when master failure and also slave doesn't have
replication slot as of now. But 2ndquard guys try to create slots in slave
using some concepts of failover slots. But that doesn't come along with PG
:(.
2) Sum, count and avg are implemented for aggregates(single table) and for
other aggs full refresh comes to play a role.
3) Right join implementation requires more queries to run on the top of
MV's.

So we are on a long way to go and dono whether this is the right path.

Only deferred refresh was pushed to github.
https://github.com/harry-2016/MV_IncrementalRefresh

I wrote a post regarding that in medium.
https://medium.com/@hariprasathnallsamy/postgresql-materialized-view-incremental-refresh-44d1ca742599

#13Nguyễn Trần Quốc Vinh
ntquocvinh@gmail.com
In reply to: hariprasath nallasamy (#12)
Re: Incrementally refreshed materialized view

Dear,

As it was recommended, we pushed our projects into github:
https://github.com/ntqvinh/PgMvIncrementalUpdate.

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all
underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables
to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

We hope that our projects may be helpful for someone!

TS. Nguyễn Trần Quốc Vinh
-----------------------------------------------
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng
Website: http://it.ued.vn <http://it.ued.udn.vn&gt;; http://www.ued.vn
<http://www.ued.udn.vn&gt;; http://www.ued.udn.vn
LLKH: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
ĐT: 0511.6-512-586
DĐ: 0914.78-08-98
------------------------------------------------
Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn <http://www.ued.udn.vn/&gt;;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Tue, Sep 27, 2016 at 12:23 PM, hariprasath nallasamy <
hariprasathnallasamy@gmail.com> wrote:

Show quoted text

We also tried to achieve incremental refresh of materialized view and our
solution doesn't solve all of the use cases.

Players:
1) WAL
2) Logical decoding
3) replication slots
4) custom background worker

Two kinds of approaches :
1. Deferred refresh (oracle type of creating log table for each base
tables with its PK and agg's columns old and new values)
a) Log table for each base table has to be created and this log
table will keep track of delta changes.
b) UDF is called to refresh the view incrementally - this will
run original materialized view query with the tracked delta PK's in their
where clause. so only rows that are modified/inserted will be touched.
c) Log table will keep track of changed rows from the data given by
replication slot which uses logical decoding to decode from WAL.
d) Shared memory is used to maintain the relationship between the
view and its base table. In case of restart they are pushed to maintenance
table.

2. RealTime refresh (update the view whenever we get any change-sets
related to that base tables)
a) Delta data from the replication slot will be applied to view by
checking the relationship between our delta data and the view definiton.
Here also shared memory and maintenance table are used.
b) Work completed only for materialized views having single table.

Main disadvantage :
1) Data inconsistency when master failure and also slave doesn't have
replication slot as of now. But 2ndquard guys try to create slots in slave
using some concepts of failover slots. But that doesn't come along with PG
:(.
2) Sum, count and avg are implemented for aggregates(single table) and for
other aggs full refresh comes to play a role.
3) Right join implementation requires more queries to run on the top of
MV's.

So we are on a long way to go and dono whether this is the right path.

Only deferred refresh was pushed to github.
https://github.com/harry-2016/MV_IncrementalRefresh

I wrote a post regarding that in medium.
https://medium.com/@hariprasathnallsamy/postgresql-materialized-view-
incremental-refresh-44d1ca742599

#14Adam Brusselback
adambrusselback@gmail.com
In reply to: Nguyễn Trần Quốc Vinh (#13)
Re: Incrementally refreshed materialized view

On Thu, Sep 29, 2016 at 8:10 AM, Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com

wrote:

Dear,

As it was recommended, we pushed our projects into github:
https://github.com/ntqvinh/PgMvIncrementalUpdate.

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all
underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables
to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

We hope that our projects may be helpful for someone!

Very interesting. Does this support materialized views with recursive
queries? What about left joins? (not) exists? Aggregates? Window functions?
In reading up on the implementations in other databases, I was surprised by
some of the limitations imposed by DB2 / Oracle / Sql Server.

I'm trying to look through the code base to answer my questions, but it's
large enough that it may be easier to just ask first.

#15Nguyễn Trần Quốc Vinh
ntquocvinh@gmail.com
In reply to: Adam Brusselback (#14)
Re: Incrementally refreshed materialized view

Dear Adam.

There are read-me files within each projects. They support only inner-joins
and aggregations (sum, count, min, max, avg). The updating algorithm for
matviews with min/max must be differ from without ones. When there are min
and/or max, we have to use the base table, otherwise, not.

About 15 years ago we implemented for SPJ matvew-queries to generate
triggers in PL/pgSQL. We developed that project for queries with
aggregations about 4 years ago. Unfortunately we lost the last versions. We
have now only the version with some error when there are aggregations. For
SPJ queries it works well. We don't know is it useful or not. If yes, we
can upload for sharing. We didn't share it because we thought that our work
is not good enough for public use.

The projects that generate C-codes was "finished" 2 years ago.

Concern the project that generates triggers (doing incremental update) in
PL/pgSQL, we implemented the algorithm likes your suggestion in this
thread, i.e. at least one key of the base tables are added automatically
into the mat-view queries for further incremental updates.

TS. Nguyễn Trần Quốc Vinh
-----------------------------------------------
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng
Website: http://it.ued.vn <http://it.ued.udn.vn&gt;; http://www.ued.vn
<http://www.ued.udn.vn&gt;; http://www.ued.udn.vn
LLKH: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
ĐT: 0511.6-512-586
DĐ: 0914.78-08-98
------------------------------------------------
Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn <http://www.ued.udn.vn/&gt;;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Thu, Sep 29, 2016 at 8:37 PM, Adam Brusselback <adambrusselback@gmail.com

Show quoted text

wrote:

On Thu, Sep 29, 2016 at 8:10 AM, Nguyễn Trần Quốc Vinh <
ntquocvinh@gmail.com> wrote:

Dear,

As it was recommended, we pushed our projects into github:
https://github.com/ntqvinh/PgMvIncrementalUpdate.

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all
underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying tables
to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

We hope that our projects may be helpful for someone!

Very interesting. Does this support materialized views with recursive
queries? What about left joins? (not) exists? Aggregates? Window functions?
In reading up on the implementations in other databases, I was surprised by
some of the limitations imposed by DB2 / Oracle / Sql Server.

I'm trying to look through the code base to answer my questions, but it's
large enough that it may be easier to just ask first.

#16Nguyễn Trần Quốc Vinh
ntquocvinh@gmail.com
In reply to: Nguyễn Trần Quốc Vinh (#15)
Re: Incrementally refreshed materialized view

We came to C and discontinued the version generating code in
PL/pgSQL because of
- Our testing showed that triggers in C give better performance than the
ones in PL/pgSQL. Our opinion may be wrong.
- If we can generate triggers and other updating codes in C, we may
integrate it into PostgreSQL source codes. We may be wrong too.

:)

We plan to do by the same way for queries with outer-joins next year. With
recursive queries we have no plan because we don't see any effective update
algorithm. The worst is that we have no fund to do with matviews and
PostgreSQL. All that just for fun! We have too many things to do each day.

TS. Nguyễn Trần Quốc Vinh
-----------------------------------------------
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng
Website: http://it.ued.vn <http://it.ued.udn.vn&gt;; http://www.ued.vn
<http://www.ued.udn.vn&gt;; http://www.ued.udn.vn
LLKH: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
ĐT: 0511.6-512-586
DĐ: 0914.78-08-98
------------------------------------------------
Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn <http://www.ued.udn.vn/&gt;;
http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Thu, Sep 29, 2016 at 8:56 PM, Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com

Show quoted text

wrote:

Dear Adam.

There are read-me files within each projects. They support only
inner-joins and aggregations (sum, count, min, max, avg). The updating
algorithm for matviews with min/max must be differ from without ones. When
there are min and/or max, we have to use the base table, otherwise, not.

About 15 years ago we implemented for SPJ matvew-queries to generate
triggers in PL/pgSQL. We developed that project for queries with
aggregations about 4 years ago. Unfortunately we lost the last versions. We
have now only the version with some error when there are aggregations. For
SPJ queries it works well. We don't know is it useful or not. If yes, we
can upload for sharing. We didn't share it because we thought that our work
is not good enough for public use.

The projects that generate C-codes was "finished" 2 years ago.

Concern the project that generates triggers (doing incremental update) in
PL/pgSQL, we implemented the algorithm likes your suggestion in this
thread, i.e. at least one key of the base tables are added automatically
into the mat-view queries for further incremental updates.

TS. Nguyễn Trần Quốc Vinh
-----------------------------------------------
Chủ nhiệm khoa Tin học
Trường ĐH Sư phạm - ĐH Đà Nẵng
Website: http://it.ued.vn <http://it.ued.udn.vn&gt;; http://www.ued.vn
<http://www.ued.udn.vn&gt;; http://www.ued.udn.vn
LLKH: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
ĐT: 0511.6-512-586
DĐ: 0914.78-08-98
------------------------------------------------
Nguyen Tran Quoc Vinh, PhD
Dean
Faculty of Information Technology
Danang University of Education
Website: http://it.ued.udn.vn; http://www.ued.vn <http://www.ued.udn.vn/&gt;
; http://www.ued.udn.vn
SCV: http://scv.ued.vn/~ntquocvinh <http://scv.ued.udn.vn/~ntquocvinh&gt;
Phone: (+84) 511.6-512-586
Mobile: (+84) 914.78-08-98

On Thu, Sep 29, 2016 at 8:37 PM, Adam Brusselback <
adambrusselback@gmail.com> wrote:

On Thu, Sep 29, 2016 at 8:10 AM, Nguyễn Trần Quốc Vinh <
ntquocvinh@gmail.com> wrote:

Dear,

As it was recommended, we pushed our projects into github:
https://github.com/ntqvinh/PgMvIncrementalUpdate.

1) Synchronous incremental update
- For-each-row triggers are generated for all changing events on all
underlying tables.

2) Asynchronous (deferred) incremental update
- Triggers are generated for all changing events on all underlying
tables to collect all changed rows
- Other codes are generated for each matview-query to update the matview.

We hope that our projects may be helpful for someone!

Very interesting. Does this support materialized views with recursive
queries? What about left joins? (not) exists? Aggregates? Window functions?
In reading up on the implementations in other databases, I was surprised by
some of the limitations imposed by DB2 / Oracle / Sql Server.

I'm trying to look through the code base to answer my questions, but it's
large enough that it may be easier to just ask first.