counting algorithm for incremental matview maintenance
In surveying the literature on $subject, I find that most of the
theoretical work related to how to incrementally update
materialized views based on the matview declaration was published
between 1988 and 1993. The best paper I have been able to find on
the topic was published in ACM SIGMOD in 1993 [1]
(http://dl.acm.org/citation.cfm?id=170066), and covers two
algorithms: counting and DRed. The former should be very fast for
non-recursive views, but not very efficient for recursive views.
The latter algorithm is the other way around -- it looks like it
will do well with recursive views but generally be slower for
non-recursive views.
It does not seem feasible to me to implement both techniques in a
single one-year PostgreSQL release. In fact, if we have trouble
getting everyone onto the same page early, we might have to settle
for trying to just get some infrastructure into place, without
anything to actually make use of it. That would be unfortunate,
since Oracle added incremental maintenance of matviews to their
existing feature in 1999, and have been improving it regularly
since then. Many other products also have mature implementations
of this, and there seems to be a lot of demand for it in
PostgreSQL. In the best of circumstances it will take years for us
to catch up on this front.
What I propose for 9.4 is this:
During the first CF I would like to have a discussion to settle on
how best to do some things that the counting algorithm requires, so
that following the first CF I can try to implement that
infrastructure and get at least simple matviews working for
incremental maintenance, and have something ready for the second
CF. I can hack up a rough patch to suggest a possible approach for
purposes of discussion, but I'm not sure whether that's either
necessary or desirable.
----------
Needed infrastructure changes follow.
----------
First, the algorithm requires a new system column called
count_t (or similar). It would be optional (like oid is) and would
only be used by tuples in matviews maintained by the counting
algorithm, and by delta relations used in the matview incremental
maintenance process. Tables and matviews not created or altered to
specify incremental maintenance would not carry this new system
column. Existing matviews from 9.3 would continue to work as
before until and unless they were ALTERed to specify incremental
maintenance; this form of ALTER would require rewrite of the
matview to add the count_t column.
This algorithm also requires that we have new execution node types
(or conditional behavior in existing nodes) to support processing
counts. It is worth noting that relations without a count_t column
will sometimes need to be processed using the new logic (in which
case each tuple is treated as having an implied count of one).
count_t in a matview must always be greater than zero. In delta
relations it can be negative (to reflect tuples being deleted or
the "before" image of an update) or positive (to reflect tuples
being inserted or the "after" image of an update).
Examples of the new behavior in nodes used by incremental
maintenance are:
* DISTINCT or GROUP BY: count_t in the result should represent the
number of tuples being combined.
* JOIN: count_t in the result is the product of multiplying the
counts from the pair of joined tuples.
* UNION (without ALL): count_t in the result should be the value
from an unmatched tuple or the sum of the counts of matched tuples,
with any result row with a zero count_t being omitted from the
result.
Note that for nodes other than UNION, the logic is identical except
for calculation of the count for the output.
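Purely as an illustration of the count arithmetic (not executor
code), the node behaviors above can be sketched in Python by modeling
each relation as a dict mapping a tuple to its count_t; all names
here are hypothetical:

```python
# Sketch of the counting algorithm's per-node count arithmetic.
# A relation is a dict {tuple: count_t}; a relation without count_t
# would be modeled with an implied count of 1 for every tuple.

def group(rel, key):
    # DISTINCT / GROUP BY: the result count is the number of input
    # tuples combined into each result row.
    out = {}
    for tup, n in rel.items():
        k = key(tup)
        out[k] = out.get(k, 0) + n
    return out

def join(left, right, match):
    # JOIN: the result count is the product of the counts of the
    # pair of joined tuples.
    out = {}
    for lt, ln in left.items():
        for rt, rn in right.items():
            if match(lt, rt):
                out[(lt, rt)] = out.get((lt, rt), 0) + ln * rn
    return out

def union(a, b):
    # UNION (without ALL): counts of matching tuples are summed, and
    # any result row whose count reaches zero is omitted.
    out = dict(a)
    for tup, n in b.items():
        out[tup] = out.get(tup, 0) + n
    return {t: n for t, n in out.items() if n != 0}
```

Note how union() is the only one that can drop rows: a delta tuple
with a negative count cancels a matching positive one.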
The resulting matviews can be used in the traditional set logic of
PostgreSQL, without knowing about or caring about the count_t
column, exactly as before; the new system column is only used for
maintenance.
We could drive the triggering of incremental maintenance off of the
dependency information which is already stored, but for performance
we probably want to add a new pg_class flag to indicate that the
relation is referenced by a matview definition which specifies
incremental update. That would allow a fast path for skipping
other tests for DML on non-referenced relations, at the expense of
some additional catalog updates on some DDL.
----------
The above is everything that is a significant infrastructure change
needed for a first cut at incremental maintenance of materialized
views, driven by the view definition. Below is additional detail
about what will happen when incremental maintenance for a view is
specified, for those who wonder why these infrastructure changes
are needed.
----------
Essentially, the technique is based on relational algebra. This
allows previously existing theory to prove the correctness of the
technique and its optimizations. The addition of the count_t
column allows accurate identification of exactly which rows need to
be touched during incremental maintenance, with minimal overhead.
As the paper notes, these incremental maintenance techniques are a
heuristic; there will always be cases where it is faster to
regenerate the data from scratch than to use these incremental
techniques, so part of the goal of having the transactional REFRESH
(as described in a separate thread) is to allow some front-end
tests to attempt to recognize when an incremental approach will be
slower than a REFRESH and fall back onto the transactional REFRESH
technique seamlessly and fairly quietly. (My gut feeling on the
right level for an ereport of this is DEBUG1, although I can see
the argument for a LOG level entry.)
The original and modified versions of the relations (tables or
other matviews) which define a matview must be available to
calculate the matview deltas, so the snapshot information for this
must be available and registered until the matview delta has been
calculated. They can be released once the delta has been
established and before it has been applied to the matview. Initial
milestones in this development will focus on "eager" maintenance,
so this will not initially be a big issue, but will become more
important as we work on making more of the work asynchronous, so
that the foreground query is not held up maintaining data which is
allowed to be a little stale. This seems not entirely unrelated to
background worker processes and snapshot sharing.
I don't know whether we can get to this during 9.4 development, but
it might be worth at least considering as the other work is done:
some aggregate columns with a small amount of data kept while the
aggregate is being computed can be further optimized under this
technique by storing the working data in the tuple as some sort of
hidden or system column related to the aggregate. A few of the
obvious candidates for such optimization include count(), sum() and
avg(). Note that with this capability, especially if we ever
implement the query rewrite capability for matviews, we would have
a nice answer for the count(*) speed complaints.
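As a sketch of that idea (hypothetical, not proposed code), avg()
could be maintained incrementally by keeping a hidden (sum, count)
working pair with each aggregated row, so a delta can be applied
without rescanning the group's detail rows:

```python
# Hypothetical sketch: incremental maintenance of avg() via hidden
# working state (sum, count) stored alongside each matview row.

def apply_avg_delta(state, delta_sum, delta_count):
    # state: the hidden (sum, count) pair for one group.
    # delta_sum / delta_count: net change from inserted (+) and
    # deleted (-) detail rows affecting this group.
    s, c = state
    s += delta_sum
    c += delta_count
    if c == 0:
        return None, None          # group disappeared; delete the row
    return (s, c), s / c           # new hidden state and visible avg
```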
Obviously, there will need to be some new syntax around CMV and AMV
(CREATE and ALTER MATERIALIZED VIEW) to support specification of
how the matview should be maintained.
Also, we are moving into a phase where it makes sense to start the
bikeshedding on what timings we should support, how information
about the desired timings should be stored, how "freshness" should
be tracked, and what syntax we put around all that. Frankly, I
think I will have my hands pretty full with the incremental
maintenance work, and I see the timing work as largely orthogonal,
so it would be great if anyone who was interested in that aspect of
things could take primary responsibility for moving that along.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 14, 2013 at 2:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
[..]
#1 issue I have with current matview functionality is locking.
currently refresh takes out an access exclusive lock. so, question
is, do you think your proposal will be such that it will no longer
require taking out full lock for refresh purposes (either incremental
or otherwise)?
merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Merlin Moncure <mmoncure@gmail.com> wrote:
#1 issue I have with current matview functionality is locking.
currently refresh takes out an access exclusive lock. so,
question is, do you think your proposal will be such that it will
no longer require taking out full lock for refresh purposes
(either incremental or otherwise)?
The right thread for *that* question is "Differential
(transactional) REFRESH"; however, I might as well say here that I
don't think we want to get rid of the (faster) version that just
replaces the current heap when we add the (slower) option to
REFRESH it transactionally.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 15, 2013 at 11:33 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
[..]
sorry, didn't notice that thread. agreed, that seems good candidate
for user input to refresh command to manage the tradeoff.
well, do you expect the application of differential refresh to be
automatic? lockless + differential refresh would be game changer in
terms of how I build up data for analytics.
merlin
Merlin Moncure <mmoncure@gmail.com> wrote:
[..]
well, do you expect the application of differential refresh to be
automatic?
I expect considerable bikeshedding on this, but my preference would
be to allow syntax for specifying which technique is desired on
the REFRESH command, and in the absence of specification I would
prefer that it default to the current technique for a matview which
is not yet populated and the transactional technique for a
populated matview. We could associate some property with the
matview for default REFRESH technique, but I don't know whether
that's worth the trouble.
lockless + differential refresh would be game changer in terms of
how I build up data for analytics.
Yeah, it's definitely something I would have liked to have in the
initial release, but I tried to keep the scope very limited given
how little time there was left in the release cycle when I got a
chance to start on it. Adding this seemed to be just a little too
much.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin,
It's fairly common for matviews to be constructed such that updates to
them are strictly appends. For example, a matview which has a daily
summary would just get appended to each day, and existing rows would not
change barring a major historical database cleanup.
It seems like we could ... and ought to ... optimize for this pattern
somehow for incremental updates. That is, if the user knows that we're
going to be only appending new rows and not modifying any old ones, s/he
ought to be able to tell the database that somehow and avoid the
overhead of checking. While the overhead of checking a count wouldn't
be that high for a few hundred rows, I've dealt with matviews which were
thousands to millions of rows.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:
[..]
Thanks for the suggestion; I will keep an eye out for ways this
insight might allow an optimization. I think there might be some
misunderstanding of the counting algorithm, though -- there is no
need to do a sequential pass through the matview examining the
counts.
I don't want to replicate the content of a fairly dense (in the
sense of having a lot of content per page) 10 page computer science
paper in this email, but for purposes of illustration I will take a
very simple case and show how it works. (This is not geared toward
your particular case, because that could get kinda long to explain
here and now, but hopefully this will give an insight into the
technique overall.)
Let's say there is a table and matview like this:
create table foo (fooid int primary key, val int not null);
create materialized view bar as select distinct val from foo;
Let's say there are millions of rows in both, and that we have
flagged the view for incremental maintenance. (Syntax omitted to
avoid distracting bikeshedding on that when the point is the
algorithm.)
Now, someone runs this:
update foo set val = val + 1 where fooid between 1 and 10;
What will happen is this:
Since foo will be flagged as a relation which is referenced by an
incrementally maintained matview, a delta relation will be
materialized for this update, which will contain the net change to
the underlying table in the count_t system column. "Before" tuples
will have a count of -1; "after" tuples will have a count of 1.
Then the query defining the view will be run *against the delta*,
resulting in a relation with a count_t column reflecting the net
change for each val. Anything with a zero for the net change will
be dropped. We will run a logical UNION of this relation and the
bar matview. In this case, we obviously want this to be done in a
way that for each row in this "net change" relation, we do an index
scan against the bar matview; if not found, we insert the new row
into the matview with its count from the net change relation.
(That had better be positive or we have a bug -- so elog ERROR if
negative.) If bar does contain a matching row, update count_t in
that row with the sum of its old value and the value from the "net
change" relation. Of course, that new value also had better be
positive or zero -- if zero we delete the old row rather than
updating count_t.
The count_t column saves us from having to scan foo for all the old
val values. It does not require any scan of the entire bar
matview. It allows us to zero in on exactly the right rows, and
lets us know what needs doing.
Clearly we want the planner to find the best plans for the interim
steps rather than hard-coding particular rule-based plans; I just
used an example where it's pretty clear what a reasonable plan
would be.
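To make the maintenance step concrete, here is a small Python
simulation of the example above (an illustrative sketch of the
algebra only; in PostgreSQL this would all happen in the executor,
and every name here is hypothetical):

```python
# Simulation of the counting algorithm for:
#   create materialized view bar as select distinct val from foo;
# foo rows are (fooid, val) tuples; bar maps val -> count_t (> 0).

def delta_for_update(before_rows, after_rows):
    # Net change to foo: "before" tuples get count -1, "after" +1.
    delta = {}
    for row in before_rows:
        delta[row] = delta.get(row, 0) - 1
    for row in after_rows:
        delta[row] = delta.get(row, 0) + 1
    return delta

def view_delta(foo_delta):
    # Run the view's query (select distinct val) against the delta,
    # summing counts per val and dropping zero net changes.
    out = {}
    for (fooid, val), n in foo_delta.items():
        out[val] = out.get(val, 0) + n
    return {v: n for v, n in out.items() if n != 0}

def apply_delta(bar, vdelta):
    # Probe bar by val (an index scan in practice); insert, adjust
    # the count, or delete when the count reaches zero.
    for val, n in vdelta.items():
        new = bar.get(val, 0) + n
        if new < 0:
            raise ValueError("count_t went negative -- bug")  # elog ERROR
        if new == 0:
            bar.pop(val, None)     # last supporting row went away
        else:
            bar[val] = new
    return bar
```

Note that only the vals present in the delta are ever probed; the
rest of bar is never touched.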
Hopefully this makes it fairly clear that the only thing an
optimization around the "append-only" assertion for a matview would
buy is the ability to skip the probe for an existing record
*related to rows which are in the delta*. As long as there is reasonable
indexing on the matview, maintenance for the append-only case would
not involve scanning the entire matview.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I wrote:
Let's say there is a table and matview like this:
create table foo (fooid int primary key, val int not null);
create materialized view bar as select distinct val from foo;
Some of the subsequent text doesn't make sense unless that
materialized view has an index, like this:
create unique index bar_val on bar (val);
Without such an index, it would need to use a plan which scanned
the entire bar relation for any maintenance. Apologies for the
omission and any confusion it caused.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 14, 2013 at 3:52 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
We could drive the triggering of incremental maintenance off of the
dependency information which is already stored, but for performance
we probably want to add a new pg_class flag to indicate that the
relation is referenced by a matview definition which specifies
incremental update. That would allow a fast path for skipping
other tests for DML on non-referenced relations, at the expense of
some additional catalog updates on some DDL.
I'm afraid this might require creating a matview or updating the
definition of a matview to refer to different relations to take
AccessExclusiveLock on those relations, in order to avoid SnapshotNow
problems while updating this flag for those relations, and I think
that's probably unacceptable. Some thought may be needed here to come
up with a good solution.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
Kevin Grittner <kgrittn@ymail.com> wrote:
We could drive the triggering of incremental maintenance off of the
dependency information which is already stored, but for performance
we probably want to add a new pg_class flag to indicate that the
relation is referenced by a matview definition which specifies
incremental update. That would allow a fast path for skipping
other tests for DML on non-referenced relations, at the expense of
some additional catalog updates on some DDL.
I'm afraid this might require creating a matview or updating the
definition of a matview to refer to different relations to take
AccessExclusiveLock on those relations, in order to avoid SnapshotNow
problems while updating this flag for those relations, and I think
that's probably unacceptable. Some thought may be needed here to come
up with a good solution.
Thanks for the feedback.
I had been thinking that such a flag would be the moral equivalent
of such existing flags as relhaspkey, relhasrules, relhastriggers,
and relhassubclass. Those all require owner rights to change, and
perhaps we don't want to require that a user be the owner of a
table to define a materialized view which references that table and
specifies incremental update. On the other hand, we might not want
to let just any old user who has SELECT permission on a table to
specify that it feeds an incrementally updated matview, since there
is no escaping the fact that extra work will be needed for DML
against that table if it is doing that. I seem to recall that at
least one other product requires the owner of a table to ALTER it
to set a flag specifying that the table is allowed to be used to
back incrementally updated matviews; perhaps that's the way to go?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 8:33 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
[..]
I had been thinking that such a flag would be the moral equivalent
of such existing flags as relhaspkey, relhasrules, relhastriggers,
and relhassubclass. Those all require owner rights to change, and
perhaps we don't want to require that a user be the owner of a
table to define a materialized view which references that table and
specifies incremental update. On the other hand, we might not want
to let just any old user who has SELECT permission on a table to
specify that it feeds an incrementally updated matview, since there
is no escaping the fact that extra work will be needed for DML
against that table if it is doing that. I seem to recall that at
least one other product requires the owner of a table to ALTER it
to set a flag specifying that the table is allowed to be used to
back incrementally updated matviews; perhaps that's the way to go?
Possibly. That at least has the advantage of transparency: if you do
ALTER TABLE wunk ENABLE DELTA QUEUE or somesuch syntax, it's very
clear that you're buying an AccessExclusiveLock. And while
AccessExclusiveLocks are not a lot of fun, one that you know is coming
is a lot better than one that comes as a surprise.
I feel like it would be nicer, though, to come up with some trick that
avoids the need to update the referenced table's pg_class entry
altogether. I don't immediately have a good idea, but I'll mull it
over and see if I come up with anything.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote:
Good explanation for understanding the initial concept of incremental update
of matviews.
The original and modified versions of the relations (tables or
other matviews) which define a matview must be available to
calculate the matview deltas, so the snapshot information for this
must be available and registered until the matview delta has been
calculated. They can be released once the delta has been
established and before it has been applied to the matview.
Here, by modified versions of the relations, do you mean the delta
relations for recording the changes?
Could you elaborate a bit about the snapshot information? Is this
snapshot for the delta relation, and when will it acquire snapshot
information to update matviews?
[..]
With Regards,
Amit Kapila.
On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote:
[..]
Since foo will be flagged as a relation which is referenced by an
incrementally maintained matview, a delta relation will be
materialized for this update, which will contain the net change to
the underlying table in the count_t system column.
How and when will it clear old rows of the delta relation,
especially when more than one matview is defined on the table? And
how will it track where to start a second refresh from this delta
relation to the matview?
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila@huawei.com> wrote:
On Wednesday, May 15, 2013 1:22 AM Kevin Grittner wrote:
Good explanation for understanding the initial concept of
incremental update of matviews.
Thanks. This is one of those topics where it takes a lot of time
going over highly technical papers to really get your head around
it. I'm hoping some simple examples will give a decent intuitive
grasp for those who want that but don't want to invest the time to
cover all the details.
At some point a README file will be needed. I will probably start
with a Wiki that can evolve during development and be used to help
create the README file near the end of the release cycle. Dan and
I did that for SSI and it seemed to work reasonably well.
[..]
Here, by modified versions of the relations, do you mean the delta
relations for recording the changes?
During calculation of the deltas to apply to the matviews, it must
be possible to query the referenced tables from the perspective of
both the "before" and "after" versions of the data.
Could you elaborate a bit about the snapshot information? Is this
snapshot for the delta relation?
I don't think we need to use snapshots to read the deltas -- those
contain the information about the transition from one state to
another, rather than any persistent state which would be visible to
the user. I see them more like tuple stores created as interim
steps in query execution.
When will it acquire snapshot information to update matviews?
In early milestones the work will be done in the context of
completing a statement or a transaction, and the MVCC data used to
update matviews will be from that context -- much like for changes
made within a trigger.
Once we move on to asynchronous matview maintenance it seems pretty
clear that we need to have queues and background worker processes.
If *deltas for the matview changes* are derived in the originating
transaction we might want one queue of deltas for each target
matview. I don't think the process applying the deltas needs to do
anything unusual or special about acquiring snapshots as it
processes the delta from each transaction, unless we see some way
to optimize things once we're hip-deep in the details.
We could off-load the delta calculations for the matviews from the
originating transaction, but that would require a separate set of
queues and background workers, and it would require that snapshots from
the transaction remain registered so that they are valid for use by
the delta calculations. It's not clear whether the benefit of that
would justify the extra complexity. That would be, essentially,
another form of parallelizing the work of a connection; so I'm
leaving that as *possible* work later on, which is only likely to
be worthwhile if there is enough infrastructure from other work on
parallelization to reduce the scope here.
Let me know if that doesn't address what you wanted to know.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Amit Kapila <amit.kapila@huawei.com> wrote:
On Thursday, May 16, 2013 7:02 PM Kevin Grittner wrote:
Let's say there is a table and matview like this:
create table foo (fooid int primary key, val int not null);
create materialized view bar as select distinct val from foo;Let's say there are millions of rows in both, and that we have
flagged the view for incremental maintenance. (Syntax omitted
to avoid distracting bikeshedding on that when the point is the
algorithm.)Now, someone runs this:
update foo set val = val + 1 where fooid between 1 and 10;
What will happen is this:
Since foo will be flagged as a relation which is referenced by
an incrementally maintained matview, a delta relation will be
materialized for this update, which will contain the net change
to the underlying table in the count_t system column.
How and when will old rows of the delta relation be cleared,
especially when more than one matview is defined on the table?
There will not be a permanent delta relation -- that is something
that will exist in a tuple store or other transient structure for
the duration it is needed. Once the incremental maintenance code
iterates through all affected matviews, the delta can be destroyed.
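For illustration, the counting arithmetic for this example can be sketched in a few lines (a toy model, not the proposed implementation: bar is modeled as a map from val to its count_t, and the delta is the net change in those counts).

```python
# Toy model of counting maintenance for
#   CREATE MATERIALIZED VIEW bar AS SELECT DISTINCT val FROM foo;
from collections import Counter

def compute_delta(old_vals, new_vals):
    """Net change in per-value counts caused by an UPDATE on foo."""
    delta = Counter(new_vals)
    delta.subtract(old_vals)
    return delta

def apply_delta(matview, delta):
    """Adjust count_t per value; a row leaves the matview only when
    its count reaches zero -- no scan of foo or of the whole bar."""
    for val, change in delta.items():
        if change == 0:
            continue
        matview[val] = matview.get(val, 0) + change
        if matview[val] == 0:
            del matview[val]

bar = {10: 3, 11: 1}   # e.g. foo currently holds vals [10, 10, 10, 11]
# UPDATE foo SET val = val + 1 WHERE fooid = 1  (one row: 10 -> 11)
apply_delta(bar, compute_delta(old_vals=[10], new_vals=[11]))
print(bar)   # {10: 2, 11: 2}
```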
And how will it track from where to start the next application of
this delta relation to the matview?
I'm not clear on what you're asking here. If it's about how
incremental maintenance will be coordinated with any explicit
REFRESH, it seems to me that at the point a REFRESH starts, any
changes which are from commits too late to be included in the
snapshot used for the REFRESH would need to block or be queued
until completion of the REFRESH (depending on whether the view is
using synchronous or asynchronous maintenance), and any unapplied
matview deltas from before that could be discarded.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
During calculation of the deltas to apply to the matviews, it must
be possible to query the referenced tables from the perspective of
both the "before" and "after" versions of the data.
[..]
I don't think the process applying the deltas needs to do
anything unusual or special about acquiring snapshots as it
processes the delta from each transaction, unless we see some way
to optimize things once we're hip-deep in the details.
Note that the basic count algorithm assumes real-serializable
transactions for correctness. Example:
* The example matview’s defining query is a simple join: SELECT * FROM
A JOIN B ON A.a = B.b.
* For simplicity, assume that the matview is initially empty.
* Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
* Assume that there are two transactions T1 and T2 running in
parallel, that don’t see each other’s changes.
* Both T1 and T2 insert a row in A and B, respectively, in such a way
that the combination of the rows is matched by the join condition
(i.e., the row inserted by T1 has A.a equal to the B.b of the row
inserted by T2).
* The count algorithm will not add anything to the matview as part of
the execution of T1 (because the new row in A doesn’t join with any
rows in B that T1 observes).
* Idem for T2 (nothing will be added).
* After T1 and T2 completed, a new transaction will see an
inconsistency: A and B contain rows that are supposed to result in a
row in the matview, but it doesn’t contain any.
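The race above can be reproduced in a few lines (a toy model, not PostgreSQL code: tables are lists of join-key values, and each transaction's snapshot is a copy taken at transaction start).

```python
# Toy model of the snapshot-isolation race: each transaction computes
# its matview delta against its own snapshot, so neither sees the
# joining row the other inserted.
def delta_for_insert_into_A(new_a_row, snapshot_B):
    # rows the count algorithm would add for SELECT * FROM A JOIN B ON A.a = B.b
    return [(new_a_row, b) for b in snapshot_B if new_a_row == b]

def delta_for_insert_into_B(new_b_row, snapshot_A):
    return [(a, new_b_row) for a in snapshot_A if a == new_b_row]

A, B, matview = [], [], []

# T1 and T2 start concurrently: both snapshot the (empty) tables.
snap_A, snap_B = list(A), list(B)

A.append(42)                                    # T1: INSERT INTO A VALUES (42)
matview += delta_for_insert_into_A(42, snap_B)  # T1 sees no joining row in B

B.append(42)                                    # T2: INSERT INTO B VALUES (42)
matview += delta_for_insert_into_B(42, snap_A)  # T2 sees no joining row in A

true_join = [(a, b) for a in A for b in B if a == b]
print(true_join)  # [(42, 42)] -- the row the matview should contain
print(matview)    # []         -- but incremental maintenance missed it
```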
Each of the following would fix this problem:
(1) Real-serializability must be required for all transactions that
update any base table of a matview that is maintained using the
count algorithm.
(2) The matview’s defining query and the corresponding base tables
must adhere to certain (very commonly satisfied) conditions (for
example, if there is an FK between A.a and B.b, the example cannot be
made to fail; The FK must not be deferred in the case of
right-after-each-statement matview maintenance).
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
Nicolas
--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
During calculation of the deltas to apply to the matviews, it must
be possible to query the referenced tables from the perspective of
both the "before" and "after" versions of the data.
[..]
I don't think the process applying the deltas needs to do
anything unusual or special about acquiring snapshots as it
processes the delta from each transaction, unless we see some way
to optimize things once we're hip-deep in the details.
Note that the basic count algorithm assumes real-serializable
transactions for correctness. Example:
* The example matview’s defining query is a simple join: SELECT * FROM
A JOIN B ON A.a = B.b.
* For simplicity, assume that the matview is initially empty.
* Assume a non real-serializable isolation mode, e.g., REPEATABLE READ.
* Assume that there are two transactions T1 and T2 running in
parallel, that don’t see each other’s changes.
* Both T1 and T2 insert a row in A and B, respectively, in such a way
that the combination of the rows is matched by the join condition
(i.e., the row inserted by T1 has A.a equal to the B.b of the row
inserted by T2).
* The count algorithm will not add anything to the matview as part of
the execution of T1 (because the new row in A doesn’t join with any
rows in B that T1 observes).
* Idem for T2 (nothing will be added).
* After T1 and T2 completed, a new transaction will see an
inconsistency: A and B contain rows that are supposed to result in a
row in the matview, but it doesn’t contain any.
Good point.
It might be hard to detect when this type of race condition exists,
since it could be hidden inside a function. Any thoughts on that?
Each of the following would fix this problem:
(1) Real-serializability must be required for all transactions that
update any base table of a matview that is maintained using the
count algorithm.
Agreed. I wonder whether we should have an option to constrain
DML on a table to serializable transactions, and an option for a
matview to flag that its incremental update depends on that. This
strategy would automatically detect problems hidden in functions,
whether or not they were explicitly recognized issues with the
matview definition.
It is clear, however, that this cannot be the only supported
approach.
(2) The matview’s defining query and the corresponding base tables
must adhere to certain (very commonly satisfied) conditions (for
example, if there is an FK between A.a and B.b, the example cannot be
made to fail; The FK must not be deferred in the case of
right-after-each-statement matview maintenance).
Agreed. That would cover a high percentage of cases regardless of
transaction isolation level. Knowing when you needed such a
constraint to make incremental maintenance safe would be hard,
though.
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
I will look at this option some more, but on the face of it, I'm
not quite following how it would work; and it sounds invasive and
fragile. If you know of any paper on this approach you could point
me to, or if you could sketch it out in a little more detail, I
would appreciate that. While (1) and (2) are straightforward
approaches, either of which would clearly address the race
conditions, I fear that there are those who would not be happy with
a requirement that one or the other of those are required in order
to use incremental maintenance -- so it would be nice to have a
fall-back strategy.
There will, at least for the next several releases, be matviews
defined with queries for which we cannot support incremental
maintenance -- REFRESH will continue to be the only way to deal
with these until we enhance incremental maintenance to support them.
For example, in 9.4 I don't intend to support incremental
maintenance of matviews defined with recursive queries. To me it
makes sense to prohibit incremental maintenance of a matview for
which there is no strategy to deal with concurrency problems.
Thoughts?
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
I will look at this option some more, but on the face of it, I'm
not quite following how it would work; and it sounds invasive and
fragile.
I don't believe it would be that problematic if deltas:
1 - contain all changes, matching join conditions or not, that could
potentially alter the matview. This includes the example's alterations
since the columns being altered were part of the matview's definition,
but it would not cover updates to columns that were not part of the
definition (ie: not involved in the join or the selected rows).
2 - each update is marked with its xid
Then, "serialization" can be achieved by only applying committed xids,
discarding rolled back xids, and evaluating join satisfaction only
during the updates, and not during delta logging.
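A toy model of this deferred-application scheme (all names are invented for illustration; xids are plain integers, and the apply step is assumed to run serialized):

```python
# Toy model: base-table changes are logged with their xid, and join
# satisfaction is evaluated only when committed xids are applied,
# not at delta-logging time.
delta_log = []   # entries: (xid, table, row)

def log_change(xid, table, row):
    """Delta logging: record the change, no join evaluation yet."""
    delta_log.append((xid, table, row))

def apply_committed(committed_xids, A, B, matview):
    """Serialized apply step: replay only committed changes, joining
    against the tables as they stand at apply time."""
    for xid, table, row in delta_log:
        if xid not in committed_xids:
            continue   # rolled-back (or in-flight) xids are discarded
        if table == 'A':
            A.append(row)
            matview += [(row, b) for b in B if row == b]
        else:
            B.append(row)
            matview += [(a, row) for a in A if a == row]

A, B, matview = [], [], []
log_change(xid=100, table='A', row=42)   # T1's insert
log_change(xid=101, table='B', row=42)   # T2's concurrent insert
log_change(xid=102, table='A', row=7)    # T3's insert, later rolled back
apply_committed({100, 101}, A, B, matview)
print(matview)   # [(42, 42)] -- the row missed in the earlier race appears
```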
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
Note that the basic count algorithm assumes real-serializable
transactions for correctness. Example:
[..]
Good point.
It might be hard to detect when this type of race condition exists,
since it could be hidden inside a function. Any thoughts on that?
I think that matviews with defining queries that use functions for
which the system can not prove whether they are safe for the given
incremental maintenance technique (e.g., count algorithm) and
situation (e.g., mandatory isolation level), should not be allowed to
use incremental maintenance of that type.
Although I have only limited practical matview experience (with SQL
Server (2005?) only, there it is called “indexed views”): I assume
that in most other DBMSs, the queries that can be used for defining
incrementally maintained matviews are very limited (they are in SQL
Server).
Each of the following would fix this problem:
(2) The matview’s defining query and the corresponding base tables
must adhere to certain (very commonly satisfied) conditions (for
example, if there is an FK between A.a and B.b, the example cannot be
made to fail; The FK must not be deferred in the case of
right-after-each-statement matview maintenance).
Agreed. That would cover a high percentage of cases regardless of
transaction isolation level. Knowing when you needed such a
constraint to make incremental maintenance safe would be hard,
though.
Reasoning about it becomes easier when you have a clean definition of
which queries you accept, and which ones you don’t (see below). Then
you can look at your definition and say “I allow X because it is
always correct; I don’t allow Y because I didn’t prove it to be always
correct (yet).”
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
I will look at this option some more, but on the face of it, I'm
not quite following how it would work; and it sounds invasive and
fragile. If you know of any paper on this approach you could point
me to, or if you could sketch it out in a little more detail, I
would appreciate that.
Sorry, these are my own musings: I have never found any paper about it.
After a bit more thought, I think it wouldn’t be so easy for the count
algorithm: It would be necessary to express (using physical MVCC rows)
something like “you can only see this (logical) row if both
transaction T1 and T2 are visible to you.” Disjunction would be easy:
create two physical rows with the same data, one that is visible if T1
is visible, and one that is visible if T2 is visible. Unfortunately,
we need conjunction here :-(.
(Upgrading the expressiveness of the MVCC-related information in the
rows would help, but as that is more invasive than anyone wants to
think about, I restrain those thoughts to the confines my own head
;-).)
OTOH, I think that this technique could be used in the case of simple
“aggregation” matviews (without joins, or *maybe* when all joins are
constrained by FKs). (I assume an implementation such as the one that
I sketched in a previous post[1] a few months back.)
[1]: <URL:/messages/by-id/CAP-rdTY+VZ8T_8JBvsksDRdz-_wr5ef80M-_UKgu5bHw+rkjpg@mail.gmail.com>
There will, at least for the next several releases, be matviews
defined with queries for which we cannot support incremental
maintenance -- REFRESH will continue to be the only way to deal
with these until we enhance incremental maintenance to support them.
For example, in 9.4 I don't intend to support incremental
maintenance of matviews defined with recursive queries. To me it
makes sense to prohibit incremental maintenance of a matview for
which there is no strategy to deal with concurrency problems.
+1
I think you need a strict definition, specific to your incremental
maintenance algorithm, of which queries you accept and which ones you
don’t. Such a definition could be based on SQL syntax, or on any of
the intermediate forms that are produced by the
parser/rewriter/analyzer/planner/etc. I think that basing your
definition on a later processing stage (e.g., after subqueries are
flattened, after views are expanded, or after “redundant” predicates
are removed) would increase the amount of queries that are allowed,
but might make it somewhat more difficult for a user to predict
whether a query allows incremental maintenance or not. (It might also
make your definition depend on “internal stuff” that might change from
one release to the next.)
Nicolas
--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
2013/5/17 Kevin Grittner <kgrittn@ymail.com>:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
Note that the basic count algorithm assumes real-serializable
transactions for correctness. Example:[..]
Good point.
It might be hard to detect when this type of race condition exists,
since it could be hidden inside a function. Any thoughts on that?
I think that matviews with defining queries that use functions for
which the system can not prove whether they are safe for the given
incremental maintenance technique (e.g., count algorithm) and
situation (e.g., mandatory isolation level), should not be allowed to
use incremental maintenance of that type.
It's clear enough that any IMMUTABLE function should be safe, but
for anything less I don't see how we could trust it short of having
a "white list" of which functions are known to be safe, or creating
a new property for functions. Not something I want to require
up front; maybe later....
Although I have only limited practical matview experience (with SQL
Server (2005?) only, there it is called “indexed views”): I assume
that in most other DBMSs, the queries that can be used for defining
incrementally maintained matviews are very limited (they are in SQL
Server).
In researching this feature I have found a few comments by Oracle
users that when incremental maintenance was first added (in 1999)
the queries which it could support were severely limited, but that
they have expanded coverage with each major release. People don't
seem to have too many complaints about what is covered now;
although how much of that is adapting to the limits and how much is
due to expansion of the limits is not clear. You would think that
somewhere they would define what the limits for a given version
are, but I have not stumbled across anything like that.
Each of the following would fix this problem:
(2) The matview’s defining query and the corresponding base tables
must adhere to certain (very commonly satisfied) conditions (for
example, if there is an FK between A.a and B.b, the example cannot be
made to fail; The FK must not be deferred in the case of
right-after-each-statement matview maintenance).
Agreed. That would cover a high percentage of cases regardless of
transaction isolation level. Knowing when you needed such a
constraint to make incremental maintenance safe would be hard,
though.
Reasoning about it becomes easier when you have a clean definition of
which queries you accept, and which ones you don’t (see below). Then
you can look at your definition and say “I allow X because it is
always correct; I don’t allow Y because I didn’t prove it to be always
correct (yet).”
Agreed. I just haven't figured out precisely how concurrency
issues adjust those boundaries yet.
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
I will look at this option some more, but on the face of it, I'm
not quite following how it would work; and it sounds invasive and
fragile. If you know of any paper on this approach you could point
me to, or if you could sketch it out in a little more detail, I
would appreciate that.
Sorry, these are my own musings: I have never found any paper
about it.
[musings]
Will review and ponder these and what Claudio offered, and see
where that takes me.
Thanks to both of you for the input!
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin,
The count_t column saves us from having to scan foo for all the old
val values. It does not require any scan of the entire bar
matview. It allows us to zero in on exactly the right rows, and
lets us know what needs doing.
This sounds like a fairly good approach. It would require a couple of
things though:
1) admins would need to be able to enable and disable incremental
updating of matviews, so that if the creation of delta tables is bogging
down writes, they can disable them temporarily as a performance workaround.
2) if an admin enables incremental updating of an existing matview, it
would need to be REFRESHed immediately before we could start
incrementally updating it. So an ENABLE should call a REFRESH.
Hopefully this makes it fairly clear that the only thing an
optimization around the "append-only" assertion for a matview would
buy is the ability to skip the probe for an existing record *related
to rows which are in the delta*. As long as there is reasonable
indexing on the matview, maintenance for the append-only case would
not involve scanning the entire matview.
Yeah, given what you've explained, my first inclination would be just to
let the counting do its thing and see how slow/expensive it is before we
try further optimizations.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:
This sounds like a fairly good approach. It would require a
couple of things though:
1) admins would need to be able to enable and disable incremental
updating of matviews, so that if the creation of delta tables is
bogging down writes, they can disable them temporarily as a
performance workaround.
Yes. This is the sort of thing I plan to put on ALTER MATERIALIZED
VIEW and perhaps an ALTER TABLE option which allows the table to
generate deltas. Turning off the table option should probably have
a CASCADE option with the sort of user feedback DROP options have.
Some other products have AMV options that do things like allow an
attempt to scan a matview automatically cause a REFRESH if
referenced tables have changed since the last REFRESH. That seems
like something to defer until we have a more sophisticated notion
of freshness (or staleness -- if we prefer to view the glass as
half-empty).
2) if an admin enables incremental updating of an existing
matview, it would need to be REFRESHed immediately before we
could start incrementally updating it. So an ENABLE should call
a REFRESH.
Right. We would need to coordinate the start of the incremental
maintenance with the snapshot used for the REFRESH. As far as I
can tell, it could be either type of REFRESH (truncate and create a
new heap or create the new data contents in a temporary relation
and use a differential DELETE and INSERT technique), as long as the
snapshot used for determining the new contents is used to determine
which transactions can provide deltas.
Yeah, given what you've explained, my first inclination would be
just to let the counting do its thing and see how slow/expensive
it is before we try further optimizations.
Agreed. The counting algorithm itself has some optional
optimizations that I'm not sure we'll get to in 9.4. Anything this
specialized should be evaluated only after we have all the more
"generic" optimizations in place.
Thanks for the feedback!
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2013/5/17 Claudio Freire <klaussfreire@gmail.com>:
On Fri, May 17, 2013 at 4:25 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
(3) The count algorithm must be implemented in a way that understands
MVCC internals: Reading the base tables must be done using a technique
that reads all rows (i.e., also the ones not visible to the current
transaction), and the xmin/xmax/etc of the updated rows in the matview
must be set in a smart way, so as to yield visibility results that
correspond to the visibility of the “originating” rows in the base
tables. Calculation of the matview deltas (especially the part where
the base tables are inspected for joining rows) must in this case
probably be done in a serialized manner.
I will look at this option some more, but on the face of it, I'm
not quite following how it would work; and it sounds invasive and
fragile.
I don't believe it would be that problematic if deltas:
1 - contain all changes, matching join conditions or not, that could
potentially alter the matview. This includes the example's alterations
since the columns being altered were part of the matview's definition,
but it would not cover updates to columns that were not part of the
definition (ie: not involved in the join or the selected rows).
2 - each update is marked with its xid
Then, "serialization" can be achieved by only applying committed xids,
discarding rolled back xids, and evaluating join satisfaction only
during the updates, and not during delta logging.
I think that your idea of postponing some of the processing to later
commits might indeed make it possible to implement this without
needing the “MVCC-expressiveness enhancement” that I mentioned in my
previous post (by postponing the application to the matview until at
least *someone* must be able to see that particular change). Right now
I have some difficulties working out the MVCC-details of what to do
when joining a base table delta row with the contents of the other
base tables (to create the matview delta), but I’m probably just tired
:-).
In any case, your proposal seems to require the following additional
“infrastructural changes” relative to Kevin’s proposal:
(1) The base table deltas must be shared by all “in flight”
transactions that are in the process of putting stuff in them and
using them to apply changes to the matviews.
(2) Putting stuff in base table deltas must be transactionally safe
(unless all matviews that care about that specific base table delta
are unlogged).
(3) “Right-after-each-statement matview maintenance” is not possible
(without additional effort).
Nicolas
--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?