Incremental View Maintenance, take 2
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].
[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* Overview
Incremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.
** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.
You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.
The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clause
Views can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSET
Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.
---------------------------------------------------------------------------------------
* Design
An IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.
When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.
For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".
** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.
For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:
"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"
where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.
Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.
In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.
** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).
When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().
When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().
** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().
** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().
When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.
The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().
If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().
---------------------------------------------------------------------------------------
* Details of the patch-set (v28)
The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.
- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;
CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.
- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)
If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.
- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.
- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.
When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.
In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.
In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.
In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.
- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.
When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.
Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.
- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.
When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.
When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.
- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.
If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().
TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.
- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)
- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.
Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.
- Fix bugs reported by huyajun [1]/messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
[1]: /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion
** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2]/messages/by-id/20191128140333.GA25947@alvherre.pgsql[3]/messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com, which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]/messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp
because it would need substantial works and make the patch more complex and
bigger.
There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.
[2]: /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3]: /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4]: /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp
** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5]/messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com, but this thread is no longer active, so we decided to check
the hidden column by its name [6]/messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp.
[5]: /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6]: /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp
** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]/messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8]/messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru.
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.
[7]: /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8]: /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru
** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9]/messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru.
[9]: /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.
** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?
Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1]/messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp.
The patch-set consists of the following eleven patches.
- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
- 0002: Add relisivm column to pg_class system catalog
- 0003: Allow to prolong life span of transition tables until transaction end
- 0004: Add Incremental View Maintenance support to pg_dum
- 0005: Add Incremental View Maintenance support to psql
- 0006: Add Incremental View Maintenance support
- 0007: Add DISTINCT support for IVM
- 0008: Add aggregates support in IVM
- 0009: Add support for min/max aggregates for IVM
- 0010: regression tests
- 0011: documentation
[1]: /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v28-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchtext/x-diff; name=v28-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchDownload+23-12
v28-0002-Add-relisivm-column-to-pg_class-system-catalog.patchtext/x-diff; name=v28-0002-Add-relisivm-column-to-pg_class-system-catalog.patchDownload+34-1
v28-0003-Allow-to-prolong-life-span-of-transition-tables-.patchtext/x-diff; name=v28-0003-Allow-to-prolong-life-span-of-transition-tables-.patchDownload+81-5
v28-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchtext/x-diff; name=v28-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchDownload+31-4
v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patchtext/x-diff; name=v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patchDownload+40-7
v28-0006-Add-Incremental-View-Maintenance-support.patchtext/x-diff; name=v28-0006-Add-Incremental-View-Maintenance-support.patchDownload+2139-36
v28-0007-Add-DISTINCT-support-for-IVM.patchtext/x-diff; name=v28-0007-Add-DISTINCT-support-for-IVM.patchDownload+317-46
v28-0008-Add-aggregates-support-in-IVM.patchtext/x-diff; name=v28-0008-Add-aggregates-support-in-IVM.patchDownload+661-38
v28-0009-Add-support-for-min-max-aggregates-for-IVM.patchtext/x-diff; name=v28-0009-Add-support-for-min-max-aggregates-for-IVM.patchDownload+680-10
v28-0010-Add-regression-tests-for-Incremental-View-Mainte.patchtext/x-diff; name=v28-0010-Add-regression-tests-for-Incremental-View-Mainte.patchDownload+1567-2
v28-0011-Add-documentations-about-Incremental-View-Mainte.patchtext/x-diff; name=v28-0011-Add-documentations-about-Incremental-View-Mainte.patchDownload+587-5
On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>
The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.
Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;
select 1 as constant, unique1 from tenk1;
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).
On Wed, 28 Jun 2023 00:01:02 +0800
jian he <jian.universality@gmail.com> wrote:
On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;
Currently, this is not allowed because tableoid is a system column.
As you say, tableoid is a constant, so we can allow. Should we do this?
select 1 as constant, unique1 from tenk1;
This is allowed, of course.
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).
Thank you! I'm looking forward to your feedback.
(I didn't attach a whole patch separately because I wouldn't like
cfbot to be unhappy...)
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Wed, 28 Jun 2023 00:01:02 +0800
jian he <jian.universality@gmail.com> wrote:On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;Currently, this is not allowed because tableoid is a system column.
As you say, tableoid is a constant, so we can allow. Should we do this?select 1 as constant, unique1 from tenk1;
This is allowed, of course.
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).Thank you! I'm looking forward to your feedback.
(I didn't attach a whole patch separately because I wouldn't like
cfbot to be unhappy...)Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>
I played around first half of regress patch.
these all following queries fails.
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a;
so the hidden column reserved pattern "__ivm_count.*"? that would be a lot....
select * from pg_matviews where matviewname = 'mv_ivm_1';
don't have relisivm option. it's reasonable to make it in view pg_matviews?
On Thu, Jun 29, 2023 at 12:40 AM jian he <jian.universality@gmail.com> wrote:
On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Wed, 28 Jun 2023 00:01:02 +0800
jian he <jian.universality@gmail.com> wrote:On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;Currently, this is not allowed because tableoid is a system column.
As you say, tableoid is a constant, so we can allow. Should we do this?select 1 as constant, unique1 from tenk1;
This is allowed, of course.
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).Thank you! I'm looking forward to your feedback.
(I didn't attach a whole patch separately because I wouldn't like
cfbot to be unhappy...)Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>I played around first half of regress patch.
these all following queries fails.CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a;so the hidden column reserved pattern "__ivm_count.*"? that would be a lot....
select * from pg_matviews where matviewname = 'mv_ivm_1';
don't have relisivm option. it's reasonable to make it in view pg_matviews?
another trivial:
incremental_matview.out (last few lines) last transaction seems to
need COMMIT command.
I cannot build the doc.
git clean -fdx
git am ~/Desktop/tmp/*.patch
Applying: Add a syntax to create Incrementally Maintainable Materialized Views
Applying: Add relisivm column to pg_class system catalog
Applying: Allow to prolong life span of transition tables until transaction end
Applying: Add Incremental View Maintenance support to pg_dump
Applying: Add Incremental View Maintenance support to psql
Applying: Add Incremental View Maintenance support
Applying: Add DISTINCT support for IVM
Applying: Add aggregates support in IVM
Applying: Add support for min/max aggregates for IVM
Applying: Add regression tests for Incremental View Maintenance
Applying: Add documentations about Incremental View Maintenance
.git/rebase-apply/patch:79: trailing whitespace.
clause.
warning: 1 line adds whitespace errors.
Because of this, the {ninja docs} command failed. ERROR message:
[6/6] Generating doc/src/sgml/html with a custom command
FAILED: doc/src/sgml/html
/usr/bin/python3
../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper
--targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool
/usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version
16beta2 --path doc/src/sgml --path
../../Desktop/pg_sources/main/postgres/doc/src/sgml
../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl
doc/src/sgml/postgres-full.xml
ERROR: id attribute missing on <sect2> element under /book[@id =
'postgres']/part[@id = 'server-programming']/chapter[@id =
'rules']/sect1[@id = 'rules-ivm']
error: file doc/src/sgml/postgres-full.xml
xsltRunStylesheet : run failed
ninja: build stopped: subcommand failed.
On Thu, Jun 29, 2023 at 6:51 PM jian he <jian.universality@gmail.com> wrote:
I cannot build the doc.
git clean -fdx
git am ~/Desktop/tmp/*.patchApplying: Add a syntax to create Incrementally Maintainable Materialized Views
Applying: Add relisivm column to pg_class system catalog
Applying: Allow to prolong life span of transition tables until transaction end
Applying: Add Incremental View Maintenance support to pg_dump
Applying: Add Incremental View Maintenance support to psql
Applying: Add Incremental View Maintenance support
Applying: Add DISTINCT support for IVM
Applying: Add aggregates support in IVM
Applying: Add support for min/max aggregates for IVM
Applying: Add regression tests for Incremental View Maintenance
Applying: Add documentations about Incremental View Maintenance
.git/rebase-apply/patch:79: trailing whitespace.
clause.
warning: 1 line adds whitespace errors.Because of this, the {ninja docs} command failed. ERROR message:
[6/6] Generating doc/src/sgml/html with a custom command
FAILED: doc/src/sgml/html
/usr/bin/python3
../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper
--targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool
/usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version
16beta2 --path doc/src/sgml --path
../../Desktop/pg_sources/main/postgres/doc/src/sgml
../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl
doc/src/sgml/postgres-full.xml
ERROR: id attribute missing on <sect2> element under /book[@id =
'postgres']/part[@id = 'server-programming']/chapter[@id =
'rules']/sect1[@id = 'rules-ivm']
error: file doc/src/sgml/postgres-full.xml
xsltRunStylesheet : run failed
ninja: build stopped: subcommand failed.
so far what I tried:
git am --ignore-whitespace --whitespace=nowarn ~/Desktop/tmp/*.patch
git am --whitespace=fix ~/Desktop/tmp/*.patch
git am --whitespace=error ~/Desktop/tmp/*.patch
I still cannot generate html docs.
Hi there.
in v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
I don't know how to set psql to get the output
"Incremental view maintenance: yes"
This is probably not trivial.
In function apply_new_delta_with_count.
appendStringInfo(&querybuf,
"WITH updt AS (" /* update a tuple if this exists in the view */
"UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s "
"%s " /* SET clauses for aggregates */
"FROM %s AS diff "
"WHERE %s " /* tuple matching condition */
"RETURNING %s" /* returning keys of updated tuples */
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);",
---------------------
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
the INSERT INTO line, should have one white space in the end?
also "existw" should be "exists"
ok. Now I really found a small bug.
this works as intended:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as
min_j FROM mv_base_a group by 1;
INSERT INTO mv_base_a select 1,-2 where false;
rollback;
however the following one:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as
min_j FROM mv_base_a;
INSERT INTO mv_base_a select 1, -2 where false;
rollback;
will evaluate
tuplestore_tuple_count(new_tuplestores) to 1, it will walk through
IVM_immediate_maintenance function to apply_delta.
but should it be zero?
On Thu, 29 Jun 2023 00:40:45 +0800
jian he <jian.universality@gmail.com> wrote:
On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Wed, 28 Jun 2023 00:01:02 +0800
jian he <jian.universality@gmail.com> wrote:On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;Currently, this is not allowed because tableoid is a system column.
As you say, tableoid is a constant, so we can allow. Should we do this?select 1 as constant, unique1 from tenk1;
This is allowed, of course.
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).Thank you! I'm looking forward to your feedback.
(I didn't attach a whole patch separately because I wouldn't like
cfbot to be unhappy...)Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>I played around first half of regress patch.
I'm so sorry for the late reply.
these all following queries fails.
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a;so the hidden column reserved pattern "__ivm_count.*"? that would be a lot....
Column names which start with "__ivm_" are prohibited because hidden columns
using this pattern are used for handling views with aggregate or DISTINCT.
Even when neither aggregate or DISINCT is used, such column name is used
for handling tuple duplicates in views. So, if we choose not to allow
tuple duplicates in the initial version of IVM, we would remove this
restriction for now....
select * from pg_matviews where matviewname = 'mv_ivm_1';
don't have relisivm option. it's reasonable to make it in view pg_matviews?
Make sense. I'll do it.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 29 Jun 2023 18:20:32 +0800
jian he <jian.universality@gmail.com> wrote:
On Thu, Jun 29, 2023 at 12:40 AM jian he <jian.universality@gmail.com> wrote:
On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Wed, 28 Jun 2023 00:01:02 +0800
jian he <jian.universality@gmail.com> wrote:On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Thu, 1 Jun 2023 23:59:09 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:Hello hackers,
Here's a rebased version of the patch-set adding Incremental View
Maintenance support for PostgreSQL. That was discussed in [1].[1] /messages/by-id/20181227215726.4d166b4874f8983a641123f5@sraoss.co.jp
---------------------------------------------------------------------------------------
* OverviewIncremental View Maintenance (IVM) is a way to make materialized views
up-to-date by computing only incremental changes and applying them on
views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
only small parts of the view are changed.** Feature
The attached patchset provides a feature that allows materialized views
to be updated automatically and incrementally just after a underlying
table is modified.You can create an incementally maintainable materialized view (IMMV)
by using CREATE INCREMENTAL MATERIALIZED VIEW command.The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
- some built-in aggregate functions (count, sum, avg, min, max)
- GROUP BY clause
- DISTINCT clauseViews can contain multiple tuples with the same content (duplicate tuples).
** Restriction
The following are not supported in a view definition:
- Outer joins
- Aggregates otehr than above, window functions, HAVING
- Sub-queries, CTEs
- Set operations (UNION, INTERSECT, EXCEPT)
- DISTINCT ON, ORDER BY, LIMIT, OFFSETAlso, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.---------------------------------------------------------------------------------------
* DesignAn IMMV is maintained using statement-level AFTER triggers.
When an IMMV is created, triggers are automatically created on all base
tables contained in the view definition query.When a table is modified, changes that occurred in the table are extracted
as transition tables in the AFTER triggers. Then, changes that will occur in
the view are calculated by a rewritten view dequery in which the modified table
is replaced with the transition table.For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
into R are stored in a transiton table dR, the tuples that will be inserted into
the view are calculated as the result of "SELECT * FROM dR, S".** Multiple Tables Modification
Multiple tables can be modified in a statement when using triggers, foreign key
constraint, or modifying CTEs. When multiple tables are modified, we need
the state of tables before the modification.For example, when some tuples, dR and dS, are inserted into R and S respectively,
the tuples that will be inserted into the view are calculated by the following
two queries:"SELECT * FROM dR, S_pre"
"SELECT * FROM R, dS"where S_pre is the table before the modification, R is the current state of
table, that is, after the modification. This pre-update states of table
is calculated by filtering inserted tuples and appending deleted tuples.
The subquery that represents pre-update state is generated in get_prestate_rte().
Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate()
in WHERE clause. This function checks the visibility of tuples by using
the snapshot taken before table modification. The deleted tuples are contained
in the old transition table, and this table is appended using UNION ALL.Transition tables for each modification are collected in each AFTER trigger
function call. Then, the view maintenance is performed in the last call of
the trigger.In the original PostgreSQL, tuplestores of transition tables are freed at the
end of each nested query. However, their lifespan needs to be prolonged to
the end of the out-most query in order to maintain the view in the last AFTER
trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.** Duplicate Tulpes
When calculating changes that will occur in the view (= delta tables),
multiplicity of tuples are calculated by using count(*).When deleting tuples from the view, tuples to be deleted are identified by
joining the delta table with the view, and tuples are deleted as many as
specified multiplicity by numbered using row_number() function.
This is implemented in apply_old_delta().When inserting tuples into the view, each tuple is duplicated to the
specified multiplicity using generate_series() function. This is implemented
in apply_new_delta().** DISTINCT clause
When DISTINCT is used, the view has a hidden column __ivm_count__ that
stores multiplicity for tuples. When tuples are deleted from or inserted into
the view, the values of __ivm_count__ column is decreased or increased as many
as specified multiplicity. Eventually, when the values becomes zero, the
corresponding tuple is deleted from the view. This is implemented in
apply_old_delta_with_count() and apply_new_delta_with_count().** Aggregates
Built-in count sum, avg, min, and max are supported. Whether a given
aggregate function can be used or not is checked by using its OID in
check_aggregate_supports_ivm().When creating a materialized view containing aggregates, in addition
to __ivm_count__, more than one hidden columns for each aggregate are
added to the target list. For example, columns for storing sum(x),
count(x) are added if we have avg(x). When the view is maintained,
aggregated values are updated using these hidden columns, also hidden
columns are updated at the same time.The maintenance of aggregated view is performed in
apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
clauses for updating columns are generated by append_set_clause_*().If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation. This
is performed in recalc_and_set_values().---------------------------------------------------------------------------------------
* Details of the patch-set (v28)The patch-set consists of the following eleven patches.
In the previous version, the number of patches were nine.
In the latest patch-set, the patches are divided more finely
aiming to make the review easier.- 0001: Add a syntax to create Incrementally Maintainable Materialized Views
The prposed syntax to create an incrementally maintainable materialized
view (IMMV) is;CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
However, this syntax is tentative, so any suggestions are welcomed.
- 0002: Add relisivm column to pg_class system catalog
We add a new field in pg_class to indicate a relation is IMMV.
Another alternative is to add a new catalog for managing materialized
views including IMMV, but I am not sure if we want this.- 0003: Allow to prolong life span of transition tables until transaction end
This patch fixes the trigger system to allow to prolong lifespan of
tuple stores for transition tables until the transaction end. We need
this because multiple transition tables have to be preserved until the
end of the out-most query when multiple tables are modified by nested
triggers. (as explained above in Design - Multiple Tables Modification)If we don't want to change the trigger system in such way, the alternative
is to copy the contents of transition tables to other tuplestores, although
it needs more time and memory.- 0004: Add Incremental View Maintenance support to pg_dump
This patch enables pg_dump to output IMMV using the new syntax.
- 0005: Add Incremental View Maintenance support to psql
This patch implements tab-completion for the new syntax and adds
information of IMMV to \d meta-command results.- 0006: Add Incremental View Maintenance support
This patch implements the basic IVM feature.
DISTINCT and aggregate are not supported here.When an IMMV is created, the view query is checked, and if any
non-supported feature is used, it raises an error. If it is ok,
triggers are created on base tables and an unique index is
created on the view if possible.In BEFORE trigger, an entry is created for each IMMV and the number
of trigger firing is counted. Also, the snapshot just before the
table modification is stored.In AFTER triggers, each transition tables are preserved. The number
of trigger firing is counted also here, and when the firing number of
BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
trigger call.In the final AFTER trigger, the IMMV is maintained. Rewritten view
query is executed to generate delta tables, and deltas are applied
to the view. If multiple tables are modified simultaneously, this
process is iterated for each modified table. Tables before processed
are represented in "pre-update-state", processed tables are
"post-update-state" in the rewritten query.- 0007: Add DISTINCT support for IVM
This patch adds DISTINCT clause support.
When an IMMV including DISTINCT is created, a hidden column
"__ivm_count__" is added to the target list. This column has the
number of duplicity of the same tuples. The duplicity is calculated
by adding "count(*)" and GROUP BY to the view query.When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
and a tuples whose duplicity becomes zero can be deleted from the view.
This logic is implemented by SQL in apply_old_delta_with_count and
apply_new_delta_with_count.Columns starting with "__ivm_" are deemed hidden columns that doesn't
appear when a view is accessed by "SELECT * FROM ....". This is
implemented by fixing parse_relation.c.- 0008: Add aggregates support in IVM
This patch provides codes for aggregates support, specifically
for builtin count, sum, and avg.When an IMMV containing an aggregate is created, it is checked if this
aggregate function is supported, and if it is ok, some hidden columns
are added to the target list.When the IMMV is maintained, the aggregated value is updated as well as
related hidden columns. The way of update depends the type of aggregate
functions, and SET clause string is generated for each aggregate.- 0009: Add support for min/max aggregates for IVM
This patch adds min/max aggregates support.
This is separated from #0008 because min/max needs more complicated
work than count, sum, and avg.If the view has min(x) or max(x) and the minimum or maximal value is
deleted from a table, we need to update the value to the new min/max
recalculated from the tables rather than incremental computation.
This is performed in recalc_and_set_values().TIDs and keys of tuples that need re-calculation are returned as a
result of the query that deleted min/max values from the view using
RETURNING clause. The plan to recalculate and set the new min/max value
are stored and reused.- 0010: regression tests
This patch provides regression tests for IVM.
- 0011: documentation
This patch provides documantation for IVM.
---------------------------------------------------------------------------------------
* Changes from the Previous Version (v27)- Allow TRUNCATE on base tables
When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.- Fix bugs reported by huyajun [1]
[1] /messages/by-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
---------------------------------------------------------------------------------------
* Discussion** Aggregate support
There were a few suggestions that general aggregate functions should be
supported [2][3], which may be possible by extending pg_aggregate catalog.
However, we decided to leave supporting general aggregates to the future work [4]
because it would need substantial works and make the patch more complex and
bigger.There has been no opposite opinion on this. However, if we need more discussion
on the design of aggregate support, we can omit aggregate support for the first
release of IVM.[2] /messages/by-id/20191128140333.GA25947@alvherre.pgsql
[3] /messages/by-id/CAM-w4HOvDrL4ou6m=592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg@mail.gmail.com
[4] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Hidden columns
In order to support DISTINCT or aggregates, our implementation uses hidden columns.
Columns starting with "__ivm_" are hidden columns that doesn't appear when a
view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
fixed. There was a proposal to enable hidden columns by adding a new flag to
pg_attribute [5], but this thread is no longer active, so we decided to check
the hidden column by its name [6].[5] /messages/by-id/CAEepm=3ZHh=p0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA@mail.gmail.com
[6] /messages/by-id/20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp** Concurrent Transactions
When the view definition has more than one table, we acquire an exclusive
lock before the view maintenance in order to avoid inconsistent results.
This behavior was explained in [7]. The lock was improved to use weaker lock
when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
However, due to the implementation that uses ctid for identifying target tuples,
we still have to use an exclusive lock for DELETE and UPDATE.[7] /messages/by-id/20200909092752.c91758a1bec3479668e82643@sraoss.co.jp
[8] /messages/by-id/5663f5f0-48af-686c-bf3c-62d279567e2a@postgrespro.ru** Automatic Index Creation
When a view is created, a unique index is automatically created if
possible, that is, if the view definition query has a GROUP BY or
DISTINCT, or if the view contains all primary key attributes of
its base tables in the target list. It is necessary for efficient
view maintenance. This feature is based on a suggestion from
Konstantin Knizhnik [9].[9] /messages/by-id/89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
** Trigger and Transition Tables
We implemented IVM based on triggers. This is because we want to use
transition tables to extract changes on base tables. Also, there are
other constraint that are using triggers in its implementation, like
foreign references. However, if we can use transition table like feature
without relying triggers, we don't have to insist to use triggers and we
might implement IVM in the executor directly as similar as declarative
partitioning.** Feature to be Supported in the First Release
The current patch-set supports DISTINCT and aggregates for built-in count,
sum, avg, min and max. Do we need all these feature for the first IVM release?
Supporting DISTINCT and aggregates needs discussion on hidden columns, and
for supporting min/max we need to discuss on re-calculation method. Before
handling such relatively advanced feature, maybe, should we focus to design
and implement of the basic feature of IVM?Any suggestion and discussion are welcomed!
Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>The followings are supported in view definition queries:
- SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)Also, a view definition query cannot contain other views, materialized views,
foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
system columns, or expressions that contains aggregates.Does this also apply to tableoid? but tableoid is a constant, so it
should be fine?
can following two queries apply to this feature.
select tableoid, unique1 from tenk1;Currently, this is not allowed because tableoid is a system column.
As you say, tableoid is a constant, so we can allow. Should we do this?select 1 as constant, unique1 from tenk1;
This is allowed, of course.
I didn't apply the patch.(will do later, for someone to test, it would
be a better idea to dump a whole file separately....).Thank you! I'm looking forward to your feedback.
(I didn't attach a whole patch separately because I wouldn't like
cfbot to be unhappy...)Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>I played around first half of regress patch.
these all following queries fails.CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a;CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a;so the hidden column reserved pattern "__ivm_count.*"? that would be a lot....
select * from pg_matviews where matviewname = 'mv_ivm_1';
don't have relisivm option. it's reasonable to make it in view pg_matviews?another trivial:
incremental_matview.out (last few lines) last transaction seems to
need COMMIT command.
Thank you for pointing out it.
There is a unnecessary BEGIN, so I'll remove it.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 29 Jun 2023 18:51:06 +0800
jian he <jian.universality@gmail.com> wrote:
I cannot build the doc.
git clean -fdx
git am ~/Desktop/tmp/*.patchApplying: Add a syntax to create Incrementally Maintainable Materialized Views
Applying: Add relisivm column to pg_class system catalog
Applying: Allow to prolong life span of transition tables until transaction end
Applying: Add Incremental View Maintenance support to pg_dump
Applying: Add Incremental View Maintenance support to psql
Applying: Add Incremental View Maintenance support
Applying: Add DISTINCT support for IVM
Applying: Add aggregates support in IVM
Applying: Add support for min/max aggregates for IVM
Applying: Add regression tests for Incremental View Maintenance
Applying: Add documentations about Incremental View Maintenance
.git/rebase-apply/patch:79: trailing whitespace.
clause.
warning: 1 line adds whitespace errors.Because of this, the {ninja docs} command failed. ERROR message:
[6/6] Generating doc/src/sgml/html with a custom command
FAILED: doc/src/sgml/html
/usr/bin/python3
../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper
--targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool
/usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version
16beta2 --path doc/src/sgml --path
../../Desktop/pg_sources/main/postgres/doc/src/sgml
../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl
doc/src/sgml/postgres-full.xml
ERROR: id attribute missing on <sect2> element under /book[@id =
'postgres']/part[@id = 'server-programming']/chapter[@id =
'rules']/sect1[@id = 'rules-ivm']
error: file doc/src/sgml/postgres-full.xml
xsltRunStylesheet : run failed
ninja: build stopped: subcommand failed.
Thank your for pointing out this.
I'll add ids for all sections to suppress the errors.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, 30 Jun 2023 08:00:00 +0800
jian he <jian.universality@gmail.com> wrote:
Hi there.
in v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
I don't know how to set psql to get the output
"Incremental view maintenance: yes"
This information will appear when you use "d+" command for an
incrementally maintained materialized view.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 2 Jul 2023 08:25:12 +0800
jian he <jian.universality@gmail.com> wrote:
This is probably not trivial.
In function apply_new_delta_with_count.appendStringInfo(&querybuf,
"WITH updt AS (" /* update a tuple if this exists in the view */
"UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s "
"%s " /* SET clauses for aggregates */
"FROM %s AS diff "
"WHERE %s " /* tuple matching condition */
"RETURNING %s" /* returning keys of updated tuples */
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);",---------------------
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "the INSERT INTO line, should have one white space in the end?
also "existw" should be "exists"
Yes, we should need a space although it works. I'll fix as well as the typo.
Thank you.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 2 Jul 2023 10:38:20 +0800
jian he <jian.universality@gmail.com> wrote:
ok. Now I really found a small bug.
this works as intended:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as
min_j FROM mv_base_a group by 1;
INSERT INTO mv_base_a select 1,-2 where false;
rollback;however the following one:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as
min_j FROM mv_base_a;
INSERT INTO mv_base_a select 1, -2 where false;
rollback;will evaluate
tuplestore_tuple_count(new_tuplestores) to 1, it will walk through
IVM_immediate_maintenance function to apply_delta.
but should it be zero?
This is not a bug because an aggregate without GROUP BY always
results one row whose value is NULL.
The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j
FROM mv_base_a;", isn't it?
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, 28 Aug 2023 02:49:08 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Sun, 2 Jul 2023 10:38:20 +0800
jian he <jian.universality@gmail.com> wrote:
I attahed the patches v29 updated to comments from jian he.
The changes from the previous includes:
- errors in documentations is fixed.
- remove unnecessary BEGIN from the test
- add isimmv column to pg_matviews system view
- fix a typo
- rebase to the master branch
ok. Now I really found a small bug.
this works as intended:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as
min_j FROM mv_base_a group by 1;
INSERT INTO mv_base_a select 1,-2 where false;
rollback;however the following one:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as
min_j FROM mv_base_a;
INSERT INTO mv_base_a select 1, -2 where false;
rollback;will evaluate
tuplestore_tuple_count(new_tuplestores) to 1, it will walk through
IVM_immediate_maintenance function to apply_delta.
but should it be zero?This is not a bug because an aggregate without GROUP BY always
results one row whose value is NULL.The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j
FROM mv_base_a;", isn't it?Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchtext/x-diff; name=v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchDownload+23-12
v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patchtext/x-diff; name=v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patchDownload+36-1
v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patchtext/x-diff; name=v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patchDownload+81-5
v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchtext/x-diff; name=v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchDownload+31-4
v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patchtext/x-diff; name=v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patchDownload+40-7
v29-0006-Add-Incremental-View-Maintenance-support.patchtext/x-diff; name=v29-0006-Add-Incremental-View-Maintenance-support.patchDownload+2139-36
v29-0007-Add-DISTINCT-support-for-IVM.patchtext/x-diff; name=v29-0007-Add-DISTINCT-support-for-IVM.patchDownload+317-46
v29-0008-Add-aggregates-support-in-IVM.patchtext/x-diff; name=v29-0008-Add-aggregates-support-in-IVM.patchDownload+661-38
v29-0009-Add-support-for-min-max-aggregates-for-IVM.patchtext/x-diff; name=v29-0009-Add-support-for-min-max-aggregates-for-IVM.patchDownload+680-10
v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patchtext/x-diff; name=v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patchDownload+1565-2
v29-0011-Add-documentations-about-Incremental-View-Mainte.patchtext/x-diff; name=v29-0011-Add-documentations-about-Incremental-View-Mainte.patchDownload+583-5
On Mon, 28 Aug 2023 11:52:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
On Mon, 28 Aug 2023 02:49:08 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:On Sun, 2 Jul 2023 10:38:20 +0800
jian he <jian.universality@gmail.com> wrote:I attahed the patches v29 updated to comments from jian he.
The changes from the previous includes:- errors in documentations is fixed.
- remove unnecessary BEGIN from the test
- add isimmv column to pg_matviews system view
- fix a typo
- rebase to the master branch
I found pg_dump test was broken, so attached the fixed version.
Regards,
Yugo Nagata
ok. Now I really found a small bug.
this works as intended:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as
min_j FROM mv_base_a group by 1;
INSERT INTO mv_base_a select 1,-2 where false;
rollback;however the following one:
BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as
min_j FROM mv_base_a;
INSERT INTO mv_base_a select 1, -2 where false;
rollback;will evaluate
tuplestore_tuple_count(new_tuplestores) to 1, it will walk through
IVM_immediate_maintenance function to apply_delta.
but should it be zero?This is not a bug because an aggregate without GROUP BY always
results one row whose value is NULL.The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j
FROM mv_base_a;", isn't it?Regards,
Yugo Nagata--
Yugo NAGATA <nagata@sraoss.co.jp>--
Yugo NAGATA <nagata@sraoss.co.jp>
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchtext/x-diff; name=v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patchDownload+23-12
v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patchtext/x-diff; name=v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patchDownload+36-1
v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patchtext/x-diff; name=v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patchDownload+81-5
v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchtext/x-diff; name=v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patchDownload+35-4
v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patchtext/x-diff; name=v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patchDownload+40-7
v29-0006-Add-Incremental-View-Maintenance-support.patchtext/x-diff; name=v29-0006-Add-Incremental-View-Maintenance-support.patchDownload+2139-36
v29-0007-Add-DISTINCT-support-for-IVM.patchtext/x-diff; name=v29-0007-Add-DISTINCT-support-for-IVM.patchDownload+317-46
v29-0008-Add-aggregates-support-in-IVM.patchtext/x-diff; name=v29-0008-Add-aggregates-support-in-IVM.patchDownload+661-38
v29-0009-Add-support-for-min-max-aggregates-for-IVM.patchtext/x-diff; name=v29-0009-Add-support-for-min-max-aggregates-for-IVM.patchDownload+680-10
v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patchtext/x-diff; name=v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patchDownload+1564-2
v29-0011-Add-documentations-about-Incremental-View-Mainte.patchtext/x-diff; name=v29-0011-Add-documentations-about-Incremental-View-Mainte.patchDownload+583-5
hi
based on v29.
based on https://stackoverflow.com/a/4014981/1560347:
I added a new function append_update_set_caluse, and deleted
functions: {append_set_clause_for_count, append_set_clause_for_sum,
append_set_clause_for_avg, append_set_clause_for_minmax}
I guess this way is more extensible/generic than yours.
replaced the following code with the generic function: append_update_set_caluse.
+ /* For views with aggregates, we need to build SET clause for
updating aggregate
+ * values. */
+ if (query->hasAggs && IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ const char *aggname = get_func_name(aggref->aggfnoid);
+
+ /*
+ * We can use function names here because it is already checked if these
+ * can be used in IMMV by its OID at the definition time.
+ */
+
+ /* count */
+ if (!strcmp(aggname, "count"))
+ append_set_clause_for_count(resname, aggs_set_old, aggs_set_new,
aggs_list_buf);
+
+ /* sum */
+ else if (!strcmp(aggname, "sum"))
+ append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+ /* avg */
+ else if (!strcmp(aggname, "avg"))
+ append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf,
+ format_type_be(aggref->aggtype));
+
+ else
+ elog(ERROR, "unsupported aggregate function: %s", aggname);
+ }
----------------------<<<
attached is my refactor. there is some whitespace errors in the
patches, you need use
git apply --reject --whitespace=fix
basedon_v29_matview_c_refactor_update_set_clause.patch
Also you patch cannot use git apply, i finally found out bulk apply
using gnu patch from
https://serverfault.com/questions/102324/apply-multiple-patch-files.
previously I just did it manually one by one.
I think if you use { for i in $PATCHES/v29*.patch; do patch -p1 < $i;
done } GNU patch, it will generate an .orig file for every modified
file?
-----------------<<<<<
src/backend/commands/matview.c
2268: /* For tuple deletion */
maybe "/* For tuple deletion and update*/" is more accurate?
-----------------<<<<<
currently at here: src/test/regress/sql/incremental_matview.sql
98: -- support SUM(), COUNT() and AVG() aggregate functions
99: BEGIN;
100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i,
SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i;
101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4;
102: INSERT INTO mv_base_a VALUES(2,100);
src/backend/commands/matview.c
2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT)
2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data);
then I debug, print out querybuf.data:
WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ =
mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum =
(CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND
diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum
IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum
OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ =
(mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__),
count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN
mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND
diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN
mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN
diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE
(mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+)
diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./)
(mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__),
__ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__
OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__
OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL
THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN
mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+)
diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__
OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff
WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i
IS NULL)) RETURNING mv.i) INSERT INTO public.mv_ivm_agg (i, sum,
count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__,
__ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__,
__ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS
diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i
OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)));
At this final SPI_exec, we have a update statement with related
columns { __ivm_count_sum__, sum, __ivm_count__, count, avg,
__ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops
working, querybuf.data is way too big, but I still feel like there is
some logic associated with these columns, maybe we can use it as an
assertion to prove that this query (querybuf.len = 1834) is indeed
correct.
Since the apply delta query is quite complex, I feel like adding some
"if debug then print out the final querybuf.data end if" would be a
good idea.
we add hidden columns somewhere, also to avoid corner cases, so maybe
somewhere we should assert total attribute number is sane.