Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Started by Bharath Rupireddyabout 5 years ago15 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

I think we can pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() for
refresh mat view so that parallelism can be considered for the SELECT
part of the previously created mat view. The refresh mat view queries
can be faster in cases where SELECT is parallelized.

Attaching a small patch. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-Allow-parallel-mode-in-REFRESH-MAT-VIEW-planning.patchapplication/octet-stream; name=v1-0001-Allow-parallel-mode-in-REFRESH-MAT-VIEW-planning.patchDownload
From e7931a514b01d5ff768d61e7bcd4326cadd35e0e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 1 Dec 2020 16:57:51 +0530
Subject: [PATCH v1] Allow parallel mode in REFRESH MAT VIEW planning

For REFRESH MATERIALIZED VIEW, pass CURSOR_OPT_PARALLEL_OK in
pg_plan_query() so that parallelism can be considered.
---
 src/backend/commands/matview.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index cfc63915f3..9bd977d2c7 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -402,7 +402,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, queryString, 0, NULL);
+	plan = pg_plan_query(query, queryString, CURSOR_OPT_PARALLEL_OK, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
-- 
2.25.1

#2Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#1)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Tue, Dec 1, 2020 at 5:34 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

I think we can pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() for
refresh mat view so that parallelism can be considered for the SELECT
part of the previously created mat view. The refresh mat view queries
can be faster in cases where SELECT is parallelized.

Attaching a small patch. Thoughts?

Added this to commitfest, in case it is useful -
https://commitfest.postgresql.org/31/2856/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#3Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#2)
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Hi

Added this to commitfest, in case it is useful -
https://commitfest.postgresql.org/31/2856/

I have an issue about the safety of enable parallel select.

I checked the [parallel insert into select] patch.
https://commitfest.postgresql.org/31/2844/
It seems parallel select is not allowed when target table is temporary table.

+	/*
+	 * We can't support table modification in parallel-mode if it's a foreign
+	 * table/partition (no FDW API for supporting parallel access) or a
+	 * temporary table.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+		RelationUsesLocalBuffers(rel))
+	{
+		table_close(rel, lockmode);
+		context->max_hazard = PROPARALLEL_UNSAFE;
+		return context->max_hazard;
+	}

For Refresh MatView.
if CONCURRENTLY is specified, It will builds new data in temp tablespace:
if (concurrent)
{
tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false);
relpersistence = RELPERSISTENCE_TEMP;
}

For the above case, should we still consider parallelism ?

Best regards,
houzj

#4Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#3)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Tue, Dec 22, 2020 at 4:53 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

I have an issue about the safety of enable parallel select.

I checked the [parallel insert into select] patch.
https://commitfest.postgresql.org/31/2844/
It seems parallel select is not allowed when target table is temporary table.

+       /*
+        * We can't support table modification in parallel-mode if it's a foreign
+        * table/partition (no FDW API for supporting parallel access) or a
+        * temporary table.
+        */
+       if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+               RelationUsesLocalBuffers(rel))
+       {
+               table_close(rel, lockmode);
+               context->max_hazard = PROPARALLEL_UNSAFE;
+               return context->max_hazard;
+       }

For Refresh MatView.
if CONCURRENTLY is specified, It will builds new data in temp tablespace:
if (concurrent)
{
tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false);
relpersistence = RELPERSISTENCE_TEMP;
}

For the above case, should we still consider parallelism ?

Thanks for taking a look at the patch.

The intention of the patch is to just enable the parallel mode while
planning the select part of the materialized view, but the insertions
do happen in the leader backend itself. That way even if there's
temporary tablespace gets created, we have no problem.

This patch can be thought as a precursor to parallelizing inserts in
refresh matview. I'm thinking to follow the design of parallel inserts
in ctas [1]/messages/by-id/CALj2ACWbQ95gS0z1viQC3qFVnMGAz7dcLjno9GdZv+u9RAU9eQ@mail.gmail.com i.e. pushing the dest receiver down to the workers once
that gets reviewed and finalized. At that time, we should take care of
not pushing inserts down to workers if temporary tablespace gets
created.

In summary, as far as this patch is considered we don't have any
problem with temporary tablespace getting created with CONCURRENTLY
option.

I'm planning to add a few test cases to cover this patch in
matview.sql and post a v2 patch soon.

[1]: /messages/by-id/CALj2ACWbQ95gS0z1viQC3qFVnMGAz7dcLjno9GdZv+u9RAU9eQ@mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#5Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#4)
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Thanks for taking a look at the patch.

The intention of the patch is to just enable the parallel mode while planning
the select part of the materialized view, but the insertions do happen in
the leader backend itself. That way even if there's temporary tablespace
gets created, we have no problem.

This patch can be thought as a precursor to parallelizing inserts in refresh
matview. I'm thinking to follow the design of parallel inserts in ctas [1]
i.e. pushing the dest receiver down to the workers once that gets reviewed
and finalized. At that time, we should take care of not pushing inserts
down to workers if temporary tablespace gets created.

In summary, as far as this patch is considered we don't have any problem
with temporary tablespace getting created with CONCURRENTLY option.

I'm planning to add a few test cases to cover this patch in matview.sql
and post a v2 patch soon.

Thanks for your explanation!
You are right that temporary tablespace does not affect current patch.

For the testcase:
I noticed that you have post a mail about add explain support for REFRESH MATERIALIZED VIEW.
Do you think we can combine these two features in one thread ?

Personally, The testcase with explain info will be better.

Best regards,
houzj

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#5)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Wed, Dec 23, 2020 at 9:14 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Thanks for taking a look at the patch.

The intention of the patch is to just enable the parallel mode while planning
the select part of the materialized view, but the insertions do happen in
the leader backend itself. That way even if there's temporary tablespace
gets created, we have no problem.

This patch can be thought as a precursor to parallelizing inserts in refresh
matview. I'm thinking to follow the design of parallel inserts in ctas [1]
i.e. pushing the dest receiver down to the workers once that gets reviewed
and finalized. At that time, we should take care of not pushing inserts
down to workers if temporary tablespace gets created.

In summary, as far as this patch is considered we don't have any problem
with temporary tablespace getting created with CONCURRENTLY option.

I'm planning to add a few test cases to cover this patch in matview.sql
and post a v2 patch soon.

Thanks for your explanation!
You are right that temporary tablespace does not affect current patch.

For the testcase:
I noticed that you have post a mail about add explain support for REFRESH MATERIALIZED VIEW.
Do you think we can combine these two features in one thread ?

Yeah, it is at [1]/messages/by-id/CALj2ACU71s91G1EOzo-Xx7Z4mvF0dKq-mYeP5u4nikJWxPNRSA@mail.gmail.com and on some initial analysis ISTM that
explain/explain analyze RMV requires a good amount of code
rearrangement in ExecRefreshMatView(). IMO these two features can be
kept separate because they serve different purposes.

[1]: /messages/by-id/CALj2ACU71s91G1EOzo-Xx7Z4mvF0dKq-mYeP5u4nikJWxPNRSA@mail.gmail.com

Personally, The testcase with explain info will be better.

Yeah without explain analyze we can not show whether the parallelism
is picked in the test cases. What we could do is that we can add a
plain RMV test case in write_parallel.sql after CMV so that at least
we can be ensured that the parallelism will be picked because of the
enforcement there. We can always see the parallelism for the select
part of explain analyze CMV in write_parallel.sql and the same select
query gets planned even in RMV cases.

IMO, the patch in this thread can go with test case addition to
write_parallel.sql. since it is very small.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#7Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Bharath Rupireddy (#6)
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Yeah without explain analyze we can not show whether the parallelism is
picked in the test cases. What we could do is that we can add a plain RMV
test case in write_parallel.sql after CMV so that at least we can be ensured
that the parallelism will be picked because of the enforcement there. We
can always see the parallelism for the select part of explain analyze CMV
in write_parallel.sql and the same select query gets planned even in RMV
cases.

IMO, the patch in this thread can go with test case addition to
write_parallel.sql. since it is very small.

Thoughts?

Yes, agreed.

Best regards,
houzj

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Hou, Zhijie (#7)
1 attachment(s)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Wed, Dec 30, 2020 at 8:03 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Yeah without explain analyze we can not show whether the parallelism is
picked in the test cases. What we could do is that we can add a plain RMV
test case in write_parallel.sql after CMV so that at least we can be ensured
that the parallelism will be picked because of the enforcement there. We
can always see the parallelism for the select part of explain analyze CMV
in write_parallel.sql and the same select query gets planned even in RMV
cases.

IMO, the patch in this thread can go with test case addition to
write_parallel.sql. since it is very small.

Thoughts?

Yes, agreed.

Thanks. Added the test case. Attaching v2 patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-Allow-parallel-mode-in-REFRESH-MAT-VIEW-planning.patchapplication/x-patch; name=v2-0001-Allow-parallel-mode-in-REFRESH-MAT-VIEW-planning.patchDownload
From 0d8a3bff5bf35c200bc319eb24151f297044b86a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 30 Dec 2020 09:05:34 +0530
Subject: [PATCH v2] Allow parallel mode in REFRESH MAT VIEW planning

For REFRESH MATERIALIZED VIEW, pass CURSOR_OPT_PARALLEL_OK in
pg_plan_query() so that parallelism can be considered.
---
 src/backend/commands/matview.c               | 2 +-
 src/test/regress/expected/write_parallel.out | 4 ++++
 src/test/regress/sql/write_parallel.sql      | 4 ++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index cfc63915f3..9bd977d2c7 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -402,7 +402,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, queryString, 0, NULL);
+	plan = pg_plan_query(query, queryString, CURSOR_OPT_PARALLEL_OK, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/test/regress/expected/write_parallel.out b/src/test/regress/expected/write_parallel.out
index 0c4da2591a..c4f1f1d18a 100644
--- a/src/test/regress/expected/write_parallel.out
+++ b/src/test/regress/expected/write_parallel.out
@@ -60,6 +60,10 @@ explain (costs off) create materialized view parallel_mat_view as
 
 create materialized view parallel_mat_view as
     select length(stringu1) from tenk1 group by length(stringu1);
+-- Allow parallel planning of the underlying query for refresh materialized
+-- view. We can be ensured that parallelism will be picked because of the
+-- enforcement done at the beginning of the test.
+refresh materialized view parallel_mat_view;
 drop materialized view parallel_mat_view;
 prepare prep_stmt as select length(stringu1) from tenk1 group by length(stringu1);
 explain (costs off) create table parallel_write as execute prep_stmt;
diff --git a/src/test/regress/sql/write_parallel.sql b/src/test/regress/sql/write_parallel.sql
index 78b479cedf..db77533e14 100644
--- a/src/test/regress/sql/write_parallel.sql
+++ b/src/test/regress/sql/write_parallel.sql
@@ -32,6 +32,10 @@ explain (costs off) create materialized view parallel_mat_view as
     select length(stringu1) from tenk1 group by length(stringu1);
 create materialized view parallel_mat_view as
     select length(stringu1) from tenk1 group by length(stringu1);
+-- Allow parallel planning of the underlying query for refresh materialized
+-- view. We can be ensured that parallelism will be picked because of the
+-- enforcement done at the beginning of the test.
+refresh materialized view parallel_mat_view;
 drop materialized view parallel_mat_view;
 
 prepare prep_stmt as select length(stringu1) from tenk1 group by length(stringu1);
-- 
2.25.1

#9Luc Vlaming
luc@swarm64.com
In reply to: Bharath Rupireddy (#8)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On 30-12-2020 04:49, Bharath Rupireddy wrote:

On Wed, Dec 30, 2020 at 8:03 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Yeah without explain analyze we can not show whether the parallelism is
picked in the test cases. What we could do is that we can add a plain RMV
test case in write_parallel.sql after CMV so that at least we can be ensured
that the parallelism will be picked because of the enforcement there. We
can always see the parallelism for the select part of explain analyze CMV
in write_parallel.sql and the same select query gets planned even in RMV
cases.

IMO, the patch in this thread can go with test case addition to
write_parallel.sql. since it is very small.

Thoughts?

Yes, agreed.

Thanks. Added the test case. Attaching v2 patch. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Hi,

Looks good to me and a nice simple improvement.

Passes everything according to http://cfbot.cputube.org/ so marked it
therefore as ready for commiter.

Cheers,
Luc

#10Luc Vlaming
luc@swarm64.com
In reply to: Bharath Rupireddy (#8)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

passes according to http://cfbot.cputube.org/

The new status of this patch is: Ready for Committer

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Luc Vlaming (#10)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Mon, Jan 4, 2021 at 9:05 PM Luc Vlaming <luc@swarm64.com> wrote:

The new status of this patch is: Ready for Committer

I think the comments above this might as well be removed, because they
aren't very convincing:

+-- Allow parallel planning of the underlying query for refresh materialized
+-- view. We can be ensured that parallelism will be picked because of the
+-- enforcement done at the beginning of the test.
+refresh materialized view parallel_mat_view;

If you just leave the REFRESH command, at least it'll be exercised,
and I know you have a separate CF entry to add EXPLAIN support for
REFRESH. So I'd just rip these weasel words out and then in a later
commit you can add the EXPLAIN there where it's obviously missing.

While reading some back history, I saw that commit e9baa5e9 introduced
parallelism for CREATE M V, but REFRESH was ripped out of the original
patch by Robert, who said:

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking). That's probably a stupid thing to do,
but it can't be allowed to break the world. The other cases are safe
from that particular problem because the table doesn't exist yet.

Hmmm.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Thomas Munro (#11)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Mon, Mar 15, 2021 at 10:39 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Jan 4, 2021 at 9:05 PM Luc Vlaming <luc@swarm64.com> wrote:

While reading some back history, I saw that commit e9baa5e9 introduced
parallelism for CREATE M V, but REFRESH was ripped out of the original
patch by Robert, who said:

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking). That's probably a stupid thing to do,
but it can't be allowed to break the world. The other cases are safe
from that particular problem because the table doesn't exist yet.

Hmmm.

I am not sure but we might want to add this in comments of the refresh
materialize view function so that it would be easier for future
readers to understand why we have not enabled parallelism for this
case.

--
With Regards,
Amit Kapila.

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#11)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Mon, Mar 15, 2021 at 10:38 AM Thomas Munro <thomas.munro@gmail.com> wrote:

While reading some back history, I saw that commit e9baa5e9 introduced
parallelism for CREATE M V, but REFRESH was ripped out of the original
patch by Robert, who said:

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking). That's probably a stupid thing to do,
but it can't be allowed to break the world. The other cases are safe
from that particular problem because the table doesn't exist yet.

Please correct me if my understanding of the above comment (from the
commit e9baa5e9) is wrong - even if the leader opens the matview
relation in exclusive mode, because of group locking(in case we allow
parallel workers to feed in the data to the new heap that gets created
for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
still access the matview relation with older data?

I performed below testing to prove myself wrong for the above understanding:
session 1:
1) added few rows to the table t1 on which the mv1 is defined;
2) refresh materialized view mv1;

session 2:
1) select count(*) from mv1; ---> this query is blocked until
session 1's step (2) is completed and gives the latest result even if
the underlying data-generating query runs select part in parallel.

Is there anything I'm missing here?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Bharath Rupireddy (#13)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Mon, Mar 15, 2021 at 8:25 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking). That's probably a stupid thing to do,
but it can't be allowed to break the world. The other cases are safe
from that particular problem because the table doesn't exist yet.

Please correct me if my understanding of the above comment (from the
commit e9baa5e9) is wrong - even if the leader opens the matview
relation in exclusive mode, because of group locking(in case we allow
parallel workers to feed in the data to the new heap that gets created
for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
still access the matview relation with older data?

I performed below testing to prove myself wrong for the above understanding:
session 1:
1) added few rows to the table t1 on which the mv1 is defined;
2) refresh materialized view mv1;

session 2:
1) select count(*) from mv1; ---> this query is blocked until
session 1's step (2) is completed and gives the latest result even if
the underlying data-generating query runs select part in parallel.

Is there anything I'm missing here?

I think he was talking about things like functions that try to access
the mv from inside the same query, in a worker. I haven't figured out
exactly which hazards he meant. I thought about wrong-relfilenode
hazards and combocid hazards, but considering the way this thing
always inserts into a fresh table before performing merge or swap
steps later, I don't yet see why this is different from any other
insert-from-select-with-gather.

With the script below you can reach this error in the leader:

postgres=# refresh materialized view CONCURRENTLY mv;
ERROR: cannot start commands during a parallel operation
CONTEXT: SQL function "crazy"

But that's reachable on master too, even if you change crazy() to do
"select 42 from pg_class limit 1" instead of reading mv, when
performing a CREATE M V without NO DATA. :-( Without parallel
leader participation it runs to completion.

===8<===

set parallel_leader_participation = off;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = 0;
set parallel_tuple_cost = 0;

drop table if exists t cascade;

create table t (i int);
insert into t select generate_series(1, 100000);
create materialized view mv as select 42::int i;
create or replace function crazy() returns int as $$ select i from mv
limit 1; $$ language sql parallel safe;
drop materialized view mv;
create materialized view mv as select i + crazy() i from t with no data;
create unique index on mv(i);

refresh materialized view mv;
refresh materialized view concurrently mv;

begin;
refresh materialized view mv;
refresh materialized view mv;
commit;

begin;
refresh materialized view concurrently mv;
refresh materialized view concurrently mv;
commit;

===8<===

PS, off-topic observation made while trying to think of ways to break
your patch: I noticed that REFRESH CONCURRENTLY spends a lot of time
in refresh_by_match_merge()'s big FULL JOIN. That is separate from
the view query that you're parallelising here, and is used to perform
the merge between a temporary table and the target MV table. I hacked
the code a bit so that it wasn't scanning a temporary table
(unparallelisable), and tried out the Parallel Hash Full Join patch
which I intend to commit soon. This allowed REFRESH CONCURRENTLY to
complete much faster. Huzzah! Unfortunately that query also does
ORDER BY tid; I guess we could remove that to skip a Sort and use
Gather instead of the more expensive Gather Merge, and hopefully soon
a pushed-down Insert.

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#14)
Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

On Tue, Mar 16, 2021 at 2:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Mar 15, 2021 at 8:25 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

The problem with a case like REFRESH MATERIALIZED VIEW is that there's
nothing to prevent something that gets run in the course of the query
from trying to access the view (and the heavyweight lock won't prevent
that, due to group locking). That's probably a stupid thing to do,
but it can't be allowed to break the world. The other cases are safe
from that particular problem because the table doesn't exist yet.

Please correct me if my understanding of the above comment (from the
commit e9baa5e9) is wrong - even if the leader opens the matview
relation in exclusive mode, because of group locking(in case we allow
parallel workers to feed in the data to the new heap that gets created
for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
still access the matview relation with older data?

I performed below testing to prove myself wrong for the above understanding:
session 1:
1) added few rows to the table t1 on which the mv1 is defined;
2) refresh materialized view mv1;

session 2:
1) select count(*) from mv1; ---> this query is blocked until
session 1's step (2) is completed and gives the latest result even if
the underlying data-generating query runs select part in parallel.

Is there anything I'm missing here?

I think he was talking about things like functions that try to access
the mv from inside the same query, in a worker. I haven't figured out
exactly which hazards he meant. I thought about wrong-relfilenode
hazards and combocid hazards, but considering the way this thing
always inserts into a fresh table before performing merge or swap
steps later, I don't yet see why this is different from any other
insert-from-select-with-gather.

I asked Robert if he had some hazard in mind that we haven't already
discussed here when he wrote that, and didn't recall any. I think
we're OK here.

I added the "concurrently" variant to the regression test, just to get
it exercised too.

The documentation needed a small tweak where we have a list of
data-writing commands that are allowed to use parallelism. That run
of sentences was getting a bit tortured so I converted it into a
bullet list; I hope I didn't upset the documentation style police.

Pushed. Thanks for working on this! This is really going to fly with
INSERT pushdown. The 3 merge queries used by CONCURRENTLY will take
some more work.