Existence check for suitable index in advance when concurrently refreshing.

Started by Masahiko Sawadaalmost 10 years ago9 messages
#1Masahiko Sawada
sawada.mshk@gmail.com
1 attachment(s)

Hi all,

In concurrently refreshing materialized view, we check whether that
materialized view has suitable index(unique and not having WHERE
condition), after filling data to new snapshot
(refresh_matview_datafill()).
This logic leads to taking a lot of time until postgres returns ERROR
log if that table doesn't has suitable index and table is large. it
wastes time.
I think we should check whether that materialized view can use
concurrently refreshing or not in advance.
The patch is attached.

Please give me feedbacks.

--
Regards,

--
Masahiko Sawada

Attachments:

matview_concurrently_refresh_check_index_v1.patchbinary/octet-stream; name=matview_concurrently_refresh_check_index_v1.patchDownload
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 869c586..31af8e1 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -216,6 +216,43 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 			 "the rule for materialized view \"%s\" is not a single action",
 			 RelationGetRelationName(matviewRel));
 
+	/* When CONCURRENTLY refresh, we need at least one unique index. */
+	if (concurrent)
+	{
+		List		*indexoidlist = RelationGetIndexList(matviewRel);
+		ListCell 	*indexoidscan;
+		bool		hasUniqueIndex = false;
+
+		foreach(indexoidscan, indexoidlist)
+		{
+			Oid			indexoid = lfirst_oid(indexoidscan);
+			Relation	indexRel;
+			Form_pg_index	indexStruct;
+
+			indexRel = index_open(indexoid, RowExclusiveLock);
+			indexStruct = indexRel->rd_index;
+
+			if (indexStruct->indisunique &&
+				IndexIsValid(indexStruct) &&
+				RelationGetIndexExpressions(indexRel) == NIL &&
+				RelationGetIndexPredicate(indexRel) == NIL)
+				hasUniqueIndex = true;
+
+			index_close(indexRel, RowExclusiveLock);
+		}
+
+		list_free(indexoidlist);
+
+		if (!hasUniqueIndex)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot refresh materialized view \"%s\" concurrently",
+							quote_qualified_identifier(get_namespace_name(RelationGetNamespace(matviewRel)),
+																		  RelationGetRelationName(matviewRel))),
+					 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+	}
+
+
 	/*
 	 * The stored query was rewritten at the time of the MV definition, but
 	 * has not been scribbled on by the planner.
@@ -695,12 +732,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 
 	list_free(indexoidlist);
 
-	if (!foundUniqueIndex)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-			   errmsg("cannot refresh materialized view \"%s\" concurrently",
-					  matviewname),
-				 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+	/* Must have at least one unique index */
+	Assert(foundUniqueIndex);
 
 	appendStringInfoString(&querybuf,
 						   " AND newdata OPERATOR(pg_catalog.*=) mv) "
#2Fujii Masao
masao.fujii@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Tue, Jan 26, 2016 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

In concurrently refreshing materialized view, we check whether that
materialized view has suitable index(unique and not having WHERE
condition), after filling data to new snapshot
(refresh_matview_datafill()).
This logic leads to taking a lot of time until postgres returns ERROR
log if that table doesn't has suitable index and table is large. it
wastes time.
I think we should check whether that materialized view can use
concurrently refreshing or not in advance.

+1

The patch is attached.

Please give me feedbacks.

+ indexRel = index_open(indexoid, RowExclusiveLock);

Can we use AccessShareLock here, instead?

+            if (indexStruct->indisunique &&
+                IndexIsValid(indexStruct) &&
+                RelationGetIndexExpressions(indexRel) == NIL &&
+                RelationGetIndexPredicate(indexRel) == NIL)
+                hasUniqueIndex = true;
+
+            index_close(indexRel, RowExclusiveLock);

In the case where hasUniqueIndex = true, ISTM that we can get out of
the loop immediately just after calling index_close(). No?

+    /* Must have at least one unique index */
+    Assert(foundUniqueIndex);

Can we guarantee that there is at least one valid unique index here?
If yes, it's better to write the comment about that.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Fujii Masao (#2)
1 attachment(s)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Wed, Jan 27, 2016 at 4:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jan 26, 2016 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

In concurrently refreshing materialized view, we check whether that
materialized view has suitable index(unique and not having WHERE
condition), after filling data to new snapshot
(refresh_matview_datafill()).
This logic leads to taking a lot of time until postgres returns ERROR
log if that table doesn't has suitable index and table is large. it
wastes time.
I think we should check whether that materialized view can use
concurrently refreshing or not in advance.

+1

The patch is attached.

Please give me feedbacks.

Thank you for having look at this patch.

+ indexRel = index_open(indexoid, RowExclusiveLock);

Can we use AccessShareLock here, instead?

Yeah, I think we can use it. Fixed.

+            if (indexStruct->indisunique &&
+                IndexIsValid(indexStruct) &&
+                RelationGetIndexExpressions(indexRel) == NIL &&
+                RelationGetIndexPredicate(indexRel) == NIL)
+                hasUniqueIndex = true;
+
+            index_close(indexRel, RowExclusiveLock);

In the case where hasUniqueIndex = true, ISTM that we can get out of
the loop immediately just after calling index_close(). No?

Fixed.

+    /* Must have at least one unique index */
+    Assert(foundUniqueIndex);

Can we guarantee that there is at least one valid unique index here?
If yes, it's better to write the comment about that.

Added.

Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada

Attachments:

matview_concurrently_refresh_check_index_v2.patchbinary/octet-stream; name=matview_concurrently_refresh_check_index_v2.patchDownload
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 869c586..fbf6af8 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -216,6 +216,46 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 			 "the rule for materialized view \"%s\" is not a single action",
 			 RelationGetRelationName(matviewRel));
 
+	/* When CONCURRENTLY refresh, we need at least one unique index. */
+	if (concurrent)
+	{
+		List		*indexoidlist = RelationGetIndexList(matviewRel);
+		ListCell 	*indexoidscan;
+		bool		hasUniqueIndex = false;
+
+		foreach(indexoidscan, indexoidlist)
+		{
+			Oid			indexoid = lfirst_oid(indexoidscan);
+			Relation	indexRel;
+			Form_pg_index	indexStruct;
+
+			indexRel = index_open(indexoid, AccessShareLock);
+			indexStruct = indexRel->rd_index;
+
+			if (indexStruct->indisunique &&
+				IndexIsValid(indexStruct) &&
+				RelationGetIndexExpressions(indexRel) == NIL &&
+				RelationGetIndexPredicate(indexRel) == NIL)
+			{
+				hasUniqueIndex = true;
+				index_close(indexRel, AccessShareLock);
+				break;
+			}
+
+			index_close(indexRel, AccessShareLock);
+		}
+
+		list_free(indexoidlist);
+
+		if (!hasUniqueIndex)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot refresh materialized view \"%s\" concurrently",
+							quote_qualified_identifier(get_namespace_name(RelationGetNamespace(matviewRel)),
+																		  RelationGetRelationName(matviewRel))),
+					 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+	}
+
 	/*
 	 * The stored query was rewritten at the time of the MV definition, but
 	 * has not been scribbled on by the planner.
@@ -523,8 +563,8 @@ mv_GenerateOper(StringInfo buf, Oid opoid)
  * in the face of rows which have at least one NULL value, with all non-NULL
  * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
  * indexes turns out to be quite convenient here; the tests we need to make
- * are consistent with default behavior.  If there is at least one UNIQUE
- * index on the materialized view, we have exactly the guarantee we need.
+ * are consistent with default behavior.  There must be at least one UNIQUE
+ * index on the materialized view, which is the guarantee we need.
  *
  * The temporary table used to hold the diff results contains just the TID of
  * the old record (if matched) and the ROW from the new table as a single
@@ -695,12 +735,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 
 	list_free(indexoidlist);
 
-	if (!foundUniqueIndex)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-			   errmsg("cannot refresh materialized view \"%s\" concurrently",
-					  matviewname),
-				 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+	/* We have guarantee that materialized view has at least one UNIQUE index */
+	Assert(foundUniqueIndex);
 
 	appendStringInfoString(&querybuf,
 						   " AND newdata OPERATOR(pg_catalog.*=) mv) "
#4Fujii Masao
masao.fujii@gmail.com
In reply to: Masahiko Sawada (#3)
1 attachment(s)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Thu, Jan 28, 2016 at 1:01 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 27, 2016 at 4:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jan 26, 2016 at 9:33 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

In concurrently refreshing materialized view, we check whether that
materialized view has suitable index(unique and not having WHERE
condition), after filling data to new snapshot
(refresh_matview_datafill()).
This logic leads to taking a lot of time until postgres returns ERROR
log if that table doesn't has suitable index and table is large. it
wastes time.
I think we should check whether that materialized view can use
concurrently refreshing or not in advance.

+1

The patch is attached.

Please give me feedbacks.

Thank you for having look at this patch.

+ indexRel = index_open(indexoid, RowExclusiveLock);

Can we use AccessShareLock here, instead?

Yeah, I think we can use it. Fixed.

+            if (indexStruct->indisunique &&
+                IndexIsValid(indexStruct) &&
+                RelationGetIndexExpressions(indexRel) == NIL &&
+                RelationGetIndexPredicate(indexRel) == NIL)
+                hasUniqueIndex = true;
+
+            index_close(indexRel, RowExclusiveLock);

In the case where hasUniqueIndex = true, ISTM that we can get out of
the loop immediately just after calling index_close(). No?

Fixed.

+    /* Must have at least one unique index */
+    Assert(foundUniqueIndex);

Can we guarantee that there is at least one valid unique index here?
If yes, it's better to write the comment about that.

Added.

Attached latest patch. Please review it.

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

Regards,

--
Fujii Masao

Attachments:

matview_concurrently_refresh_check_index_v3.patchtext/x-patch; charset=US-ASCII; name=matview_concurrently_refresh_check_index_v3.patchDownload
*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***************
*** 217,222 **** ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
--- 217,267 ----
  			 RelationGetRelationName(matviewRel));
  
  	/*
+ 	 * Check that there is a unique index with no WHERE clause on
+ 	 * one or more columns of the materialized view if CONCURRENTLY
+ 	 * is specified.
+ 	 */
+ 	if (concurrent)
+ 	{
+ 		List		*indexoidlist = RelationGetIndexList(matviewRel);
+ 		ListCell 	*indexoidscan;
+ 		bool		hasUniqueIndex = false;
+ 
+ 		foreach(indexoidscan, indexoidlist)
+ 		{
+ 			Oid			indexoid = lfirst_oid(indexoidscan);
+ 			Relation	indexRel;
+ 			Form_pg_index	indexStruct;
+ 
+ 			indexRel = index_open(indexoid, AccessShareLock);
+ 			indexStruct = indexRel->rd_index;
+ 
+ 			if (indexStruct->indisunique &&
+ 				IndexIsValid(indexStruct) &&
+ 				RelationGetIndexExpressions(indexRel) == NIL &&
+ 				RelationGetIndexPredicate(indexRel) == NIL &&
+ 				indexStruct->indnatts > 0)
+ 			{
+ 				hasUniqueIndex = true;
+ 				index_close(indexRel, AccessShareLock);
+ 				break;
+ 			}
+ 
+ 			index_close(indexRel, AccessShareLock);
+ 		}
+ 
+ 		list_free(indexoidlist);
+ 
+ 		if (!hasUniqueIndex)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 					 errmsg("cannot refresh materialized view \"%s\" concurrently",
+ 							quote_qualified_identifier(get_namespace_name(RelationGetNamespace(matviewRel)),
+ 													   RelationGetRelationName(matviewRel))),
+ 					 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+ 	}
+ 
+ 	/*
  	 * The stored query was rewritten at the time of the MV definition, but
  	 * has not been scribbled on by the planner.
  	 */
#5Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#4)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

So, this code basically duplicates what is already in
refresh_by_match_merge to check if there is a UNIQUE index defined. If
we are sure that an error is detected earlier in the code as done in
this patch, wouldn't it be better to replace the error message in
refresh_by_match_merge() by an assertion? Just wondering, I would
think that once this patch is applied the existing error message of
refresh_by_match_merge() is just dead code.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#5)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

So, this code basically duplicates what is already in
refresh_by_match_merge to check if there is a UNIQUE index defined. If
we are sure that an error is detected earlier in the code as done in
this patch, wouldn't it be better to replace the error message in
refresh_by_match_merge() by an assertion?

I'm OK with an assertion if we add source comment about why
refresh_by_match_merge() can always guarantee that there is
a unique index on the matview. Probably it's because the matview
is locked with exclusive lock at the start of ExecRefreshMatView(),
i.e., it's guaranteed that we cannot drop any indexes on the matview
after the first check is passed. Also it might be better to add
another comment about that the caller of refresh_by_match_merge()
must always check that there is a unique index on the matview before
calling that function, just in the case where it's called elsewhere
in the future.

OTOH, I don't think it's not so bad idea to just emit an error, instead.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#6)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Wed, Feb 10, 2016 at 2:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

So, this code basically duplicates what is already in
refresh_by_match_merge to check if there is a UNIQUE index defined. If
we are sure that an error is detected earlier in the code as done in
this patch, wouldn't it be better to replace the error message in
refresh_by_match_merge() by an assertion?

I'm OK with an assertion if we add source comment about why
refresh_by_match_merge() can always guarantee that there is
a unique index on the matview. Probably it's because the matview
is locked with exclusive lock at the start of ExecRefreshMatView(),
i.e., it's guaranteed that we cannot drop any indexes on the matview
after the first check is passed. Also it might be better to add
another comment about that the caller of refresh_by_match_merge()
must always check that there is a unique index on the matview before
calling that function, just in the case where it's called elsewhere
in the future.

OTOH, I don't think it's not so bad idea to just emit an error, instead.

Sorry, s/it's not/it's

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#7)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Wed, Feb 10, 2016 at 2:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Feb 10, 2016 at 2:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

So, this code basically duplicates what is already in
refresh_by_match_merge to check if there is a UNIQUE index defined. If
we are sure that an error is detected earlier in the code as done in
this patch, wouldn't it be better to replace the error message in
refresh_by_match_merge() by an assertion?

I'm OK with an assertion if we add source comment about why
refresh_by_match_merge() can always guarantee that there is
a unique index on the matview. Probably it's because the matview
is locked with exclusive lock at the start of ExecRefreshMatView(),
i.e., it's guaranteed that we cannot drop any indexes on the matview
after the first check is passed. Also it might be better to add
another comment about that the caller of refresh_by_match_merge()
must always check that there is a unique index on the matview before
calling that function, just in the case where it's called elsewhere
in the future.

OTOH, I don't think it's not so bad idea to just emit an error, instead.

Sorry, s/it's not/it's

Well, refresh_by_match_merge is called only by ExecRefreshMatView()
and it is not exposed externally in matview.h, so it is not like we
are going to break any extension-related code by having an assertion
instead of an error message, and that's less code duplication to
maintain if this extra error message is removed. But feel free to
withdraw my comment if you think that's not worth it, I won't deadly
insist on that either :)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#8)
Re: Existence check for suitable index in advance when concurrently refreshing.

On Wed, Feb 10, 2016 at 10:35 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Feb 10, 2016 at 2:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Feb 10, 2016 at 2:21 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Feb 9, 2016 at 9:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Feb 9, 2016 at 4:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed unnecessary assertion check and change of source code
that you added, and improved the source comment.
Barring objection, I'll commit this patch.

So, this code basically duplicates what is already in
refresh_by_match_merge to check if there is a UNIQUE index defined. If
we are sure that an error is detected earlier in the code as done in
this patch, wouldn't it be better to replace the error message in
refresh_by_match_merge() by an assertion?

I'm OK with an assertion if we add source comment about why
refresh_by_match_merge() can always guarantee that there is
a unique index on the matview. Probably it's because the matview
is locked with exclusive lock at the start of ExecRefreshMatView(),
i.e., it's guaranteed that we cannot drop any indexes on the matview
after the first check is passed. Also it might be better to add
another comment about that the caller of refresh_by_match_merge()
must always check that there is a unique index on the matview before
calling that function, just in the case where it's called elsewhere
in the future.

OTOH, I don't think it's not so bad idea to just emit an error, instead.

Sorry, s/it's not/it's

Well, refresh_by_match_merge is called only by ExecRefreshMatView()
and it is not exposed externally in matview.h, so it is not like we
are going to break any extension-related code by having an assertion
instead of an error message, and that's less code duplication to
maintain if this extra error message is removed. But feel free to
withdraw my comment if you think that's not worth it, I won't deadly
insist on that either :)

Okay, I revived Sawada's original assertion code and pushed the patch.
Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers