Skip Orderby Execution for Materialized Views
Hi, all
When create or refresh a Materialized View, if the view’s query has order by, we may sort and insert the sorted data into view.
Create Materialized View mv1 as select c1, c2 from t1 order by c2;
Refresh Materialized View mv1;
And it appears that we could get ordered data when select from Materialized View;
Select * from mv1;
But it’s not true if we use other access methods, or we choose a parallel seqscan plan.
A non-parallel seqscan on heap table appears ordered as we always create new rel file and swap them, in my opinion, it’s more like a free lunch.
So, conclusion1: We couldn’t rely on the `ordered-data` even the mv’s sql has order by clause, is it right?
And if it’s true, shall we skip the order by clause for Materialized View when executing create/refresh statement?
Materialized View’s order by clause could be skipped if
1. Order by clause is on the top query level
2. There is no real limit clause
The benefit is the query may be speeded up without sort nodes each time creating/refreshing Materialized View.
A simple results:
create table t1 as select i as c1 , i/2 as c2 , i/5 as c3 from generate_series(1, 100000) i;
create materialized view mvt1_order as select c1, c2, c3 from t1 order by c2, c3, c1 asc
Without this patch:
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 228.548 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 230.374 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 217.079 ms
With this patch:
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 192.409 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 204.398 ms
zml=# refresh materialized view mvt1_order;
REFRESH MATERIALIZED VIEW
Time: 197.510 ms
Zhang Mingli
www.hashdata.xyz
Attachments:
v1-01-Skip-Orderby-clause-execution-for-Materialized-Views.patchapplication/octet-streamDownload
From 98eac30e75e806e4f8daafdf2a041abf1fc678b7 Mon Sep 17 00:00:00 2001
From: Zhang Mingli <avamingli@gmail.com>
Date: Sun, 1 Oct 2023 22:24:49 +0800
Subject: [PATCH] Skip Orderby clause execution for Materialized Views
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The order is not guaranteed even if materialized view's sql has
order by clause.
Materialized view’s order by clause could be skipped when execution
if order by clause is on the top query level and there is no real
limit clause.
This will speed up for REFRESH and CREATE AS on materialized views.
Authored-by: Zhang Mingli avamingli@gmail.com
---
src/backend/commands/createas.c | 4 ++++
src/backend/commands/matview.c | 5 +++++
2 files changed, 9 insertions(+)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..35846b4ac6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -50,6 +50,7 @@
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
+#include "optimizer/planner.h"
typedef struct
{
@@ -141,6 +142,9 @@ create_ctas_internal(List *attrList, IntoClause *into)
Query *query = (Query *) copyObject(into->viewQuery);
StoreViewQuery(intoRelationAddr.objectId, query, false);
+ /* No need to sort for materialized view */
+ if(!limit_needed(query))
+ query->sortClause = NULL;
CommandCounterIncrement();
}
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..3b9ebc0077 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -45,6 +45,7 @@
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
+#include "optimizer/planner.h"
typedef struct
@@ -392,6 +393,10 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
elog(ERROR, "unexpected rewrite result for REFRESH MATERIALIZED VIEW");
query = (Query *) linitial(rewritten);
+ /* No need to sort for materialized view */
+ if(!limit_needed(query))
+ query->sortClause = NULL;
+
/* Check for user-requested abort. */
CHECK_FOR_INTERRUPTS();
--
2.36.1
Import Notes
Reply to msg id not found: eb3700e8-f29b-4c34-a3b4-19ae898f31c7@SparkReference msg id not found: eb3700e8-f29b-4c34-a3b4-19ae898f31c7@Spark
Zhang Mingli <zmlpostgres@gmail.com> writes:
When create or refresh a Materialized View, if the view’s query has order by, we may sort and insert the sorted data into view.
Indeed.
And if it’s true, shall we skip the order by clause for Materialized View when executing create/refresh statement?
No. The intent of a materialized view is to execute the query
as presented. If the user doesn't understand the consequences
of that, it's not our job to think we are smarter than they are.
I think this patch should be rejected.
BTW, I'm pretty certain that this patch breaks some valid cases
even if we take your point of view as correct. For one example,
you can't just remove the sort clause if the query uses DISTINCT ON.
regards, tom lane
HI,
On Oct 1, 2023, at 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
For one example,
you can't just remove the sort clause if the query uses DISTINCT ON
Hi, Tom, got it, thanks,
Zhang Mingli
HashData https://www.hashdata.xyz
On Sun, Oct 1, 2023 at 8:57 AM Zhang Mingli <zmlpostgres@gmail.com> wrote:
And if it’s true, shall we skip the order by clause for Materialized
View when executing create/refresh statement?
We tend to do precisely what the user writes into their query. If they
don't want an order by they can remove it. I don't see any particular
reason we should be second-guessing them here. And what makes the trade-off
worse is the reasonable expectation that we'd provide a way to force an
ordering of the inserts should the user actually want it after we defaulted
to ignoring that part of their query.
But yes, you are correct that adding an order by to a materialized view is
typically pointless. To the extent it is detrimental varies since even
partially ordered results can save on processing time.
David J.