Allow DELETE to use ORDER BY and LIMIT/OFFSET
Hello hackers,
We cannot use ORDER BY or LIMIT/OFFSET in the current
DELETE statement syntax, so all the rows matching the
WHERE condition are deleted. However, the tuple-retrieving
process of DELETE is basically the same as for a SELECT
statement, so I think we could also allow DELETE to use
ORDER BY and LIMIT/OFFSET.
Attached is a proof-of-concept patch. It enables the following
operations:
================================================================
postgres=# select * from t order by i;
i
----
1
2
2
2
2
5
10
20
33
35
53
(11 rows)
postgres=# delete from t where i = 2 limit 2;
DELETE 2
postgres=# select * from t order by i;
i
----
1
2
2
5
10
20
33
35
53
(9 rows)
postgres=# delete from t order by i offset 3 limit 3;
DELETE 3
postgres=# select * from t order by i;
i
----
1
2
2
33
35
53
(6 rows)
================================================================
Although we can do a similar operation using ctid and a subquery,
such as
DELETE FROM t WHERE ctid IN (SELECT ctid FROM t WHERE ... ORDER BY ... LIMIT ...),
it would be more user-friendly and intuitive to allow this in the
DELETE syntax itself, because ctid is a system column and most users
may not be familiar with it.
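Spelled out against the example session above, the current workaround
for "delete from t where i = 2 limit 2" would be something like this
(a sketch):

```sql
-- Current workaround: choose the victim rows by ctid in a subquery.
-- Deletes at most 2 of the rows where i = 2.
DELETE FROM t
WHERE ctid IN (SELECT ctid
               FROM t
               WHERE i = 2
               LIMIT 2);
```

With the patch this collapses to just DELETE FROM t WHERE i = 2 LIMIT 2;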
Although this is not allowed by the SQL standard, it is supported
in MySQL [1]. DB2 also supports it, although the syntax is somewhat
strange [2].
Also, there seem to be some use cases. For example,
- when you want to delete a specified number of rows from a table
that doesn't have a primary key and contains duplicated tuples.
- when you want to delete the bottom 10 items with bad scores
(without using the rank() window function).
- when you want to delete only some of the rows because it takes
too long to delete all of them.
[1]: https://dev.mysql.com/doc/refman/8.0/en/delete.html
[2]: https://www.dba-db2.com/2015/04/delete-first-1000-rows-in-a-db2-table-using-fetch-first.html
What do you think?
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
delete_order_limit.patch (text/x-diff)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 297b6ee715..c596bd80ea 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3241,6 +3241,11 @@ _copyDeleteStmt(const DeleteStmt *from)
COPY_NODE_FIELD(returningList);
COPY_NODE_FIELD(withClause);
+ COPY_NODE_FIELD(sortClause);
+ COPY_NODE_FIELD(limitOffset);
+ COPY_NODE_FIELD(limitCount);
+ COPY_SCALAR_FIELD(limitOption);
+
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index f537d3eb96..84f509b57b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1050,6 +1050,11 @@ _equalDeleteStmt(const DeleteStmt *a, const DeleteStmt *b)
COMPARE_NODE_FIELD(returningList);
COMPARE_NODE_FIELD(withClause);
+ COMPARE_NODE_FIELD(sortClause);
+ COMPARE_NODE_FIELD(limitOffset);
+ COMPARE_NODE_FIELD(limitCount);
+ COMPARE_SCALAR_FIELD(limitOption);
+
return true;
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..8a201749bf 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3668,6 +3668,12 @@ raw_expression_tree_walker(Node *node,
return true;
if (walker(stmt->withClause, context))
return true;
+ if (walker(stmt->sortClause, context))
+ return true;
+ if (walker(stmt->limitOffset, context))
+ return true;
+ if (walker(stmt->limitCount, context))
+ return true;
}
break;
case T_UpdateStmt:
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 146ee8dd1e..015e879f7a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -471,6 +471,12 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
qual = transformWhereClause(pstate, stmt->whereClause,
EXPR_KIND_WHERE, "WHERE");
+ qry->sortClause = transformSortClause(pstate,
+ stmt->sortClause,
+ &qry->targetList,
+ EXPR_KIND_ORDER_BY,
+ false /* allow SQL92 rules */ );
+
qry->returningList = transformReturningList(pstate, stmt->returningList);
/* done building the range table and jointree */
@@ -482,6 +488,15 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
qry->hasTargetSRFs = pstate->p_hasTargetSRFs;
qry->hasAggs = pstate->p_hasAggs;
+ /* transform LIMIT */
+ qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset,
+ EXPR_KIND_OFFSET, "OFFSET",
+ stmt->limitOption);
+ qry->limitCount = transformLimitClause(pstate, stmt->limitCount,
+ EXPR_KIND_LIMIT, "LIMIT",
+ stmt->limitOption);
+ qry->limitOption = stmt->limitOption;
+
assign_query_collations(pstate, qry);
/* this must be done after collations, for reliable comparison of exprs */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 86ce33bd97..0c6d11c23c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11207,6 +11207,7 @@ returning_clause:
DeleteStmt: opt_with_clause DELETE_P FROM relation_expr_opt_alias
using_clause where_or_current_clause returning_clause
+ opt_sort_clause opt_select_limit
{
DeleteStmt *n = makeNode(DeleteStmt);
n->relation = $4;
@@ -11214,6 +11215,20 @@ DeleteStmt: opt_with_clause DELETE_P FROM relation_expr_opt_alias
n->whereClause = $6;
n->returningList = $7;
n->withClause = $1;
+
+ n->sortClause = $8;
+ if ($9)
+ {
+ n->limitOffset = $9->limitOffset;
+ n->limitCount = $9->limitCount;
+ if (!n->sortClause &&
+ $9->limitOption == LIMIT_OPTION_WITH_TIES)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("WITH TIES cannot be specified without ORDER BY clause")));
+ n->limitOption = $9->limitOption;
+ }
+
$$ = (Node *)n;
}
;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 067138e6b5..48cf65d032 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1610,6 +1610,11 @@ typedef struct DeleteStmt
Node *whereClause; /* qualifications */
List *returningList; /* list of expressions to return */
WithClause *withClause; /* WITH clause */
+
+ List *sortClause; /* sort clause (a list of SortBy's) */
+ Node *limitOffset; /* # of result tuples to skip */
+ Node *limitCount; /* # of result tuples to return */
+ LimitOption limitOption; /* limit type */
} DeleteStmt;
/* ----------------------
On Fri, 17 Dec 2021 09:47:18 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
After posting this, I noticed that there are several similar
proposals in the past:
/messages/by-id/AANLkTi=6fBZh9yZT7f7kKh+zmQngAyHgZWBPM3eiEMj1@mail.gmail.com
/messages/by-id/1393112801.59251.YahooMailNeo@web163006.mail.bf1.yahoo.com
/messages/by-id/CADB9FDf-Vh6RnKAMZ4Rrg_YP9p3THdPbji8qe4qkxRuiOwm=mg@mail.gmail.com
/messages/by-id/CALAY4q9fcrscybax7fg_uojFwjw_Wg0UMuSrf-FvN68SeSAPAA@mail.gmail.com
Anyway, I'll review those threads before proceeding with this.
--
Yugo NAGATA <nagata@sraoss.co.jp>
Yugo NAGATA <nagata@sraoss.co.jp> writes:
We cannot use ORDER BY or LIMIT/OFFSET in the current
DELETE statement syntax, so all the rows matching the
WHERE condition are deleted. However, the tuple-retrieving
process of DELETE is basically the same as for a SELECT
statement, so I think we could also allow DELETE to use
ORDER BY and LIMIT/OFFSET.
Indeed, this is technically possible, but we've rejected the idea
before and I'm not aware of any reason to change our minds.
The problem is that a partial DELETE is not very deterministic
about which rows are deleted, and that does not seem like a
great property for a data-updating command. (The same applies
to UPDATE, which is why we don't allow these options in that
command either.) The core issues are:
* If the sort order is underspecified, or you omit ORDER BY
entirely, then it's not clear which rows will be operated on.
The LIMIT might stop after just some of the rows in a peer
group, and you can't predict which ones.
* UPDATE/DELETE necessarily involve the equivalent of SELECT
FOR UPDATE, which may cause the rows to be ordered more
surprisingly than you expected, ie the sort happens *before*
rows are replaced by their latest versions, which might have
different sort keys.
We live with this amount of indeterminism in SELECT, but that
doesn't make it a brilliant idea to allow it in UPDATE/DELETE.
regards, tom lane
On Thursday, December 16, 2021, Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Also, there seem to be some use cases. For example,
- when you want to delete a specified number of rows from a table
that doesn't have a primary key and contains duplicated tuples.
Not our problem…use the tools correctly; there is always the hack
work-around for the few who didn’t.
- when you want to delete the bottom 10 items with bad scores
(without using rank() window function).
This one doesn’t make sense to me.
- when you want to delete only some of rows because it takes time
to delete all of them.
This seems potentially compelling though I’d be more concerned about the
memory aspects than simply taking a long amount of time. If this is a
problem then maybe discuss it without having a solution-in-hand? But given
the intense I/O cost that would happen spreading this out over time seems
acceptable and it should be an infrequent thing to do. Expecting users to
plan and execute some custom code for their specific need seems reasonable.
So even if Tom’s technical concerns weren’t enough, working on this
based on these use cases doesn’t seem of high enough benefit.
David J.
On Thu, 16 Dec 2021 22:17:58 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thank you for explaining it!
Now I understand why this idea is not good and has been rejected before.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 16 Dec 2021 20:56:59 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:
Thank you for your comments.
OK. I agree that the use cases are not very strong.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 16 Dec 2021 at 22:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* If the sort order is underspecified, or you omit ORDER BY
entirely, then it's not clear which rows will be operated on.
The LIMIT might stop after just some of the rows in a peer
group, and you can't predict which ones.
Meh, that never seemed very compelling to me. I think that's on the
user and there are *lots* of cases where the user can easily know
enough extra context to know that's what she wants. In particular I've
often wanted to delete one of two identical records and it would have
been really easy to just do a DELETE LIMIT 1. I know there are ways to
do it but it's always seemed like unnecessary hassle when there was a
natural way to express it.
* UPDATE/DELETE necessarily involve the equivalent of SELECT
FOR UPDATE, which may cause the rows to be ordered more
surprisingly than you expected, ie the sort happens *before*
rows are replaced by their latest versions, which might have
different sort keys.
This... is a real issue. Or is it? Just to be clear I think what
you're referring to is a case like:
Session 1: INSERT INTO t VALUES (1),(2);
Session 2: BEGIN; UPDATE t SET c = 0 WHERE c = 2;
Session 1: DELETE FROM t ORDER BY c ASC LIMIT 1;
           <blocks>
Session 2: COMMIT;
Which row was deleted? In this case it was the row where c=1. Even
though the UPDATE reported success (i.e. 1 row updated) so presumably
it happened "before" the delete. The delete saw the ordering from
before it was blocked and saw the ordering with c=1, c=2 so apparently
it happened "before" the update.
There are plenty of other ways to see the same surprising
serialization failure. If instead of a DELETE you did an UPDATE there
are any number of functions or other features that have been added
which can expose the order in which the updates happened.
The way to solve those cases would be to use serializable isolation
(or even just repeatable read in this case). Wouldn't that also solve
the DELETE serialization failure too?
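For instance (an untested sketch, assuming the proposed
DELETE ... ORDER BY ... LIMIT syntax), under REPEATABLE READ the
blocked DELETE should fail rather than silently pick a victim from
the stale ordering:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
DELETE FROM t ORDER BY c ASC LIMIT 1;
-- if the concurrent UPDATE commits first, this would raise
--   ERROR:  could not serialize access due to concurrent update
-- instead of deleting a row chosen from the pre-update ordering
COMMIT;
```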
--
greg
Hello Greg,
On Fri, 17 Dec 2021 01:40:45 -0500
Greg Stark <stark@mit.edu> wrote:
On Thu, 16 Dec 2021 at 22:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* If the sort order is underspecified, or you omit ORDER BY
entirely, then it's not clear which rows will be operated on.
The LIMIT might stop after just some of the rows in a peer
group, and you can't predict which ones.
Meh, that never seemed very compelling to me. I think that's on the
user and there are *lots* of cases where the user can easily know
enough extra context to know that's what she wants. In particular I've
often wanted to delete one of two identical records and it would have
been really easy to just do a DELETE LIMIT 1. I know there are ways to
do it but it's always seemed like unnecessary hassle when there was a
natural way to express it.
Out of curiosity, could you please tell me the concrete situations
where you wanted to delete one of two identical records?
* UPDATE/DELETE necessarily involve the equivalent of SELECT
FOR UPDATE, which may cause the rows to be ordered more
surprisingly than you expected, ie the sort happens *before*
rows are replaced by their latest versions, which might have
different sort keys.
This... is a real issue. Or is it? Just to be clear I think what
you're referring to is a case like:
INSERT INTO t values (1),(2)
In another session: BEGIN; UPDATE t set c=0 where c=2
DELETE FROM t ORDER BY c ASC LIMIT 1
<blocks>
In other session: COMMIT
Which row was deleted? In this case it was the row where c=1. Even
though the UPDATE reported success (i.e. 1 row updated) so presumably
it happened "before" the delete. The delete saw the ordering from
before it was blocked and saw the ordering with c=1, c=2 so apparently
it happened "before" the update.
When I tried it using my patch, the DELETE deleted the row where c=1,
the same as above, but it did not block. Is that the result of an
experiment using my patch, or with another RDBMS such as MySQL?
There are plenty of other ways to see the same surprising
serialization failure. If instead of a DELETE you did an UPDATE there
are any number of functions or other features that have been added
which can expose the order in which the updates happened.
The way to solve those cases would be to use serializable isolation
(or even just repeatable read in this case). Wouldn't that also solve
the DELETE serialization failure too?
Do you mean that such serialization failures would be avoided in
SERIALIZABLE or REPEATABLE READ by aborting the transaction, while
they may be accepted in READ COMMITTED?
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Out of curiosity, could you please tell me the concrete situations
where you wanted to delete one of two identical records?
In my case, there is a table with known duplicates, and we would like to
delete all but the one with the lowest ctid, and then add a unique index to
the table which then allows us to use INSERT ON CONFLICT in a meaningful
way.
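For the record, that dedup-then-index step can be written today
roughly as follows (a sketch; foo and the duplicate key column k are
illustrative names):

```sql
-- Keep only the lowest-ctid row within each group of duplicates on k,
-- then add the unique index that INSERT ... ON CONFLICT needs.
DELETE FROM foo a
WHERE EXISTS (SELECT 1
              FROM foo b
              WHERE b.k = a.k
                AND b.ctid < a.ctid);
CREATE UNIQUE INDEX ON foo (k);
```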
The other need for a DELETE...LIMIT or UPDATE...LIMIT is when you're
worried about flooding a replica, so you parcel out the DML into chunks
that don't cause unacceptable lag on the replica.
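One common shape for that chunking (a sketch with illustrative names;
each batch would run in its own transaction, with a pause between
batches, so the replica can keep up):

```sql
-- One batch: delete up to 1000 matching rows. Repeat from the driving
-- script until 0 rows are affected.
DELETE FROM foo
WHERE ctid IN (SELECT ctid
               FROM foo
               WHERE expired
               LIMIT 1000);
```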
Both of these can be accomplished via DELETE FROM foo WHERE ctid IN (
SELECT ... FROM foo ... LIMIT 1000), but until recently such a construct
would result in a full table scan, and you'd take the same hit with each
subsequent DML.
I *believe* that the ctid range scan now can limit those scans, especially
if you can order the limited set by ctid, but those techniques are not
widely known at this time.
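Presumably something along these lines (an untested sketch; it relies
on the TID range scan support added in PostgreSQL 14, and the block
numbers are illustrative):

```sql
-- Restrict the batch to a ctid slice so the subquery scans only part
-- of the heap rather than the whole table.
DELETE FROM foo
WHERE ctid IN (SELECT ctid
               FROM foo
               WHERE ctid >= '(0,0)'::tid
                 AND ctid <  '(4096,0)'::tid
               LIMIT 1000);
```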