[PATCH] Add UPDATE WHERE OFFSET IN clause
Hi!
A couple of years ago I was working on PostgreSQL
features and extensions. Some of them was in ECPG,
mainly in the area of improving Informix compatibility.
One of the patchsets I invented was the cursor readahead
support to improve performance in ECPG which used FETCH N.
This work was published at
https://github.com/zboszor/ecpg-readahead/commits
However, there was one use case for this patchset actually
decreased performance, namely if the cursor was used to
modify rows via UPDATE WHERE CURRENT OF. It was because
the cursor position on the server side was different from
the application so the ECPG readahead code had to use
MOVE then UPDATE WHERE CURRENT OF.
I brewed this idea for a while and yesterday I decided
to do something about it and here's the (admittedly, limited)
result. I added a new PostgreSQL extension to the UPDATE syntax:
UPDATE ... WHERE OFFSET n IN cursor;
This new syntax changes the feature disparity between
FETCH N and UPDATE WHERE CURRENT OF that existed in
PostgreSQL forever.
I only implemented this for cursors that use SELECT FOR UPDATE.
This part was quite straightforward to implement.
The behaviour of this syntax is as follows:
* the offset value may be 0 or negative, since it is a virtual
index into the rows returned by the last FETCH statement
and negative indexes are felt natural relative to the
cursor position and the cursor direction
* for offset value 0, the behaviour is identical to WHERE CURRENT OF
* negative indexes allow UPDATEs on previous rows even if
the cursor reached the end of the result set
I need clues for how to extend this for cursors that aren't
using FOR UPDATE queries, if it's possible at all.
Please, review.
Best regards,
Zoltán Böszörményi
Attachments:
0001-Add-UPDATE-.-WHERE-OFFSET-IN-clause.patchtext/x-patch; charset=UTF-8; name=0001-Add-UPDATE-.-WHERE-OFFSET-IN-clause.patchDownload
From 26967c7510c09fe60d5fe54bd17bae919654c75d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Zolt=C3=A1n=20B=C3=B6sz=C3=B6rm=C3=A9nyi?=
<zboszor@gmail.com>
Date: Sun, 6 Feb 2022 19:47:47 +0100
Subject: [PATCH] Add UPDATE ... WHERE OFFSET IN clause
PostgreSQL has an extension for the cursor FETCH statement
to retrieve multiple rows at once but it only supports updating
the last fetched row via a cursor. This is a feature disparity.
To rectify this, add the UPDATE ... WHERE OFFSET IN clause to
support updating any of the rows retrieved in the last FETCH
statement.
---
doc/src/sgml/ref/update.sgml | 30 +++++++--
src/backend/executor/execCurrent.c | 21 ++++--
src/backend/executor/execMain.c | 2 +-
src/backend/executor/nodeLockRows.c | 8 ++-
src/backend/nodes/copyfuncs.c | 2 +
src/backend/nodes/equalfuncs.c | 2 +
src/backend/nodes/outfuncs.c | 2 +
src/backend/nodes/readfuncs.c | 2 +
src/backend/parser/gram.y | 12 ++++
src/include/nodes/execnodes.h | 10 +--
src/include/nodes/primnodes.h | 2 +
src/test/regress/expected/portals.out | 93 +++++++++++++++++++++++++++
src/test/regress/sql/portals.sql | 47 ++++++++++++++
13 files changed, 213 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3fa54e5f70..5c08eae200 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -28,7 +28,7 @@ UPDATE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [
( <replaceable class="parameter">column_name</replaceable> [, ...] ) = ( <replaceable class="parameter">sub-SELECT</replaceable> )
} [, ...]
[ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
- [ WHERE <replaceable class="parameter">condition</replaceable> | WHERE CURRENT OF <replaceable class="parameter">cursor_name</replaceable> ]
+ [ WHERE <replaceable class="parameter">condition</replaceable> | WHERE CURRENT OF <replaceable class="parameter">cursor_name</replaceable> | WHERE OFFSET <replaceable class="parameter">offset</replaceable> OF <replaceable class="parameter">cursor_name</replaceable> ]
[ RETURNING * | <replaceable class="parameter">output_expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] [, ...] ]
</synopsis>
</refsynopsisdiv>
@@ -199,11 +199,26 @@ UPDATE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [
<listitem>
<para>
The name of the cursor to use in a <literal>WHERE CURRENT OF</literal>
- condition. The row to be updated is the one most recently fetched
- from this cursor. The cursor must be a non-grouping
- query on the <command>UPDATE</command>'s target table.
- Note that <literal>WHERE CURRENT OF</literal> cannot be
- specified together with a Boolean condition. See
+ or in a <literal>WHERE OFFSET IN</literal> condition.
+ The cursor must be a non-grouping query on the
+ <command>UPDATE</command>'s target table and a
+ <literal>FOR UPDATE</literal> query on the same table to use the
+ <literal>WHERE OFFSET IN</literal> condition
+ </para>
+ <para>
+ The row to be updated for <literal>WHERE CURRENT OF</literal>
+ is the one most recently fetched from this cursor.
+ </para>
+ <para>
+ If multiple rows were fetched in the last <command>FETCH</command>
+ statement, <literal>WHERE OFFSET IN</literal> may be used with a
+ negative or zero offset value relative to the cursor's current position and
+ the direction of the last <command>FETCH</command> statement.
+ Zero offset is identical to <literal>WHERE CURRENT OF</literal>.
+ </para>
+ <para>
+ Note that <literal>WHERE CURRENT OF</literal> and <literal>WHERE OFFSET IN</literal>
+ cannot be specified together with a Boolean condition. See
<xref linkend="sql-declare"/>
for more information about using cursors with
<literal>WHERE CURRENT OF</literal>.
@@ -442,7 +457,8 @@ UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films;
<para>
This command conforms to the <acronym>SQL</acronym> standard, except
- that the <literal>FROM</literal> and <literal>RETURNING</literal> clauses
+ that the <literal>FROM</literal>, <literal>RETURNING</literal> and
+ <literal>WHERE OFFSET IN</literal> clauses
are <productname>PostgreSQL</productname> extensions, as is the ability
to use <literal>WITH</literal> with <command>UPDATE</command>.
</para>
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index b34b180bc4..6912bb2a25 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -131,17 +131,22 @@ execCurrentOf(CurrentOfExpr *cexpr,
* The cursor must have a current result row: per the SQL spec, it's
* an error if not.
*/
- if (portal->atStart || portal->atEnd)
+ if ((portal->atStart || portal->atEnd) && cexpr->cursor_offset == 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_CURSOR_STATE),
errmsg("cursor \"%s\" is not positioned on a row",
cursor_name)));
/* Return the currently scanned TID, if there is one */
- if (ItemPointerIsValid(&(erm->curCtid)))
- {
- *current_tid = erm->curCtid;
- return true;
+ if (erm->curCtidList) {
+ int cursor_offset = list_length(erm->curCtidList) - 1 + cexpr->cursor_offset + ((portal->atStart || portal->atEnd) ? 1 : 0);
+ ListCell *ctidCell = NULL;
+ if (cursor_offset >= 0 && cursor_offset < list_length(erm->curCtidList))
+ ctidCell = list_nth_cell(erm->curCtidList, cursor_offset);
+ if (ctidCell && ItemPointerIsValid((ItemPointer)(ctidCell->ptr_value))) {
+ *current_tid = *(ItemPointer)(ctidCell->ptr_value);
+ return true;
+ }
}
/*
@@ -161,6 +166,12 @@ execCurrentOf(CurrentOfExpr *cexpr,
ScanState *scanstate;
bool pending_rescan = false;
+ if (cexpr->cursor_is_offsetof)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_CURSOR_STATE),
+ errmsg("cursor \"%s\" must use FOR UPDATE to use WHERE OFFSET IN",
+ cursor_name)));
+
scanstate = search_plan_tree(queryDesc->planstate, table_oid,
&pending_rescan);
if (!scanstate)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 549d9eb696..58c4da0cdc 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -879,7 +879,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
erm->strength = rc->strength;
erm->waitPolicy = rc->waitPolicy;
erm->ermActive = false;
- ItemPointerSetInvalid(&(erm->curCtid));
+ erm->curCtidList = NIL;
erm->ermExtra = NULL;
Assert(erm->rti > 0 && erm->rti <= estate->es_range_table_size &&
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..a25c193d33 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,6 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
+ ItemPointer *tid_ptr;
TM_FailureData tmfd;
LockTupleMode lockmode;
int lockflags = 0;
@@ -107,7 +108,8 @@ lnext:
{
/* this child is inactive right now */
erm->ermActive = false;
- ItemPointerSetInvalid(&(erm->curCtid));
+ list_free_deep(erm->curCtidList);
+ erm->curCtidList = NIL;
ExecClearTuple(markSlot);
continue;
}
@@ -249,7 +251,9 @@ lnext:
}
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
- erm->curCtid = tid;
+ tid_ptr = palloc(sizeof(ItemPointerData));
+ memcpy(tid_ptr, &tid, sizeof(ItemPointerData));
+ erm->curCtidList = lappend(erm->curCtidList, tid_ptr);
}
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6bd95bbce2..e800a1c3ba 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2178,6 +2178,8 @@ _copyCurrentOfExpr(const CurrentOfExpr *from)
COPY_SCALAR_FIELD(cvarno);
COPY_STRING_FIELD(cursor_name);
COPY_SCALAR_FIELD(cursor_param);
+ COPY_SCALAR_FIELD(cursor_offset);
+ COPY_SCALAR_FIELD(cursor_is_offsetof);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 4126516222..1c9aa6ae27 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -756,6 +756,8 @@ _equalCurrentOfExpr(const CurrentOfExpr *a, const CurrentOfExpr *b)
COMPARE_SCALAR_FIELD(cvarno);
COMPARE_STRING_FIELD(cursor_name);
COMPARE_SCALAR_FIELD(cursor_param);
+ COMPARE_SCALAR_FIELD(cursor_offset);
+ COMPARE_SCALAR_FIELD(cursor_is_offsetof);
return true;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..41c6d10ba4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1668,6 +1668,8 @@ _outCurrentOfExpr(StringInfo str, const CurrentOfExpr *node)
WRITE_UINT_FIELD(cvarno);
WRITE_STRING_FIELD(cursor_name);
WRITE_INT_FIELD(cursor_param);
+ WRITE_INT_FIELD(cursor_offset);
+ WRITE_BOOL_FIELD(cursor_is_offsetof);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..f953f406d5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1269,6 +1269,8 @@ _readCurrentOfExpr(void)
READ_UINT_FIELD(cvarno);
READ_STRING_FIELD(cursor_name);
READ_INT_FIELD(cursor_param);
+ READ_INT_FIELD(cursor_offset);
+ READ_BOOL_FIELD(cursor_is_offsetof);
READ_DONE();
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c4f3242506..ad14e62f78 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -12754,6 +12754,18 @@ where_or_current_clause:
/* cvarno is filled in by parse analysis */
n->cursor_name = $4;
n->cursor_param = 0;
+ n->cursor_is_offsetof = false;
+ n->cursor_offset = 0;
+ $$ = (Node *) n;
+ }
+ | WHERE OFFSET SignedIconst IN_P cursor_name
+ {
+ CurrentOfExpr *n = makeNode(CurrentOfExpr);
+ /* cvarno is filled in by parse analysis */
+ n->cursor_name = $5;
+ n->cursor_param = 0;
+ n->cursor_is_offsetof = true;
+ n->cursor_offset = $3;
$$ = (Node *) n;
}
| /*EMPTY*/ { $$ = NULL; }
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index dd95dc40c7..65ffc149cd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -664,10 +664,10 @@ typedef struct EState
* subqueries-in-FROM will have an ExecRowMark with relation == NULL. See
* PlanRowMark for details about most of the fields. In addition to fields
* directly derived from PlanRowMark, we store an activity flag (to denote
- * inactive children of inheritance trees), curCtid, which is used by the
- * WHERE CURRENT OF code, and ermExtra, which is available for use by the plan
- * node that sources the relation (e.g., for a foreign table the FDW can use
- * ermExtra to hold information).
+ * inactive children of inheritance trees), curCtidList, which is used by the
+ * WHERE CURRENT OF/WHERE OFFSET IN code, and ermExtra, which is available
+ * for use by the plan node that sources the relation (e.g., for a foreign
+ * table the FDW can use ermExtra to hold information).
*
* EState->es_rowmarks is an array of these structs, indexed by RT index,
* with NULLs for irrelevant RT indexes. es_rowmarks itself is NULL if
@@ -684,7 +684,7 @@ typedef struct ExecRowMark
LockClauseStrength strength; /* LockingClause's strength, or LCS_NONE */
LockWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED */
bool ermActive; /* is this mark relevant for current tuple? */
- ItemPointerData curCtid; /* ctid of currently locked tuple, if any */
+ List *curCtidList; /* list of ItemPointerData, ctid of currently locked tuple(s), if any */
void *ermExtra; /* available for use by relation source node */
} ExecRowMark;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index dab5c4ff5d..399b127e5f 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1363,6 +1363,8 @@ typedef struct CurrentOfExpr
Index cvarno; /* RT index of target relation */
char *cursor_name; /* name of referenced cursor, or NULL */
int cursor_param; /* refcursor parameter number, or 0 */
+ int cursor_offset;
+ bool cursor_is_offsetof;
} CurrentOfExpr;
/*
diff --git a/src/test/regress/expected/portals.out b/src/test/regress/expected/portals.out
index 9da74876e1..8ce430cd47 100644
--- a/src/test/regress/expected/portals.out
+++ b/src/test/regress/expected/portals.out
@@ -1536,3 +1536,96 @@ fetch backward all in c2;
(3 rows)
rollback;
+-- Check WHERE OFFSET IN
+-- The cursor ends with no current row
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch all from c1;
+ id | t
+----+---
+ 1 | a
+ 2 | b
+ 3 | c
+(3 rows)
+
+update cursor set t='cc' where offset -1 in c1;
+update cursor set t='bb' where offset -2 in c1;
+update cursor set t='aa' where offset -3 in c1;
+select * from cursor order by id;
+ id | t
+----+----
+ 1 | aa
+ 2 | bb
+ 3 | cc
+(3 rows)
+
+update cursor set t='xx' where offset 0 in c1;
+ERROR: cursor "c1" is not positioned on a row
+rollback;
+-- The cursor ends with current row
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch 3 from c1;
+ id | t
+----+---
+ 1 | a
+ 2 | b
+ 3 | c
+(3 rows)
+
+update cursor set t='cc' where offset 0 in c1;
+update cursor set t='bb' where offset -1 in c1;
+update cursor set t='aa' where offset -2 in c1;
+select * from cursor order by id;
+ id | t
+----+----
+ 1 | aa
+ 2 | bb
+ 3 | cc
+(3 rows)
+
+rollback;
+-- Invalid offsets
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch 3 from c1;
+ id | t
+----+---
+ 1 | a
+ 2 | b
+ 3 | c
+(3 rows)
+
+update cursor set t='cc' where offset 100 in c1;
+update cursor set t='bb' where offset -100 in c1;
+select * from cursor order by id;
+ id | t
+----+---
+ 1 | a
+ 2 | b
+ 3 | c
+(3 rows)
+
+rollback;
+-- Query without FOR UPDATE
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id;
+fetch 3 from c1;
+ id | t
+----+---
+ 1 | a
+ 2 | b
+ 3 | c
+(3 rows)
+
+update cursor set t='cc' where offset 0 in c1;
+ERROR: cursor "c1" must use FOR UPDATE to use WHERE OFFSET IN
+rollback;
diff --git a/src/test/regress/sql/portals.sql b/src/test/regress/sql/portals.sql
index eadf6ed942..1947826f73 100644
--- a/src/test/regress/sql/portals.sql
+++ b/src/test/regress/sql/portals.sql
@@ -581,3 +581,50 @@ declare c2 scroll cursor for select generate_series(1,3) as g;
fetch all in c2;
fetch backward all in c2;
rollback;
+
+-- Check WHERE OFFSET IN
+
+-- The cursor ends with no current row
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch all from c1;
+update cursor set t='cc' where offset -1 in c1;
+update cursor set t='bb' where offset -2 in c1;
+update cursor set t='aa' where offset -3 in c1;
+select * from cursor order by id;
+update cursor set t='xx' where offset 0 in c1;
+rollback;
+
+-- The cursor ends with current row
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch 3 from c1;
+update cursor set t='cc' where offset 0 in c1;
+update cursor set t='bb' where offset -1 in c1;
+update cursor set t='aa' where offset -2 in c1;
+select * from cursor order by id;
+rollback;
+
+-- Invalid offsets
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id for update;
+fetch 3 from c1;
+update cursor set t='cc' where offset 100 in c1;
+update cursor set t='bb' where offset -100 in c1;
+select * from cursor order by id;
+rollback;
+
+-- Query without FOR UPDATE
+begin;
+create table cursor (id serial, t text);
+insert into cursor (t) values ('a'),('b'),('c');
+declare c1 cursor for select * from cursor order by id;
+fetch 3 from c1;
+update cursor set t='cc' where offset 0 in c1;
+rollback;
--
2.34.1
Hi!
On 02/07/22 00:59, Böszörményi Zoltán wrote:
UPDATE ... WHERE OFFSET n IN cursor;
If added to UPDATE, should this be added to DELETE also?
Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
On 02/07/22 00:59, Böszörményi Zoltán wrote:
UPDATE ... WHERE OFFSET n IN cursor;
If added to UPDATE, should this be added to DELETE also?
FWIW, I think this is a really horrid hack. For one thing, it's not
robust against not-strictly-linear FETCH/MOVE of the cursor. It looks
to me like "OFFSET n" really means "the row that we read N reads ago",
not "the row N before the current cursor position". I see that the
documentation does explain it that way, but it fails to account for
optimizations such as whether we implement moves by reading backwards
or rewind-and-read-forwards. I don't think we want to expose that
sort of implementation detail.
I'm also pretty displeased with causing unbounded memory consumption for
every use of nodeLockRows, whether it has anything to do with a cursor or
not (never mind whether the cursor will ever be used for WHERE OFFSET IN).
Yeah, it's only a few bytes per row, but that will add up in queries that
process lots of rows.
regards, tom lane
2022. 02. 08. 2:05 keltezéssel, Tom Lane írta:
Chapman Flack <chap@anastigmatix.net> writes:
On 02/07/22 00:59, Böszörményi Zoltán wrote:
UPDATE ... WHERE OFFSET n IN cursor;
If added to UPDATE, should this be added to DELETE also?
Yes, it should be added, too.
FWIW, I think this is a really horrid hack.
Thanks for your kind words. :-D
For one thing, it's not
robust against not-strictly-linear FETCH/MOVE of the cursor. It looks
to me like "OFFSET n" really means "the row that we read N reads ago",
not "the row N before the current cursor position". I see that the
documentation does explain it that way, but it fails to account for
optimizations such as whether we implement moves by reading backwards
or rewind-and-read-forwards. I don't think we want to expose that
sort of implementation detail.I'm also pretty displeased with causing unbounded memory consumption for
every use of nodeLockRows, whether it has anything to do with a cursor or
not (never mind whether the cursor will ever be used for WHERE OFFSET IN).
Yeah, it's only a few bytes per row, but that will add up in queries that
process lots of rows.
Does PostgreSQL have SQL hints now? I.e. some kind of "pragma"
parsed from SQL comments to indicate the subsequent usage pattern?
Such a hint would allow using either storing the single row information
for DELETE/UPDATE or the list.
Dumping the list to a disk file will be added later so memory
usage is not unbounded.
I was just testing the waters for the idea.
Best regards,
Zoltán Böszörményi
Show quoted text
regards, tom lane