INSERT ON CONFLICT and RETURNING

Started by Konstantin Knizhnikover 5 years ago15 messages
#1Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi hackers,

I am sorry for the question which may be already discussed multiple times.
But I have not found answer for it neither in internet neither in
pgsql-hackers archieve.
UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a
long time ago.
As far as I remember there was long discussions about its syntax and
functionality.
But today I found that there is still no way to perform one of the most
frequently needed operation:
locate record by key and return its autogenerated ID or insert new
record if key is absent.

Something like this:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values (?) on conflict(schema) do
nothing returning id;

But it doesn't work because in case of conflict no value is returned.
It is possible to do something like this:

  with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id) select coalesce((select id
from ins),(select id from jsonb_schemas where schema=obj_schema));

but it requires extra lookup.
Or perform update:

  insert into jsonb_schemas (schema) values (?) on conflict(schema) do
update set schema=excluded.schema returning id;

But it is even worse because we have to perform useless update and
produce new version.

May be I missing something, but according to stackoverflow:
https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql
there is no better solution.

I wonder how it can happen that such popular use case ia not covered by
Postgresql UPSERT?
Are there some principle problems with it?
Why it is not possible to add one more on-conflict action: SELECT,
making it possible to return data when key is found?

Thanks in advance,
Konstantin

#2Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Konstantin Knizhnik (#1)
Re: INSERT ON CONFLICT and RETURNING

On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

It is possible to do something like this:

with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id) select coalesce((select id
from ins),(select id from jsonb_schemas where schema=obj_schema));

but it requires extra lookup.

But if

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO NOTHING RETURNING id

were to work then that would _also_ require a second lookup, since
"id" is not part of the conflict key that will be used to perform the
existence test, so the only difference is it's hidden by the syntax.

Geoff

#3Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Geoff Winkless (#2)
Re: INSERT ON CONFLICT and RETURNING

On 24.08.2020 13:37, Geoff Winkless wrote:

On Sat, 22 Aug 2020 at 08:16, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

It is possible to do something like this:

with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id) select coalesce((select id
from ins),(select id from jsonb_schemas where schema=obj_schema));

but it requires extra lookup.

But if

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO NOTHING RETURNING id

were to work then that would _also_ require a second lookup, since
"id" is not part of the conflict key that will be used to perform the
existence test, so the only difference is it's hidden by the syntax.

Geoff

Sorry, I didn't quite understand it.
If we are doing such query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id

Then as far as I understand no extra lookup is used to return ID:

 Insert on jsonb_schemas  (cost=0.00..0.01 rows=1 width=36) (actual
time=0.035..0.036 rows=0 loops=1)
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes:jsonb_schemas_schema_key
   Conflict Filter: false
   Rows Removed by Conflict Filter: 1
   Tuples Inserted: 0
   Conflicting Tuples: 1
   ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual
time=0.002..0.002 rows=1 loops=1)
 Planning Time: 0.034 ms
 Execution Time: 0.065 ms
(10 rows)

So if we are able to efficienty execute query above, why we can not
write query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO SELECT ID RETURNING id

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#1)
1 attachment(s)
Re: INSERT ON CONFLICT and RETURNING

On 22.08.2020 10:16, Konstantin Knizhnik wrote:

Hi hackers,

I am sorry for the question which may be already discussed multiple
times.
But I have not found answer for it neither in internet neither in
pgsql-hackers archieve.
UPSERT (INSERT ... IN CONFLICT...) clause was added to the Postgres a
long time ago.
As far as I remember there was long discussions about its syntax and
functionality.
But today I found that there is still no way to perform one of the
most frequently needed operation:
locate record by key and return its autogenerated ID or insert new
record if key is absent.

Something like this:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values (?) on conflict(schema) do
nothing returning id;

But it doesn't work because in case of conflict no value is returned.
It is possible to do something like this:

  with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id) select coalesce((select
id from ins),(select id from jsonb_schemas where schema=obj_schema));

but it requires extra lookup.
Or perform update:

  insert into jsonb_schemas (schema) values (?) on conflict(schema) do
update set schema=excluded.schema returning id;

But it is even worse because we have to perform useless update and
produce new version.

May be I missing something, but according to stackoverflow:
https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql

there is no better solution.

I wonder how it can happen that such popular use case ia not covered
by Postgresql UPSERT?
Are there some principle problems with it?
Why it is not possible to add one more on-conflict action: SELECT,
making it possible to return data when key is found?

Thanks in advance,
Konstantin

I'm sorry for been intrusive.
But can somebody familiar with Postgres upsert mechanism explain me why
current implementation doesn't support very popular use case:
locate record by some unique key and and return its primary
(autogenerated) key if found otherwise insert new tuple.
I have explained the possible workarounds of the problem above.
But all of them looks awful or inefficient.

What I am suggesting is just add ON CONFLICT DO SELECT clause:

insert into jsonb_schemas (schema) values ('one') on conflict(schema) do
select returning id;

I attached small patch with prototype implementation of this construction.
It seems to be very trivial. What's wring with it?
Are there some fundamental problems which I do not understand?

Below is small illustration of how this patch is working:

postgres=# create table jsonb_schemas(id serial, schema bytea primary key);
CREATE TABLE
postgres=# create index on jsonb_schemas(id);
CREATE INDEX
postgres=# insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id;
 id
----
  1
(1 row)

INSERT 0 1
postgres=# insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id;
 id
----
(0 rows)

INSERT 0 0
postgres=# insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do select returning id;
 id
----
  1
(1 row)

INSERT 0 1

Thanks in advance,
Konstantin

Attachments:

on_conflict_do_select.patchtext/x-patch; name=on_conflict_do_select.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5..8a22b8c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3755,8 +3755,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 	if (node->onConflictAction != ONCONFLICT_NONE)
 	{
 		ExplainPropertyText("Conflict Resolution",
-							node->onConflictAction == ONCONFLICT_NOTHING ?
-							"NOTHING" : "UPDATE",
+							node->onConflictAction == ONCONFLICT_NOTHING
+							? "NOTHING"
+							: node->onConflictAction == ONCONFLICT_SELECT
+							? "SELECT" : "UPDATE",
 							es);
 
 		/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c47..8e64061 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -545,6 +545,8 @@ ExecInsert(ModifyTableState *mtstate,
 					else
 						goto vlock;
 				}
+				/* committed conflict tuple found */
+
 				else
 				{
 					/*
@@ -558,11 +560,26 @@ ExecInsert(ModifyTableState *mtstate,
 					 * type. As there's no conflicting usage of
 					 * ExecGetReturningSlot() in the DO NOTHING case...
 					 */
-					Assert(onconflict == ONCONFLICT_NOTHING);
+					Assert(onconflict == ONCONFLICT_NOTHING || onconflict == ONCONFLICT_SELECT);
 					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid,
 										ExecGetReturningSlot(estate, resultRelInfo));
 					InstrCountTuples2(&mtstate->ps, 1);
-					return NULL;
+					if (onconflict == ONCONFLICT_SELECT && resultRelInfo->ri_projectReturning)
+					{
+						TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
+						TM_FailureData tmfd;
+						TM_Result	   test = table_tuple_lock(resultRelInfo->ri_RelationDesc, &conflictTid,
+															   estate->es_snapshot,
+															   existing, estate->es_output_cid,
+															   LockTupleShare, LockWaitBlock, 0,
+															   &tmfd);
+						if (test == TM_Ok)
+						{
+							result = ExecProcessReturning(resultRelInfo, existing, planSlot);
+						}
+						ExecClearTuple(existing);
+					}
+					return result;
 				}
 			}
 
@@ -2542,7 +2559,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	if (node->onConflictAction == ONCONFLICT_UPDATE || node->onConflictAction == ONCONFLICT_SELECT)
 	{
 		ExprContext *econtext;
 		TupleDesc	relationDesc;
@@ -2566,35 +2583,37 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			table_slot_create(resultRelInfo->ri_RelationDesc,
 							  &mtstate->ps.state->es_tupleTable);
 
-		/*
-		 * Create the tuple slot for the UPDATE SET projection. We want a slot
-		 * of the table's type here, because the slot will be used to insert
-		 * into the table, and for RETURNING processing - which may access
-		 * system attributes.
-		 */
-		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
-		resultRelInfo->ri_onConflict->oc_ProjSlot =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflict->oc_ProjInfo =
-			ExecBuildProjectionInfo(node->onConflictSet, econtext,
-									resultRelInfo->ri_onConflict->oc_ProjSlot,
-									&mtstate->ps,
-									relationDesc);
-
-		/* initialize state to evaluate the WHERE clause, if any */
-		if (node->onConflictWhere)
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
 		{
-			ExprState  *qualexpr;
+			/*
+			 * Create the tuple slot for the UPDATE SET projection. We want a slot
+			 * of the table's type here, because the slot will be used to insert
+			 * into the table, and for RETURNING processing - which may access
+			 * system attributes.
+			 */
+			tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
+			resultRelInfo->ri_onConflict->oc_ProjSlot =
+				ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+									   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+
+			/* build UPDATE SET projection state */
+			resultRelInfo->ri_onConflict->oc_ProjInfo =
+				ExecBuildProjectionInfo(node->onConflictSet, econtext,
+										resultRelInfo->ri_onConflict->oc_ProjSlot,
+										&mtstate->ps,
+										relationDesc);
+
+			/* initialize state to evaluate the WHERE clause, if any */
+			if (node->onConflictWhere)
+			{
+				ExprState  *qualexpr;
 
-			qualexpr = ExecInitQual((List *) node->onConflictWhere,
+				qualexpr = ExecInitQual((List *) node->onConflictWhere,
 									&mtstate->ps);
-			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+				resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+			}
 		}
 	}
-
 	/*
 	 * If we have any secondary relations in an UPDATE or DELETE, they need to
 	 * be treated like non-locked relations in SELECT FOR UPDATE, ie, the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index dbb47d4..1e815d1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10874,6 +10874,16 @@ opt_on_conflict:
 					$$->whereClause = NULL;
 					$$->location = @1;
 				}
+			|
+			ON CONFLICT opt_conf_expr DO SELECT
+				{
+					$$ = makeNode(OnConflictClause);
+					$$->action = ONCONFLICT_SELECT;
+					$$->infer = $3;
+					$$->targetList = NIL;
+					$$->whereClause = NULL;
+					$$->location = @1;
+				}
 			| /*EMPTY*/
 				{
 					$$ = NULL;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index fe777c3..e96f568 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3612,6 +3612,11 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 										rt_entry_relation,
 										parsetree->resultRelation);
 			}
+			if (parsetree->onConflict &&
+				parsetree->onConflict->action == ONCONFLICT_SELECT)
+			{
+				parsetree->onConflict->onConflictSet =parsetree->targetList;
+			}
 		}
 		else if (event == CMD_UPDATE)
 		{
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 60dd80c..cd40985 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6276,6 +6276,10 @@ get_insert_query_def(Query *query, deparse_context *context)
 		{
 			appendStringInfoString(buf, " DO NOTHING");
 		}
+		else if (confl->action == ONCONFLICT_SELECT)
+		{
+			appendStringInfoString(buf, " DO SELECT");
+		}
 		else
 		{
 			appendStringInfoString(buf, " DO UPDATE SET ");
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b..c23d3d2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -823,7 +823,8 @@ typedef enum OnConflictAction
 {
 	ONCONFLICT_NONE,			/* No "ON CONFLICT" clause */
 	ONCONFLICT_NOTHING,			/* ON CONFLICT ... DO NOTHING */
-	ONCONFLICT_UPDATE			/* ON CONFLICT ... DO UPDATE */
+	ONCONFLICT_UPDATE,			/* ON CONFLICT ... DO UPDATE */
+	ONCONFLICT_SELECT			/* ON CONFLICT ... DO SELECT */
 } OnConflictAction;
 
 /*
#5Marko Tiikkaja
marko@joh.to
In reply to: Konstantin Knizhnik (#4)
Re: INSERT ON CONFLICT and RETURNING

There's prior art on this: https://commitfest.postgresql.org/15/1241/

.m

#6Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Marko Tiikkaja (#5)
Re: INSERT ON CONFLICT and RETURNING

On 03.09.2020 19:30, Marko Tiikkaja wrote:

There's prior art on this: https://commitfest.postgresql.org/15/1241/

.m

Ooops:(
Thank you.
I missed it.

But frankly speaking I still didn't find answer for my question in this
thread: what are the dangerous scenarios with ON CONFLICT DO NOTHING/SELECT.
Yes, record is not exclusively locked. But I just want to obtain value
of some column which is not a source of conflict. I do not understand
what can be wrong if some
other transaction changed this column.

And I certainly can't agree with Peter's statement:

Whereas here, with ON CONFLICT DO SELECT,
I see a somewhat greater risk, and a much, much smaller benefit. A
benefit that might actually be indistinguishable from zero.

From my point of view it is quite common use case when we need to
convert some long key to small autogenerated record identifier.
Without UPSERT we have to perform two queries instead of just one . And
even with current implementation of INSERT ON CONFLICT...
we have to either perform extra lookup, either produce new (useless)
tuple version.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Konstantin Knizhnik (#3)
Re: INSERT ON CONFLICT and RETURNING

On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

If we are doing such query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id

Then as far as I understand no extra lookup is used to return ID:

The conflict resolution checks the unique index on (schema) and
decides whether or not a conflict will exist. For DO NOTHING it
doesn't have to get the actual row from the table; however in order
for it to return the ID it would have to go and get the existing row
from the table. That's the "extra lookup", as you term it. The only
difference from doing it with RETURNING id versus WITH... COALESCE()
as you described is the simpler syntax.

I'm not saying the simpler syntax isn't nice, mind you. I was just
pointing out that it's not inherently any less efficient.

Geoff

#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Geoff Winkless (#7)
Re: INSERT ON CONFLICT and RETURNING

On Thu, Sep 3, 2020 at 7:56 PM Geoff Winkless <pgsqladmin@geoff.dj> wrote:

On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

If we are doing such query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id

Then as far as I understand no extra lookup is used to return ID:

The conflict resolution checks the unique index on (schema) and
decides whether or not a conflict will exist. For DO NOTHING it
doesn't have to get the actual row from the table; however in order
for it to return the ID it would have to go and get the existing row
from the table. That's the "extra lookup", as you term it. The only
difference from doing it with RETURNING id versus WITH... COALESCE()
as you described is the simpler syntax.

As I know, conflict resolution still has to fetch heap tuples, see
_bt_check_unique(). As I understand it, the issues are as follows.
1) Conflict resolution uses the dirty snapshot. It's unclear whether
we can return this tuple to the user, because the query has a
different snapshot. Note, that CTE query by Konstantin at thead start
doesn't handle all the cases correctly, it can return no rows on
conflict. We probably should do the trick similar to the EPQ mechanism
for UPDATE. For instance, UPDATE ... RETURNING old.* can return the
tuple, which doesn't match the query snapshot. But INSERT ON CONFLICT
might have other caveats in this area, it needs careful analysis.
2) Checking unique conflicts inside the index am is already the
encapsulation-breaking hack. Returning the heap tuple for index am
would be even worse hack. We probably should refactor this whole area
before.

------
Regards,
Alexander Korotkov

#9Andreas Karlsson
andreas@proxel.se
In reply to: Konstantin Knizhnik (#6)
Re: INSERT ON CONFLICT and RETURNING

On 9/3/20 6:52 PM, Konstantin Knizhnik wrote:

But frankly speaking I still didn't find answer for my question in this
thread: what are the dangerous scenarios with ON CONFLICT DO
NOTHING/SELECT.
Yes, record is not exclusively locked. But I just want to obtain value
of some column which is not a source of conflict. I do not understand
what can be wrong if some
other transaction changed this column.

And I certainly can't agree with Peter's statement:

Whereas here, with ON CONFLICT DO SELECT,
I see a somewhat greater risk, and a much, much smaller benefit. A
benefit that might actually be indistinguishable from zero.

From my point of view it is quite common use case when we need to
convert some long key to small autogenerated record identifier.
Without UPSERT we have to perform two queries instead of just one . And
even with current implementation of INSERT ON CONFLICT...
we have to either perform extra lookup, either produce new (useless)
tuple version.

I have no idea about the potential risks here since I am not very
familiar with the ON CONFLICT code, but I will chime in and agree that
this is indeed a common use case. Selecting and taking a SHARE lock
would also be a nice feature.

Andreas

#10Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Geoff Winkless (#7)
Re: INSERT ON CONFLICT and RETURNING

On 03.09.2020 19:56, Geoff Winkless wrote:

On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

If we are doing such query:

INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
ON CONFLICT (schema) DO UPDATE schema=jsonb_schemas.schema RETURNING id

Then as far as I understand no extra lookup is used to return ID:

The conflict resolution checks the unique index on (schema) and
decides whether or not a conflict will exist. For DO NOTHING it
doesn't have to get the actual row from the table; however in order
for it to return the ID it would have to go and get the existing row
from the table. That's the "extra lookup", as you term it. The only
difference from doing it with RETURNING id versus WITH... COALESCE()
as you described is the simpler syntax.

Sorry, but there is no exrta lookup in this case.
By "lookup" I mean index search.
What we are doing in case ON CONFLICT SELECT is just fetching tuple from
the buffer.
So we are not even loading any data from the disk.

By in case

   with ins as (insert into jsonb_schemas (schema) values (obj_schema)
on conflict(schema) do nothing returning id)
   select coalesce((select id from ins),(select id from jsonb_schemas
where   schema=obj_schema));

we actually execute extra subquery: select id from jsonb_schemas where
schema=obj_schema:

explain with ins as (insert into jsonb_schemas (schema) values ('some')
on conflict(schema) do nothing returning id) select coalesce((select id
from ins),(select id from jsonb_schemas where schema='some'));
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Result  (cost=8.21..8.21 rows=1 width=4)
   CTE ins
     ->  Insert on jsonb_schemas  (cost=0.00..0.01 rows=1 width=36)
           Conflict Resolution: NOTHING
           Conflict Arbiter Indexes: jsonb_schemas_pkey
           ->  Result  (cost=0.00..0.01 rows=1 width=36)
   InitPlan 2 (returns $2)
     ->  CTE Scan on ins  (cost=0.00..0.02 rows=1 width=4)
   InitPlan 3 (returns $3)
     ->  Index Scan using jsonb_schemas_pkey on jsonb_schemas
jsonb_schemas_1  (cost=0.15..8.17 rows=1 width=4)
           Index Cond: (schema = '\x736f6d65'::bytea)

Is it critical?
At my system average time of executing this query is 104 usec, and with
ON CONFLICT SELECT fix - 82 usec.
The difference is no so large, because we in any case insert speculative
tuple.
But it is incorrect to say that "it's not inherently any less efficient."

I'm not saying the simpler syntax isn't nice, mind you. I was just
pointing out that it's not inherently any less efficient.

Geoff

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#11Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Konstantin Knizhnik (#10)
1 attachment(s)
Re: INSERT ON CONFLICT and RETURNING

I have performed comparison of different ways of implementing UPSERT in
Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on
conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350

2. Use ON CONFLICT DO UPDATE:

upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema)
do update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890

3.  Use ON CONFLICT DO NOTHING + COALESCE:

upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id) select coalesce((select id
from ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732

4. Use ON CONFLICT DO SELECT

upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema)
do select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302

So, as you can see PLpgSQL version, which doesn't modify database if key
is found is signficantly faster than others.
And version which always do update is  almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than
PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:

Upsert implementation
TPS
PLpgSQL
45092
ON CONFLICT DO UPDATE 9222
ON CONFLICT DO NOTHING 28929
ON CONFLICT DO SELECT 35788

Slightly modified version of my ON CONFLICT DO SELECT patch is attached
to this mail.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

on_conflict_do_select-2.patchtext/x-patch; name=on_conflict_do_select-2.patchDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5..8a22b8c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3755,8 +3755,10 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 	if (node->onConflictAction != ONCONFLICT_NONE)
 	{
 		ExplainPropertyText("Conflict Resolution",
-							node->onConflictAction == ONCONFLICT_NOTHING ?
-							"NOTHING" : "UPDATE",
+							node->onConflictAction == ONCONFLICT_NOTHING
+							? "NOTHING"
+							: node->onConflictAction == ONCONFLICT_SELECT
+							? "SELECT" : "UPDATE",
 							es);
 
 		/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c47..07cab68 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -545,6 +545,8 @@ ExecInsert(ModifyTableState *mtstate,
 					else
 						goto vlock;
 				}
+				/* committed conflict tuple found */
+
 				else
 				{
 					/*
@@ -558,11 +560,17 @@ ExecInsert(ModifyTableState *mtstate,
 					 * type. As there's no conflicting usage of
 					 * ExecGetReturningSlot() in the DO NOTHING case...
 					 */
-					Assert(onconflict == ONCONFLICT_NOTHING);
+					Assert(onconflict == ONCONFLICT_NOTHING || onconflict == ONCONFLICT_SELECT);
 					ExecCheckTIDVisible(estate, resultRelInfo, &conflictTid,
 										ExecGetReturningSlot(estate, resultRelInfo));
 					InstrCountTuples2(&mtstate->ps, 1);
-					return NULL;
+					if (onconflict == ONCONFLICT_SELECT && resultRelInfo->ri_projectReturning)
+					{
+						TupleTableSlot *existing = resultRelInfo->ri_onConflict->oc_Existing;
+						if (table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc, &conflictTid, SnapshotAny, existing))
+							result = ExecProcessReturning(resultRelInfo, existing, planSlot);
+					}
+					return result;
 				}
 			}
 
@@ -2542,7 +2550,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * If needed, Initialize target list, projection and qual for ON CONFLICT
 	 * DO UPDATE.
 	 */
-	if (node->onConflictAction == ONCONFLICT_UPDATE)
+	if (node->onConflictAction == ONCONFLICT_UPDATE || node->onConflictAction == ONCONFLICT_SELECT)
 	{
 		ExprContext *econtext;
 		TupleDesc	relationDesc;
@@ -2566,35 +2574,37 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 			table_slot_create(resultRelInfo->ri_RelationDesc,
 							  &mtstate->ps.state->es_tupleTable);
 
-		/*
-		 * Create the tuple slot for the UPDATE SET projection. We want a slot
-		 * of the table's type here, because the slot will be used to insert
-		 * into the table, and for RETURNING processing - which may access
-		 * system attributes.
-		 */
-		tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
-		resultRelInfo->ri_onConflict->oc_ProjSlot =
-			ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
-								   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
-
-		/* build UPDATE SET projection state */
-		resultRelInfo->ri_onConflict->oc_ProjInfo =
-			ExecBuildProjectionInfo(node->onConflictSet, econtext,
-									resultRelInfo->ri_onConflict->oc_ProjSlot,
-									&mtstate->ps,
-									relationDesc);
-
-		/* initialize state to evaluate the WHERE clause, if any */
-		if (node->onConflictWhere)
+		if (node->onConflictAction == ONCONFLICT_UPDATE)
 		{
-			ExprState  *qualexpr;
+			/*
+			 * Create the tuple slot for the UPDATE SET projection. We want a slot
+			 * of the table's type here, because the slot will be used to insert
+			 * into the table, and for RETURNING processing - which may access
+			 * system attributes.
+			 */
+			tupDesc = ExecTypeFromTL((List *) node->onConflictSet);
+			resultRelInfo->ri_onConflict->oc_ProjSlot =
+				ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc,
+									   table_slot_callbacks(resultRelInfo->ri_RelationDesc));
+
+			/* build UPDATE SET projection state */
+			resultRelInfo->ri_onConflict->oc_ProjInfo =
+				ExecBuildProjectionInfo(node->onConflictSet, econtext,
+										resultRelInfo->ri_onConflict->oc_ProjSlot,
+										&mtstate->ps,
+										relationDesc);
+
+			/* initialize state to evaluate the WHERE clause, if any */
+			if (node->onConflictWhere)
+			{
+				ExprState  *qualexpr;
 
-			qualexpr = ExecInitQual((List *) node->onConflictWhere,
+				qualexpr = ExecInitQual((List *) node->onConflictWhere,
 									&mtstate->ps);
-			resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+				resultRelInfo->ri_onConflict->oc_WhereClause = qualexpr;
+			}
 		}
 	}
-
 	/*
 	 * If we have any secondary relations in an UPDATE or DELETE, they need to
 	 * be treated like non-locked relations in SELECT FOR UPDATE, ie, the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index dbb47d4..1e815d1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10874,6 +10874,16 @@ opt_on_conflict:
 					$$->whereClause = NULL;
 					$$->location = @1;
 				}
+			|
+			ON CONFLICT opt_conf_expr DO SELECT
+				{
+					$$ = makeNode(OnConflictClause);
+					$$->action = ONCONFLICT_SELECT;
+					$$->infer = $3;
+					$$->targetList = NIL;
+					$$->whereClause = NULL;
+					$$->location = @1;
+				}
 			| /*EMPTY*/
 				{
 					$$ = NULL;
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index fe777c3..e96f568 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3612,6 +3612,11 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
 										rt_entry_relation,
 										parsetree->resultRelation);
 			}
+			if (parsetree->onConflict &&
+				parsetree->onConflict->action == ONCONFLICT_SELECT)
+			{
+				parsetree->onConflict->onConflictSet =parsetree->targetList;
+			}
 		}
 		else if (event == CMD_UPDATE)
 		{
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 60dd80c..cd40985 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -6276,6 +6276,10 @@ get_insert_query_def(Query *query, deparse_context *context)
 		{
 			appendStringInfoString(buf, " DO NOTHING");
 		}
+		else if (confl->action == ONCONFLICT_SELECT)
+		{
+			appendStringInfoString(buf, " DO SELECT");
+		}
 		else
 		{
 			appendStringInfoString(buf, " DO UPDATE SET ");
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b..c23d3d2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -823,7 +823,8 @@ typedef enum OnConflictAction
 {
 	ONCONFLICT_NONE,			/* No "ON CONFLICT" clause */
 	ONCONFLICT_NOTHING,			/* ON CONFLICT ... DO NOTHING */
-	ONCONFLICT_UPDATE			/* ON CONFLICT ... DO UPDATE */
+	ONCONFLICT_UPDATE,			/* ON CONFLICT ... DO UPDATE */
+	ONCONFLICT_SELECT			/* ON CONFLICT ... DO SELECT */
 } OnConflictAction;
 
 /*
#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#11)
Re: INSERT ON CONFLICT and RETURNING

út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

I have performed comparison of different ways of implementing UPSERT in
Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema)
do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on
conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise
conditions are not handled.

So maybe this is the reason why this is really fast.

Regards

Pavel

Show quoted text

------------------------
upsert-plpgsql.sql:
select upsert('some');
------------------------
pgbench -n -T 100 -M prepared -f upsert-plpgsql.sql postgres
tps = 45092.241350

2. Use ON CONFLICT DO UPDATE:

upsert-update.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do
update set schema='some' returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-update.sql postgres
tps = 9222.344890

3. Use ON CONFLICT DO NOTHING + COALESCE:

upsert-coalecsce.sql:
with ins as (insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id) select coalesce((select id from
ins),(select id from jsonb_schemas where schema='some'));
------------------------
pgbench -n -T 100 -M prepared -f upsert-coalesce.sql postgres
tps = 28929.353732

4. Use ON CONFLICT DO SELECT

upsert-select.sql:
insert into jsonb_schemas (schema) values ('some') on conflict(schema) do
select returning id;
------------------------
pgbench -n -T 100 -M prepared -f upsert-select.sql postgres
ps = 35788.362302

So, as you can see PLpgSQL version, which doesn't modify database if key
is found is signficantly faster than others.
And version which always do update is almost five times slower!
Proposed version of upsert with ON CONFLICT DO SELECT is slower than
PLpgSQL version (because it has to insert speculative tuple),
but faster than "user-unfriendly" version with COALESCE:

Upsert implementation
TPS
PLpgSQL
45092
ON CONFLICT DO UPDATE 9222
ON CONFLICT DO NOTHING 28929
ON CONFLICT DO SELECT 35788

Slightly modified version of my ON CONFLICT DO SELECT patch is attached to
this mail.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#13Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Pavel Stehule (#12)
Re: INSERT ON CONFLICT and RETURNING

On 08.09.2020 12:34, Pavel Stehule wrote:

út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> napsal:

I have performed comparison of different ways of implementing
UPSERT in Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

  create table jsonb_schemas(id serial, schema bytea primary key);
  create unique index on jsonb_schemas(id);
  insert into jsonb_schemas (schema) values ('some') on
conflict(schema) do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
  obj_id integer;
begin
  select id from jsonb_schemas where schema=obj_schema into obj_id;
  if obj_id is null then
    insert into jsonb_schemas (schema) values (obj_schema) on
conflict(schema) do nothing returning id into obj_id;
    if obj_id is null then
      select id from jsonb_schemas where schema=obj_schema into
obj_id;
    end if;
  end if;
  return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise
conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key
(json schema in this case) to short  identifier (serial column in this
case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL
implementation (and other implementations of UPSERT as well).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#13)
Re: INSERT ON CONFLICT and RETURNING

út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

On 08.09.2020 12:34, Pavel Stehule wrote:

út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

I have performed comparison of different ways of implementing UPSERT in
Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema)
do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on
conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise
conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key
(json schema in this case) to short identifier (serial column in this
case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL
implementation (and other implementations of UPSERT as well).

yes, the performance depends on possibilities - and if you can implement
optimistic or pessimistic locking (or if you know so there is not race
condition possibility)

Pavel

Show quoted text

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Konstantin Knizhnik (#13)
Re: INSERT ON CONFLICT and RETURNING

út 8. 9. 2020 v 12:34 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

On 08.09.2020 12:34, Pavel Stehule wrote:

út 8. 9. 2020 v 11:06 odesílatel Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> napsal:

I have performed comparison of different ways of implementing UPSERT in
Postgres.
May be it will be interesting not only for me, so I share my results:

So first of all initialization step:

create table jsonb_schemas(id serial, schema bytea primary key);
create unique index on jsonb_schemas(id);
insert into jsonb_schemas (schema) values ('some') on conflict(schema)
do nothing returning id;

Then I test performance of getting ID of exitsed schema:

1. Use plpgsql script to avoid unneeded database modifications:

create function upsert(obj_schema bytea) returns integer as $$
declare
obj_id integer;
begin
select id from jsonb_schemas where schema=obj_schema into obj_id;
if obj_id is null then
insert into jsonb_schemas (schema) values (obj_schema) on
conflict(schema) do nothing returning id into obj_id;
if obj_id is null then
select id from jsonb_schemas where schema=obj_schema into obj_id;
end if;
end if;
return obj_id;
end;
$$ language plpgsql;

In parallel execution the plpgsql variant can fail. The possible raise
conditions are not handled.

So maybe this is the reason why this is really fast.

With this example I model real use case, where we need to map long key
(json schema in this case) to short identifier (serial column in this
case).
Rows of jsonb_schemas are never updated: it is append-only dictionary.
In this assumption no race condition can happen with this PLpgSQL
implementation (and other implementations of UPSERT as well).

I am not sure, but I think this should be a design and behavior of MERGE
statement - it is designed for OLAP (and speed). Unfortunately, this
feature stalled (and your benchmarks show so there is clean performance
benefit).

Regards

Pavel

Show quoted text

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company