Use ctid in where clause in update from statement
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
MERGE INTO test_large d
USING (SELECT ROWID, test_large.*
FROM test_large
WHERE grp = 1) s
ON (d.ROWID = s.ROWID)
WHEN MATCHED
THEN
UPDATE SET d.grp = s.grp;
It has the following execution plan:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1000 | 16000 | 2006 (0)| 00:00:25 |
| 1 | MERGE | TEST_LARGE | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS | | 1000 | 220K| 2006 (0)| 00:00:25 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST_LARGE | 1000 | 110K| 1006 (0)| 00:00:13 |
|* 5 | INDEX RANGE SCAN | IX_TL_GRP | 1000 | | 6 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID | TEST_LARGE | 1 | 113 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
In the PostgreSQL version I use the column ctid for this. The above query becomes:
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
I noticed that the execution plan for such a statement seems to be suboptimal:
Update on test_large d (cost=155346.98..160367.08 rows=996 width=53)
-> Merge Join (cost=155346.98..160367.08 rows=996 width=53)
Merge Cond: (d.ctid = test_large.ctid)
-> Sort (cost=154330.57..156833.16 rows=1001033 width=43)
Sort Key: d.ctid
-> Seq Scan on test_large d (cost=0.00..19366.33 rows=1001033 width=43)
-> Sort (cost=1016.40..1018.89 rows=996 width=10)
Sort Key: test_large.ctid
-> Index Scan using ix_tl_grp on test_large (cost=0.42..966.80 rows=996 width=10)
Index Cond: (grp = 1)
I expected a tid scan to be used instead of a Seq scan. I did a VACUUM test_large to make sure statistics are valid.
I would suspect that a tid scan is even a bit faster than accessing via the primary key, since reading the index is not required.
I know I could change the WHERE clause to use the primary key, which results in the following plan:
Update on test_large d (cost=0.85..3344.12 rows=995 width=57)
-> Nested Loop (cost=0.85..3344.12 rows=995 width=57)
-> Index Scan using ix_tl_grp on test_large (cost=0.42..994.84 rows=995 width=14)
Index Cond: (grp = 1)
-> Index Scan using pk_test_large on test_large d (cost=0.42..2.36 rows=1 width=47)
Index Cond: (id = test_large.id)
So my question is more "Is a tid scan not possible / useful for an update" than "how do I accelerate this query".
BR
Dirk
--
Dirk Mika
Software Developer
[cid:image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpg]<https://youtu.be/qfOFXrpSKLQ>
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the
original query I use the pseudo column ROWID to match a source row with a target row.This is a simplified version of such a query:
Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"
BR
Dirk
--
Dirk Mika
/Software Developer/
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
I come from the Oracle world and am trying to convert some queries to
PostgreSQL syntax. One of these queries is a MERGE statement, which I
converted into an UPDATE SET FROM WHERE construct. In the original
query I use the pseudo column ROWID to match a source row with a
target row.In the PostgreSQL version I use the column ctid for this. The above query becomes:
*UPDATE*test_large d
*SET* grp = s.grp
*FROM* (*SELECT* ctid, test_large.*
*FROM* test_large
*WHERE* grp = 1) s
*WHERE* d.ctid = s.ctid;
Why don't you join on the primary key column?
The ctid comparison is typically quite slow.
Thomas
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
*UPDATE*test_large d
*SET* grp = s.grp
*FROM* (*SELECT* ctid, test_large.*
*FROM* test_large
*WHERE* grp = 1) s
*WHERE* d.ctid = s.ctid;
Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.
BR
Dirk
--
Dirk Mika
/Software Developer/
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in PostgreSQL as well.
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Am 01.07.19, 11:36 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:
I come from the Oracle world and am trying to convert some queries to
PostgreSQL syntax. One of these queries is a MERGE statement, which I
converted into an UPDATE SET FROM WHERE construct. In the original
query I use the pseudo column ROWID to match a source row with a
target row.In the PostgreSQL version I use the column ctid for this. The above query becomes:
*UPDATE*test_large d
*SET* grp = s.grp
*FROM* (*SELECT* ctid, test_large.*
*FROM* test_large
*WHERE* grp = 1) s
*WHERE* d.ctid = s.ctid;
Why don't you join on the primary key column?
The ctid comparison is typically quite slow.
Thomas
This is just a very simplified sample and doesn’t makes sense, but it shows the execution plan. The original query is more complex. It selects more values and updates more columns.
Dirk
--
Dirk Mika
Software Developer
[cid:image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpg]<https://youtu.be/qfOFXrpSKLQ>
Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:50
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.
BR
Dirk
--
Dirk Mika
Software Developer
[cid:part1.F9658A75.C84DBA7E@matrix.gatewaynet.com]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de<mailto:dirk.mika@mikatiming.de>
www.mikatiming.de<http://www.mikatiming.de>
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:part2.3A5504AA.38B3DF7A@matrix.gatewaynet.com]<https://youtu.be/qfOFXrpSKLQ>
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Attachments:
image002.jpgimage/jpeg; name=image002.jpgDownload+3-3
CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpgimage/jpeg; name=CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpgDownload+3-3
Hi,
The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
Since the SELECT statement only returns records whose PK already exists, I thought an UPDATE FROM would be the better choice.
As I said, with the PK as JOIN condition it works fine too. I'm only interested in why no tid scan is used.
Dirk
--
Dirk Mika
Software Developer
[cid:image001_d0a5cf51-5dd3-4ff5-b428-1016a6d95a0f.png]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpg]<https://youtu.be/qfOFXrpSKLQ>
Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:27
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"
BR
Dirk
--
Dirk Mika
Software Developer
[cid:part1.679C32D5.E7787ACA@matrix.gatewaynet.com]
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de<mailto:dirk.mika@mikatiming.de>
www.mikatiming.de<http://www.mikatiming.de>
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
[cid:part2.E58C80B6.31D80ADB@matrix.gatewaynet.com]<https://youtu.be/qfOFXrpSKLQ>
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Attachments:
image002.jpgimage/jpeg; name=image002.jpgDownload+3-3
CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpgimage/jpeg; name=CCh2U_f27a5ce6-8556-4db1-8aa4-fd610d6e9efe.jpgDownload+3-3
Dirk Mika schrieb am 01.07.2019 um 12:02:
I know I can join using the pk, but in oracle using the rowid is
faster so I wanted to know, if this is possible in PostgreSQL as
well.
Well, in Postgres ctid is not necessarily faster.
Dirk Mika schrieb am 01.07.2019 um 13:18:
The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
The insert trigger will only be fired if an INSERT actually takes place.
If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
I've tried it with the following trigger:
CREATE TRIGGER tr_tl_test1
BEFORE INSERT
ON public.test_large
FOR EACH ROW
EXECUTE PROCEDURE tf_tr_tl_test1 ()
The trigger function does nothing special:
CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 ()
RETURNS trigger
LANGUAGE 'plpgsql'
VOLATILE
NOT LEAKPROOF
SECURITY INVOKER
PARALLEL UNSAFE
AS
$$
BEGIN
RAISE NOTICE 'Trigger called with: %', new;
RETURN new;
END;
$$
If I do a
EXPLAIN ANALYZE
INSERT INTO test_large (id)
VALUES (2)
ON CONFLICT
ON CONSTRAINT pk_test_large
DO NOTHING;
I get the following:
NOTICE: Trigger called with: (2,,)
QUERY PLAN
---------------------------------------------------------------------------------------------------
Insert on test_large (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1)
Conflict Resolution: NOTHING
Conflict Arbiter Indexes: pk_test_large
Tuples Inserted: 0
Conflicting Tuples: 1
-> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1)
Planning Time: 0.142 ms
Trigger tr_tl_test1: time=0.116 calls=1
Execution Time: 0.180 ms
As you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple inserted but one conflicting.
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:
Dirk Mika schrieb am 01.07.2019 um 13:18:
The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
The insert trigger will only be fired if an INSERT actually takes place.
If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
Hello Dirk,
pls don't top post, that's the rule here
On 1/7/19 2:40 μ.μ., Dirk Mika wrote:
I've tried it with the following trigger:
CREATE TRIGGER tr_tl_test1
BEFORE INSERT
ON public.test_large
FOR EACH ROW
EXECUTE PROCEDURE tf_tr_tl_test1 ()The trigger function does nothing special:
CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 ()
RETURNS trigger
LANGUAGE 'plpgsql'
VOLATILE
NOT LEAKPROOF
SECURITY INVOKER
PARALLEL UNSAFE
AS
$$
BEGIN
RAISE NOTICE 'Trigger called with: %', new;
RETURN new;
END;
$$If I do a
EXPLAIN ANALYZE
INSERT INTO test_large (id)
VALUES (2)
ON CONFLICT
ON CONSTRAINT pk_test_large
DO NOTHING;I get the following:
NOTICE: Trigger called with: (2,,)
QUERY PLAN
---------------------------------------------------------------------------------------------------
Insert on test_large (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1)
Conflict Resolution: NOTHING
Conflict Arbiter Indexes: pk_test_large
Tuples Inserted: 0
Conflicting Tuples: 1
-> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1)
Planning Time: 0.142 ms
Trigger tr_tl_test1: time=0.116 calls=1
Execution Time: 0.180 msAs you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple inserted but one conflicting.
A before trigger cannot know the actual. So a BEFORE INSERT trigger will behave as such even if no INSERT takes place.
If you want to skip your particular "normal" trigger , just run :
set session_replication_role to 'replica';
and run your upsert.
Dirk
--
Dirk Mika
Software Developermika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:Dirk Mika schrieb am 01.07.2019 um 13:18:
The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
The insert trigger will only be fired if an INSERT actually takes place.
If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
As you can see the trigger function is called for the row I try to
insert, but you can also see that there's no tuple inserted but one
conflicting.
Ah, right.
Thinking about it, it _has_ to call any BEFORE trigger function
as that might change values of the row to be inserted that
could possibly change the outcome of the test.
Hi
Thinking about it, it _has_ to call any BEFORE trigger function
as that might change values of the row to be inserted that
could possibly change the outcome of the test.
Yeah, that was my thought, too. Unfortunately, the affected trigger changes two columns, so I can't change it to an AFTER ROW trigger.
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Dirk Mika <Dirk.Mika@mikatiming.de> writes:
I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in PostgreSQL as well.
Existing Postgres releases are not very bright about joins on CTID ---
basically merge join is the only plan type you can get for that.
v12 will improve that somewhat.
regards, tom lane