update/insert, delete/insert efficiency WRT vacuum and MVCC

Started by Mark Woodwardalmost 20 years ago10 messageshackers

pgsql@mohawksoft.com

almost 20 years ago

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'");
or
exec("delete from foo where name = 'xx'");
exec("insert into foo(name, bar) values('xx','blahblah'");

In my session handler code I can do either, but am curious if it makes any
difference. Yes, "name" is unique.

Zdenek Kotala

Zdenek.Kotala@Sun.COM

almost 20 years ago

In reply to: Mark Woodward (#1)

Re: update/insert, delete/insert efficiency WRT vacuum and

Mark,
I don't know how it will exactly works in postgres but my expectations are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'");
or

The update code generates new tuple in the datafile and pointer has been
changed in the indexfile to the new version of tuple. This action does
not generate B-Tree structure changes. If update falls than insert
command creates new tuple in the datafile and it adds new item into
B-Tree. It should be generate B-Tree node split.

exec("delete from foo where name = 'xx'");
exec("insert into foo(name, bar) values('xx','blahblah'");

Both commands should generate B-Tree structure modification.

I expect that first variant is better, but It should depend on many
others things - for examples triggers, other indexes ...

REPLACE/UPSERT command solves this problem, but It is still in the TODO
list.

Zdenek

Martijn van Oosterhout

kleptog@svana.org

almost 20 years ago

In reply to: Zdenek Kotala (#2)

Re: update/insert, delete/insert efficiency WRT vacuum and

On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:

Mark,
I don't know how it will exactly works in postgres but my expectations are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'");
or

The update code generates new tuple in the datafile and pointer has been
changed in the indexfile to the new version of tuple. This action does
not generate B-Tree structure changes. If update falls than insert
command creates new tuple in the datafile and it adds new item into
B-Tree. It should be generate B-Tree node split.

Actually, not true. Both versions will generate a row row and create a
new index tuple. The only difference may be that in the update case the
may be a ctid link from the old version to the new one, but that's
about it...

Which is faster will probably depends on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

almost 20 years ago

In reply to: Martijn van Oosterhout (#3)

Re: update/insert, delete/insert efficiency WRT vacuum and

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'"); or

In pg, this strategy is generally more efficient, since a pk failing
insert would create
a tx abort and a heap tuple. (so in pg, I would choose the insert first
strategy only when
the insert succeeds most of the time (say > 95%))

Note however that the above error handling is not enough, because two
different sessions
can still both end up trying the insert (This is true for all db systems
when using this strategy).

Andreas

Hannu Krosing

hannu@tm.ee

almost 20 years ago

In reply to: Zeugswetter Andreas SB SD (#4)

Re: update/insert, delete/insert efficiency WRT vacuum

Ühel kenal päeval, T, 2006-07-04 kell 14:53, kirjutas Zeugswetter
Andreas DCP SD:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'"); or

In pg, this strategy is generally more efficient, since a pk failing
insert would create
a tx abort and a heap tuple. (so in pg, I would choose the insert first
strategy only when
the insert succeeds most of the time (say > 95%))

Note however that the above error handling is not enough, because two
different sessions
can still both end up trying the insert (This is true for all db systems
when using this strategy).

I think the recommended strategy is to first try tu UPDATE, if not found
then INSERT, if primary key violation on insert, then UPDATE

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

Andrew Dunstan

andrew@dunslane.net

almost 20 years ago

In reply to: Mark Woodward (#1)

Re: update/insert,

Mark Woodward wrote:

On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:

Mark,
I don't know how it will exactly works in postgres but my expectations
are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'");
or

The update code generates new tuple in the datafile and pointer has been
changed in the indexfile to the new version of tuple. This action does
not generate B-Tree structure changes. If update falls than insert
command creates new tuple in the datafile and it adds new item into
B-Tree. It should be generate B-Tree node split.

Actually, not true. Both versions will generate a row row and create a
new index tuple. The only difference may be that in the update case the
may be a ctid link from the old version to the new one, but that's
about it...

Which is faster will probably depends on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

OK, but the point of the question is that constantly updating a single row
steadily degrades performance, would delete/insery also do the same?

If that was the point of the question, you should have said so.

And unless I am much mistaken the answer is "of course it will."

cheers

andrew

Import Notes

Reply to msg id not found: 18232.24.91.171.78.1152110735.squirrel@mail.mohawksoft.com

Mark Woodward

pgsql@mohawksoft.com

almost 20 years ago

In reply to: Martijn van Oosterhout (#3)

Re: update/insert,

On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:

Mark,
I don't know how it will exactly works in postgres but my expectations
are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
exec("insert into foo(name, bar) values('xx','blahblah'");
or

The update code generates new tuple in the datafile and pointer has been
changed in the indexfile to the new version of tuple. This action does
not generate B-Tree structure changes. If update falls than insert
command creates new tuple in the datafile and it adds new item into
B-Tree. It should be generate B-Tree node split.

Actually, not true. Both versions will generate a row row and create a
new index tuple. The only difference may be that in the update case the
may be a ctid link from the old version to the new one, but that's
about it...

Which is faster will probably depends on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

OK, but the point of the question is that constantly updating a single row
steadily degrades performance, would delete/insery also do the same?

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

almost 20 years ago

In reply to: Mark Woodward (#7)

Re: update/insert,

OK, but the point of the question is that constantly updating
a single row steadily degrades performance, would
delete/insery also do the same?

Yes, there is currently no difference (so you should do the update).
Of course performance only degrades if vaccuum is not setup correctly.

Andreas

Mark Mielke

mark@mark.mielke.cc

almost 20 years ago

In reply to: Zeugswetter Andreas SB SD (#8)

Re: update/insert,

On Wed, Jul 05, 2006 at 04:59:52PM +0200, Zeugswetter Andreas DCP SD wrote:

OK, but the point of the question is that constantly updating
a single row steadily degrades performance, would
delete/insery also do the same?

Yes, there is currently no difference (so you should do the update).
Of course performance only degrades if vaccuum is not setup correctly.

As Martijn pointed out, there are two differences. One almost
insignificant having to do with internal linkage. The other that
multiples queries are being executed. I would presume with separate
query plans, and so on, therefore you should do the update.

For the case you are talking about, the difference is:

1) Delete which will always succeed
2) Insert that will probably succeed

Vs:

1) Update which if it succeeds, will stop
2) Insert that will probably succeed

In the first case, you are always executing two queries. In the second,
you can sometimes get away with only one query.

Note what other people mentioned, though, that neither of the above is
safe against parallel transactions updating or inserting rows with the
same key.

In both cases, a 'safe' implementation should loop if 2) fails and
restart the operation.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#10

Joshua D. Drake

jd@commandprompt.com

almost 20 years ago

In reply to: Mark Woodward (#7)

Re: update/insert,

Which is faster will probably depends on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

OK, but the point of the question is that constantly updating a single row
steadily degrades performance, would delete/insery also do the same?

Yes. Delete still creates a dead row. There are programatic ways around this
but keeping a delete table that can be truncated at intervals.

Joshua D. Drake

Show quoted text

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings