update/insert, delete/insert efficiency WRT vacuum and MVCC

Started by Mark Woodward over 19 years ago, 10 messages
#1Mark Woodward
pgsql@mohawksoft.com

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
    exec("insert into foo(name, bar) values('xx','blahblah')");
or
exec("delete from foo where name = 'xx'");
exec("insert into foo(name, bar) values('xx','blahblah')");

In my session handler code I can do either, but am curious if it makes any
difference. Yes, "name" is unique.

#2Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Mark Woodward (#1)
Re: update/insert, delete/insert efficiency WRT vacuum and

Mark,
I don't know exactly how it works in Postgres, but my expectations are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
    exec("insert into foo(name, bar) values('xx','blahblah')");
or

The update generates a new tuple in the data file, and the pointer in
the index file is changed to the new version of the tuple. This action
does not generate B-Tree structure changes. If the update fails, then
the insert command creates a new tuple in the data file and adds a new
item to the B-Tree. It should generate a B-Tree node split.

exec("delete from foo where name = 'xx'");
exec("insert into foo(name, bar) values('xx','blahblah')");

Both commands should generate a B-Tree structure modification.

I expect that the first variant is better, but it depends on many
other things, for example triggers and other indexes.

A REPLACE/UPSERT command would solve this problem, but it is still on
the TODO list.

Zdenek

#3Martijn van Oosterhout
kleptog@svana.org
In reply to: Zdenek Kotala (#2)
Re: update/insert, delete/insert efficiency WRT vacuum and

On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:

Mark,
I don't know exactly how it works in Postgres, but my expectations are:

Mark Woodward wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
    exec("insert into foo(name, bar) values('xx','blahblah')");
or

The update generates a new tuple in the data file, and the pointer in
the index file is changed to the new version of the tuple. This action
does not generate B-Tree structure changes. If the update fails, then
the insert command creates a new tuple in the data file and adds a new
item to the B-Tree. It should generate a B-Tree node split.

Actually, not true. Both versions will generate a new row and create a
new index tuple. The only difference may be that in the update case
there may be a ctid link from the old version to the new one, but that's
about it...

Which is faster will probably depend on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.
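
A back-of-the-envelope sketch of that argument, counting only the number of statements sent per call. The function name `expected_statements` and the parameter `p_exists` are made up for illustration; the model assumes every statement costs the same and ignores concurrent sessions, so treat it as a rough guide rather than a benchmark.

```python
def expected_statements(p_exists: float) -> dict:
    """Average number of SQL statements per call for each strategy.

    p_exists is the assumed probability that the row is already present.
    """
    if not 0.0 <= p_exists <= 1.0:
        raise ValueError("p_exists must be between 0 and 1")
    return {
        # UPDATE always runs; INSERT runs only when UPDATE matched no row.
        "update_then_insert": 1.0 + (1.0 - p_exists),
        # DELETE and INSERT both run on every call.
        "delete_then_insert": 2.0,
    }


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.99):
        print(p, expected_statements(p))
```

Under this model the update-first variant never sends more statements than delete/insert, and the two only tie when the row never exists.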

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#4Zeugswetter Andreas DCP SD
ZeugswetterA@spardat.at
In reply to: Martijn van Oosterhout (#3)
Re: update/insert, delete/insert efficiency WRT vacuum and

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
    exec("insert into foo(name, bar) values('xx','blahblah')");
or

In pg, this strategy is generally more efficient, since an insert that
fails on the pk would create a tx abort and a heap tuple. (So in pg, I
would choose the insert-first strategy only when the insert succeeds
most of the time, say > 95%.)

Note, however, that the above error handling is not enough, because two
different sessions can still both end up trying the insert. (This is
true for all db systems when using this strategy.)
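
A minimal sketch of that interleaving. It uses Python's built-in sqlite3 module purely as a stand-in for a PostgreSQL connection, and replays the two sessions' statements in the problematic order on a single connection, so it only illustrates the ordering, not real concurrency. The labels "from A" and "from B" are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT PRIMARY KEY, bar TEXT)")


def try_update(value):
    """Run the UPDATE and report how many rows it matched."""
    cur = conn.execute("UPDATE foo SET bar = ? WHERE name = 'xx'", (value,))
    return cur.rowcount


# Both "sessions" run their UPDATE before either INSERT has happened,
# so both conclude that the row does not exist yet.
a_matched = try_update("from A")
b_matched = try_update("from B")

# Session A inserts first and succeeds.
conn.execute("INSERT INTO foo(name, bar) VALUES ('xx', 'from A')")

# Session B, acting on its stale "no row" result, now hits the unique key.
try:
    conn.execute("INSERT INTO foo(name, bar) VALUES ('xx', 'from B')")
    b_insert_failed = False
except sqlite3.IntegrityError:
    b_insert_failed = True

print(a_matched, b_matched, b_insert_failed)
```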

Andreas

#5Hannu Krosing
hannu@skype.net
In reply to: Zeugswetter Andreas DCP SD (#4)
Re: update/insert, delete/insert efficiency WRT vacuum

One fine day, Tue, 2006-07-04 at 14:53, Zeugswetter Andreas DCP SD
wrote:

Is there a difference in PostgreSQL performance between these two
different strategies:

if(!exec("update foo set bar='blahblah' where name = 'xx'"))
    exec("insert into foo(name, bar) values('xx','blahblah')");
or

In pg, this strategy is generally more efficient, since an insert that
fails on the pk would create a tx abort and a heap tuple. (So in pg, I
would choose the insert-first strategy only when the insert succeeds
most of the time, say > 95%.)

Note, however, that the above error handling is not enough, because two
different sessions can still both end up trying the insert. (This is
true for all db systems when using this strategy.)

I think the recommended strategy is to first try to UPDATE; if no row
is found, then INSERT; if the INSERT hits a primary key violation, then
UPDATE.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Woodward (#1)
Re: update/insert,

Mark Woodward wrote:

OK, but the point of the question is that constantly updating a single
row steadily degrades performance; would delete/insert also do the same?

If that was the point of the question, you should have said so.

And unless I am much mistaken the answer is "of course it will."

cheers

andrew

#7Mark Woodward
pgsql@mohawksoft.com
In reply to: Martijn van Oosterhout (#3)
Re: update/insert,

OK, but the point of the question is that constantly updating a single
row steadily degrades performance; would delete/insert also do the same?

#8Zeugswetter Andreas DCP SD
ZeugswetterA@spardat.at
In reply to: Mark Woodward (#7)
Re: update/insert,

OK, but the point of the question is that constantly updating
a single row steadily degrades performance; would
delete/insert also do the same?

Yes, there is currently no difference (so you should do the update).
Of course, performance only degrades if vacuum is not set up correctly.

Andreas

#9Noname
mark@mark.mielke.cc
In reply to: Zeugswetter Andreas DCP SD (#8)
Re: update/insert,

On Wed, Jul 05, 2006 at 04:59:52PM +0200, Zeugswetter Andreas DCP SD wrote:

OK, but the point of the question is that constantly updating
a single row steadily degrades performance; would
delete/insert also do the same?

Yes, there is currently no difference (so you should do the update).
Of course, performance only degrades if vacuum is not set up correctly.

As Martijn pointed out, there are two differences. One, almost
insignificant, has to do with internal linkage. The other is that
multiple queries are being executed, I would presume with separate
query plans and so on. Therefore you should do the update.

For the case you are talking about, the difference is:

1) Delete which will always succeed
2) Insert that will probably succeed

Vs:

1) Update which if it succeeds, will stop
2) Insert that will probably succeed

In the first case, you are always executing two queries. In the second,
you can sometimes get away with only one query.

Note what other people mentioned, though, that neither of the above is
safe against parallel transactions updating or inserting rows with the
same key.

In both cases, a 'safe' implementation should loop if 2) fails and
restart the operation.
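
A minimal sketch of such a loop for the update-first variant. It uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL connection; the function name `upsert` and the `max_attempts` cap are invented for illustration. One assumption to watch: in PostgreSQL a failed INSERT aborts the enclosing transaction, so there the INSERT would need its own savepoint or its own transaction before retrying, which this sketch does not show.

```python
import sqlite3


def upsert(conn, name, bar, max_attempts=5):
    """Set foo.bar for name, inserting the row if it is missing."""
    for _ in range(max_attempts):
        # Step 1: UPDATE; if it matched a row we are done.
        cur = conn.execute("UPDATE foo SET bar = ? WHERE name = ?", (bar, name))
        if cur.rowcount > 0:
            return "updated"
        # Step 2: INSERT, which will probably succeed.
        try:
            conn.execute("INSERT INTO foo(name, bar) VALUES (?, ?)", (name, bar))
            return "inserted"
        except sqlite3.IntegrityError:
            # Another session inserted the key first; restart from the UPDATE.
            continue
    raise RuntimeError("upsert did not converge")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT PRIMARY KEY, bar TEXT)")
first = upsert(conn, "xx", "blahblah")
second = upsert(conn, "xx", "newvalue")
print(first, second)
```

The attempt cap is a design choice, not something from the thread: it keeps a misbehaving caller from spinning forever if rows are being inserted and deleted underneath it.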

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#10Joshua D. Drake
jd@commandprompt.com
In reply to: Mark Woodward (#7)
Re: update/insert,

Which is faster will probably depend on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

OK, but the point of the question is that constantly updating a single
row steadily degrades performance; would delete/insert also do the same?

Yes. Delete still creates a dead row. There are programmatic ways around
this, such as keeping a delete table that can be truncated at intervals.

Joshua D. Drake
