on duplicate key

Started by A Bover 17 years ago4 messagesgeneral

gentosaker@gmail.com

over 17 years ago

Hello.
I was just asked by a mysql-user how do you do
insert ..... on duplicate key update ....
(or however they have it in mysql) in postgresql, if you are going to
run commands from the command line?

My solution up till now has been a function with the

BEGIN
insert ....
EXCEPTION WHEN OTHERS THEN
update ...
END;

I think this is not possible to run this kind of commands on the
command line. Correct?

As I see it, there are three ways
1) a function
2) try to rewrite it as two separate queries insert ... ; update
... ; where the insert will fail sometimes
3) try to search and see if there were any result back, but that would
require the IF THEN construct which is also not available outside of
functions, right?

Is it correct to assume that a function that is searching for the key
and then choosing to insert or update depending on what it found, is
about as fast as doing an insert within a begin- exception-end
statement or are there some inherent speed differences?

Comparing method 1 and 2, are there any noticable speed differences
when the exception is triggered?

What should I answer the mysql-user?

By the way, is there any work done on getting this functionality? I
must admit that it would be handy some times ;-)
Shouldn't this kind of question be added to the FAQ?

Scott Marlowe

scott.marlowe@gmail.com

over 17 years ago

In reply to: A B (#1)

Re: on duplicate key

On Thu, Sep 25, 2008 at 10:25 AM, A B <gentosaker@gmail.com> wrote:

Hello.
I was just asked by a mysql-user how do you do
insert ..... on duplicate key update ....
(or however they have it in mysql) in postgresql, if you are going to
Is it correct to assume that a function that is searching for the key
and then choosing to insert or update depending on what it found, is
about as fast as doing an insert within a begin- exception-end
statement or are there some inherent speed differences?

none really. They're both usually quite fast.

What should I answer the mysql-user?

You should ask him what happens if you setup a benchmark on mysql and
postgresq that runs 500 of these queries in parallel thousands of
times, and halfway through pull the plug.

Oh wait, I forgot, he's running mysql, so he probably doesn't actually
care about data integrity. :) P.s. I'm mostly just kidding around,
but in all seriousness, he's putting the cart ahead of the horse here.

By the way, is there any work done on getting this functionality? I
must admit that it would be handy some times ;-)

I don't know, I find running an update followed by an insert if the
update returns 0 rows is pretty easy to do. From external code or
your favorite pl/lang.

Jeff Davis

pgsql@j-davis.com

over 17 years ago

In reply to: A B (#1)

Re: on duplicate key

On Thu, 2008-09-25 at 18:25 +0200, A B wrote:

My solution up till now has been a function with the

BEGIN
insert ....
EXCEPTION WHEN OTHERS THEN
update ...
END;

Here is the appropriate documentation link, where they have an example:
http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

The primary difference is that they use a loop, which is more robust. In
theory, if you delete the record between when the INSERT happens and
when the UPDATE happens, you will get no effect, which isn't what you're
looking for. The loop will correct for this by ensuring that something
happens before it terminates.

I think this is not possible to run this kind of commands on the
command line. Correct?

This specific construct is in the language PL/pgSQL, which only exists
as a procedural language, and can only be used to create functions.

However, you can do the same thing on the command line using
subtransactions, a.k.a. SAVEPOINTs:
http://www.postgresql.org/docs/8.3/static/sql-savepoint.html

This is how PL/pgSQL works internally, and how you can use other
procedural languages to accomplish the same thing.

Using SQL on the command line doesn't allow you to loop directly, so the
best solution is usually to use a function (and PL/pgSQL is a good
language for this).

As I see it, there are three ways
1) a function

This is probably the right approach, if you use the BEGIN...EXCEPTION
method in the first link I provided.

2) try to rewrite it as two separate queries insert ... ; update
... ; where the insert will fail sometimes

There's a race in the case of a DELETE happening between the INSERT and
the UPDATE. PostgreSQL is designed for high concurrency, and this is
great for performance (especially scalable and consistent performance),
but race conditions are more likely with many things happening at once.

3) try to search and see if there were any result back, but that would
require the IF THEN construct which is also not available outside of
functions, right?

The same race condition exists here.

Is it correct to assume that a function that is searching for the key
and then choosing to insert or update depending on what it found, is
about as fast as doing an insert within a begin- exception-end
statement or are there some inherent speed differences?

Again, be careful of race conditions, but the speed should be
comparable.

What should I answer the mysql-user?

To ask questions on pgsql-general, just like you did ;)

By the way, is there any work done on getting this functionality? I
must admit that it would be handy some times ;-)
Shouldn't this kind of question be added to the FAQ?

Here's a starting place:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg01475.php

You can find other threads about the status of the work by browsing the
mailing list archives.

Regards,
Jeff Davis

A B

gentosaker@gmail.com

over 17 years ago

In reply to: Jeff Davis (#3)

Re: on duplicate key

Here is the appropriate documentation link, where they have an example:
http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

The primary difference is that they use a loop, which is more robust. In
theory, if you delete the record between when the INSERT happens and
when the UPDATE happens, you will get no effect, which isn't what you're
looking for. The loop will correct for this by ensuring that something
happens before it terminates.

Yes, it could have been removed.
I seem to remember that it is not possible to lock the table with a
transaction from within a function, this has to be done on the level
from which I call the function?

However, you can do the same thing on the command line using
subtransactions, a.k.a. SAVEPOINTs:
http://www.postgresql.org/docs/8.3/static/sql-savepoint.html

Oh, fun to learn new stuff :-)
Thank you for your reply