most idiomatic way to "update or insert"?

Started by Mark Harrisonover 21 years ago22 messagesgeneral
Jump to latest
#1Mark Harrison
mh@pixar.com

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

#2Peter Darley
pdarley@kinesis-cem.com
In reply to: Mark Harrison (#1)
Re: most idiomatic way to "update or insert"?

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mark Harrison
Sent: Wednesday, August 04, 2004 4:26 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

#3Duane Lee - EGOVX
DLee@mail.maricopa.gov
In reply to: Peter Darley (#2)
Re: most idiomatic way to "update or insert"?

You could always do a SELECT and if the row exists then UPDATE otherwise
INSERT. Or INSERT and if you get an error then UPDATE.

Duane

-----Original Message-----
From: Mark Harrison [mailto:mh@pixar.com]
Sent: Wednesday, August 04, 2004 4:26 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] most idiomatic way to "update or insert"?

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

#4Bruce Momjian
bruce@momjian.us
In reply to: Peter Darley (#2)
Re: most idiomatic way to "update or insert"?

I'll mention that often I do exactly what you're doing. I find deleting all
existing records and then inserting what I want to appear to be cleaner than
handling the various cases that can arise if you don't.

This happens most often when I have a list of items and have a UI that allows
the user to edit the entire list and commit a whole new list in one action.
It's much easier to simply delete the old list and insert the entire new list
in a single query than to try to figure out which rows to delete and which to
insert.

--
greg

#5Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Peter Darley (#2)
Re: most idiomatic way to "update or insert"?

I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:

Show quoted text

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

#6Pierre-Frédéric Caillaud
lists@boutiquenumerique.com
In reply to: Lincoln Yeoh (#5)
Re: most idiomatic way to "update or insert"?

I use stored procedures :

create function insertorupdate(....)
UPDATE mytable WHERE ... SET ...
IF NOT FOUND THEN
INSERT INTO mytable ...
END IF;

You lose flecibility in your request though.

I wish Postgresql had an INSERT OR UPDATE like MySQL does. So far it's
the only thing that I regret from MySQL.

#7Richard Huxton
dev@archonet.com
In reply to: Mark Harrison (#1)
Re: most idiomatic way to "update or insert"?

Mark Harrison wrote:

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

The delete+insert isn't quite the same as an update since you might have
foreign keys referring to foo with "ON DELETE CASCADE" - oops, just lost
all your dependant rows. Other people have warned about race conditions
with insert/test/update.

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or updating
something I take a long hard look at my design - it ususally means I've
not thought clearly about something.

For a "running total" table it can make more sense to have an entry with
a total of 0 created automatically via a trigger. Likewise with some
other summary tables.

Can you give an actual example of where you need this?

--
Richard Huxton
Archonet Ltd

#8Tommi Maekitalo
t.maekitalo@epgmbh.de
In reply to: Mark Harrison (#1)
Re: most idiomatic way to "update or insert"?

Hi,

I prefer to update and if the number of updated rows equals 0 do an insert. So
in case of update I need only one roundtrip. If insert is far more common in
this case it might be better try insert and catch the error. But I try to
avoid running on an error intentionally.

First delete and then insert works but needs 2 SQL-statements in every case.
And the database need to update indexes at least once. There might be also
problems with cascaded deletes.

Tommi

Am Donnerstag, 5. August 2004 01:25 schrieb Mark Harrison:

Show quoted text

So I have some data that I want to put into a table. If the
row already exists (as defined by the primary key), I would
like to update the row. Otherwise, I would like to insert
the row.

I've been doing something like

delete from foo where name = 'xx';
insert into foo values('xx',1,2,...);

but I've been wondering if there's a more idiomatic or canonical
way to do this.

Many TIA,
Mark

#9Peter Darley
pdarley@kinesis-cem.com
In reply to: Lincoln Yeoh (#5)
Re: most idiomatic way to "update or insert"?

Lincoln,
It works for me...
I think what you said is wrong because it updates first (if there is a row
to update), then inserts. If there is a row to update the insert won't
insert anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction. Plus, as shown
in the code to follow, I have almost this exact thing in my application and
I know that it does work for me. :)

Code (Perl):
$Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
. " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
. Quote($Args{Setting}) . "));");

Thanks,
Peter Darley

-----Original Message-----
From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
Sent: Wednesday, August 04, 2004 6:49 PM
To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?

I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:

Show quoted text

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

#10Mike Mascari
mascarm@mascari.com
In reply to: Peter Darley (#9)
Re: most idiomatic way to "update or insert"?

Peter Darley wrote:

Lincoln, It works for me... I think what you said is wrong
because it updates first (if there is a row to update), then
inserts. If there is a row to update the insert won't insert
anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction.
Plus, as shown in the code to follow, I have almost this exact
thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=en&amp;lr=&amp;ie=UTF-8&amp;selm=2344.978158285%40sss.pgh.pa.us

HTH,

Mike Mascari

#11Csaba Nagy
nagy@ecircle-ag.com
In reply to: Peter Darley (#9)
Re: most idiomatic way to "update or insert"?

Peter,

The "does not work" part is not refering to the method not working at
all, but to the fact that it is not safe when you have multiple
transactions operating on the same row at the same time.
There are plenty of discussions about the insert-or-update race
conditions on this list, and the final conclusion was always that it is
not possible to solve the race condition without being prepared to catch
exceptions and retry the whole thing until it succedes...
The reason of the race condition: let's say 2 transactions A and B try
to insert-or-update the same row which does not exist. They do the
update statement at the same time, and BOTH OF THEM gets as a result
that no rows were updated, since the row does not exist yet. Now both
transactions try to insert the row, and obviously one of them will fail.
So your code must be prepared that the insert can fail, and in that case
it should retry with the update.
People tried to devise a method to avoid the race condition and throwing
exception, but it is just not possible.
Now the one bad thing in postgres which people complained about in this
context is that the transaction gets rolled back on any error, so
actually instead of just retrying the update, you will have to redo your
whole transaction.

HTH,
Csaba.

Show quoted text

On Thu, 2004-08-05 at 15:28, Peter Darley wrote:

Lincoln,
It works for me...
I think what you said is wrong because it updates first (if there is a row
to update), then inserts. If there is a row to update the insert won't
insert anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction. Plus, as shown
in the code to follow, I have almost this exact thing in my application and
I know that it does work for me. :)

Code (Perl):
$Neo::DB::Neo->do ("UPDATE Sample_Settings SET Value=" . Quote($Args{Value})
. " WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting=" .
Quote($Args{Setting}) . "; INSERT INTO Sample_Settings (Sample_ID, Setting,
Value) (SELECT " . Quote($Args{SampleID}) . ", " . Quote($Args{Setting}) .
", " . Quote($Args{Value}) . " WHERE NOT EXISTS (SELECT 1 FROM
Sample_Settings WHERE Sample_ID=" . Quote($Args{SampleID}) . " AND Setting="
. Quote($Args{Setting}) . "));");

Thanks,
Peter Darley

-----Original Message-----
From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
Sent: Wednesday, August 04, 2004 6:49 PM
To: Peter Darley; Mark Harrison; pgsql-general@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?

I don't think that works - there's a race condition if you do not do any
locking.

Why:
Before a transaction that inserts rows is committed, other transactions are
not aware of the inserted rows, so the select returns no rows.

So:
You can either create a unique index and catch insert duplicate failures.

Or:
lock the relevant tables, then do the select ... update/insert or insert
... select , or whatever it is you want to do.

Or:
both.

Test it out yourself.

At 07:51 AM 8/5/2004, Peter Darley wrote:

Mark,
It's not canonical by any means, but what I do is:

update foo set thing='stuff' where name = 'xx' and thing<>'stuff';
insert into foo (name, thing) (select 'xx' as name, 'stuff' as thing where
not exists (select 1 from foo where name='xx'));

I believe if you put these on the same line it will be a single
transaction. It has the benefit of not updating the row if there aren't
real changes. It's plenty quick too, if name is indexed.

Thanks,
Peter Darley

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#12Peter Darley
pdarley@kinesis-cem.com
In reply to: Mike Mascari (#10)
Re: most idiomatic way to "update or insert"?

Mike,
Ahha! I didn't understand what the objection was. I guess I am getting
lucky. :)
It seems to me that this is true with any concurrent inserts, isn't it?
One will succeed and one will fail.
Thanks,
Peter Darley

-----Original Message-----
From: Mike Mascari [mailto:mascarm@mascari.com]
Sent: Thursday, August 05, 2004 6:51 AM
To: Peter Darley
Cc: Lincoln Yeoh; Mark Harrison; pgsql-general@postgresql.org
Subject: Re: [GENERAL] most idiomatic way to "update or insert"?

Peter Darley wrote:

Lincoln, It works for me... I think what you said is wrong
because it updates first (if there is a row to update), then
inserts. If there is a row to update the insert won't insert
anything. If there is no row to update the insert inserts a row.
Either way, the insert is the last thing in the transaction.
Plus, as shown in the code to follow, I have almost this exact
thing in my application and I know that it does work for me. :)

You're getting lucky. I suggested the same thing four years ago. The
race condition is still there:

http://groups.google.com/groups?hl=en&amp;lr=&amp;ie=UTF-8&amp;selm=2344.978158285%40sss
.pgh.pa.us

HTH,

Mike Mascari

#13Arthur Ward
award@dominionsciences.com
In reply to: Richard Huxton (#7)
Re: most idiomatic way to "update or insert"?

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or updating
something I take a long hard look at my design - it ususally means I've
not thought clearly about something.

...

Can you give an actual example of where you need this?

We have an environment where our data collection occurs by screen scraping
(er, web scraping?). Unfortunately, it takes two passes, once across
search results which provide partial data, then a second time over a
detail page loaded for each item in the search results we were given.
Since time is of the essence, we provide the partial data to our
customers, which means dealing with the insert or update. Additionally,
the process is multithreaded, so search results can be touching things
concurrently with details being loaded, otherwise we can't keep up.

I dealt with the problem by wrapping every touch of an item in a single
transaction with a loop around it, as has been recommended here many times
before. Any DB-exception (Python) inside the loop caused by concurrency
type problems causes a restart. As it turns out, the insert/update race
has yet to result in a retry. The real payoff in this design has proven to
be dealing with FK locking... without putting way more effort into fixing
it than the deadlocks are worth, we get around a dozen deadlocks a day
that are automatically retried.

#14Ron St-Pierre
rstpierre@syscor.com
In reply to: Richard Huxton (#7)
Re: most idiomatic way to "update or insert"?

Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it ususally
means I've not thought clearly about something.

Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple table
with the stock symbol, stock exchange, high, low, open, close and
volume. Every day you update the data for each stock. But there are
always new stocks listed on an exchange, so when a new stock shows up
you have to do an insert instead of an update.

Ron

#15Richard Huxton
dev@archonet.com
In reply to: Ron St-Pierre (#14)
Re: most idiomatic way to "update or insert"?

Ron St-Pierre wrote:

Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it ususally
means I've not thought clearly about something.

Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple table
with the stock symbol, stock exchange, high, low, open, close and
volume. Every day you update the data for each stock. But there are
always new stocks listed on an exchange, so when a new stock shows up
you have to do an insert instead of an update.

If it is just a simple table then delete all of them and insert from
scratch. If you wanted to track changes over time (perhaps more likely),
you'd have a separate table with the company name/address etc and a log
table. At which point you'll want to know if it's a new company or not...

--
Richard Huxton
Archonet Ltd

#16Ron St-Pierre
rstpierre@syscor.com
In reply to: Richard Huxton (#15)
Re: most idiomatic way to "update or insert"?

Richard Huxton wrote:

Ron St-Pierre wrote:

Richard Huxton wrote:

An "update or insert" would be useful sometimes, but it's not always
necessary. Indeed, if I find I don't know whether I'm adding or
updating something I take a long hard look at my design - it
ususally means I've not thought clearly about something.

Can you give an actual example of where you need this?

How about stocks for a simple example? Let's say you have a simple
table with the stock symbol, stock exchange, high, low, open, close
and volume. Every day you update the data for each stock. But there
are always new stocks listed on an exchange, so when a new stock
shows up you have to do an insert instead of an update.

If it is just a simple table then delete all of them and insert from
scratch. If you wanted to track changes over time (perhaps more
likely), you'd have a separate table with the company name/address etc
and a log table. At which point you'll want to know if it's a new
company or not...

Okay, this simple example really exists, but the simple table also
includes a date that the stock was last traded, so we have:
stock symbol, stock exchange, high, low, open, close, volume, date, plus
a few more fields

But the data isn't always updated at one time, as we can update all
stocks for one exhange and possibly only some of the stocks for a
particular exchange in one go. Even if the data is received for only one
exchange we could delete all stocks for that exchange and insert new
ones, which would work fine. However some stocks are not traded every
day, so we need to show the trading information for the last date that
it was traded, so we can't delete them en masse even for the one exchange.

BTW these updates do take longer than we'd like so I would appreciate
more input on how this setup could be redesigned.

Ron

#17Bruce Momjian
bruce@momjian.us
In reply to: Ron St-Pierre (#16)
Re: most idiomatic way to "update or insert"?

Ron St-Pierre <rstpierre@syscor.com> writes:

BTW these updates do take longer than we'd like so I would appreciate more
input on how this setup could be redesigned.

Where is the input coming from?

One option is to batch changes. If you just insert into a log table whenever
new data is available, and then do a batch update of many records you would
have a few advantages.

1) You could have a single updater and therefore no worries with concurrency.

2) The optimizer could choose a merge join or at least a nested loop and avoid
multiple round trips.

Something like

update current_stock_price
set price = log.price,
timestamp = log.timestamp
from stock_price log
where current_stock_price.stock = stock_price_log.stock
and stock_price_log.timestamp between ? and ?

You can either just use deterministic time ranges like midnight-midnight or
keep careful track of the last time the job was run.

You would first have to insert into current_stock_price any missing stocks,
but if you're batching them then again you don't have to worry about someone
else inserting them in the middle of your query. And it's more efficient to
add lots of them in one shot than one at a time.

--
greg

#18Richard Huxton
dev@archonet.com
In reply to: Ron St-Pierre (#16)
Re: most idiomatic way to "update or insert"?

Ron St-Pierre wrote:

Okay, this simple example really exists, but the simple table also
includes a date that the stock was last traded, so we have:
stock symbol, stock exchange, high, low, open, close, volume, date, plus
a few more fields

[snip more details]

BTW these updates do take longer than we'd like so I would appreciate
more input on how this setup could be redesigned.

Well, I'd probably make the primary key (stock_id, trading_date) and
just insert into a log table. From there I'd update into a summary
table, or use a view.

Of course, that might make things slower in your case.

--
Richard Huxton
Archonet Ltd

#19Matteo Beccati
php@beccati.com
In reply to: Tommi Maekitalo (#8)
Re: most idiomatic way to "update or insert"?

Hi,

I prefer to update and if the number of updated rows equals 0 do an
insert. So in case of update I need only one roundtrip. If insert is far
more common in this case it might be better try insert and catch the
error. But I try to avoid running on an error intentionally.

When logging to a compact table that stores data in an aggregate form, I
used something like that:

BEGIN;
UPDATE ... ;

if (!affected_rows)
{
INSERT ... ;

if (error)
{
ROLLBACK;
UPDATE ... ;
}
}

COMMIT;

I added the error check with a second UPDATE try after INSERT to
increase accuracy. In fact, INSERTs were sometimes failing because of
concurrency, and this was the only viable solution I found to avoid
losing data.

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

#20Ron St-Pierre
rstpierre@syscor.com
In reply to: Bruce Momjian (#17)
Re: most idiomatic way to "update or insert"?

Greg Stark wrote:

Ron St-Pierre <rstpierre@syscor.com> writes:

BTW these updates do take longer than we'd like so I would appreciate more
input on how this setup could be redesigned.

Where is the input coming from?

One option is to batch changes.

<snip>

Something like

update current_stock_price
set price = log.price,
timestamp = log.timestamp
from stock_price log
where current_stock_price.stock = stock_price_log.stock
and stock_price_log.timestamp between ? and ?

We check for new stocks and add them, and initially were using a
procedure to do something similar to your code:

CREATE OR REPLACE FUNCTION updateData() RETURNS SETOF datatype AS '
DECLARE
rec RECORD;
BEGIN
FOR rec IN SELECT symbol, tradeDate, tickDate, high, low , open,
close, volume FROM exchangedata LOOP
RETURN NEXT rec;
UPDATE stockdata SET high=rec.high, low=rec.low,
open=rec.low, close=rec.close, volume=rec.volume, tradeDate=rec.tradeDate
WHERE symbol=rec.symbol;
END LOOP;
RETURN;
END;
' LANGUAGE 'plpgsql';
... but it took too long. Off hand, do you know if your approach above
would be quicker?

Ron

#21Bruce Momjian
bruce@momjian.us
In reply to: Matteo Beccati (#19)
#22Jerry Sievers
jerry@jerrysievers.com
In reply to: Peter Darley (#2)