someone working to add merge?

Started by Jaime Casanovaover 20 years ago33 messageshackers
Jump to latest
#1Jaime Casanova
jcasanov@systemguards.com.ec

Hi,

there is someone working in add the MERGE statement?

i don't find much about what a good implementation of merge must have...

i think what it needs to do is something like:

- try to lock the rows for update
- if the lock cannot be immediatly acquire ask why
- if the rows are already locked, wait and try again?
- if no rows were found try de insert part
- if there was any other error, abort
- else update

so i suppose we can reuse many of the code breaking the merge in 3
pieces... for now they are just thougths, i will think more in this
and try to implement it...

comments? ideas? suggestions?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

#2Josh Berkus
josh@agliodbs.com
In reply to: Jaime Casanova (#1)
Re: someone working to add merge?

Jaime,

so i suppose we can reuse many of the code breaking the merge in 3
pieces... for now they are just thougths, i will think more in this
and try to implement it...

comments? ideas? suggestions?

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
full implementation of MERGE without Predicate Locking (the ability to say
"lock this table against inserts or updates of any row with key=5").
However, Peter suggested that we could do a proof-of-concept implementation,
working out syntax and trigger issues, based on a full table lock and do the
hard work once it was proved to be feasable.

Peter?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#3Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Josh Berkus (#2)
Re: someone working to add merge?

On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:

Jaime,

so i suppose we can reuse many of the code breaking the merge in 3
pieces... for now they are just thougths, i will think more in this
and try to implement it...

comments? ideas? suggestions?

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
full implementation of MERGE without Predicate Locking (the ability to say
"lock this table against inserts or updates of any row with key=5").

it isn't what select for update does?

However, Peter suggested that we could do a proof-of-concept implementation,
working out syntax and trigger issues, based on a full table lock and do the
hard work once it was proved to be feasable.

Peter?

--
Josh Berkus
Aglio Database Solutions
San Francisco

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Jaime Casanova (#3)
Re: someone working to add merge?

Jaime Casanova wrote:

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
full implementation of MERGE without Predicate Locking (the ability to say
"lock this table against inserts or updates of any row with key=5").

it isn't what select for update does?

It won't prevent the insertion of a row with the given predicate.

cheers

andrew

#5Csaba Nagy
nagy@ecircle-ag.com
In reply to: Jaime Casanova (#3)
Re: someone working to add merge?

On Fri, 2005-11-11 at 18:15, Jaime Casanova wrote:

On 11/11/05, Josh Berkus <josh@agliodbs.com> wrote:

Jaime,

so i suppose we can reuse many of the code breaking the merge in 3
pieces... for now they are just thougths, i will think more in this
and try to implement it...

comments? ideas? suggestions?

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
full implementation of MERGE without Predicate Locking (the ability to say
"lock this table against inserts or updates of any row with key=5").

it isn't what select for update does?

Select for update only works if the row is already there. If there's no
row, you can't lock it. So you want then to insert it, but then it is
possible that somebody inserted it before you, immediately after your
update... so the solution would be more like:

- try insert;
- if insert fails, do update;

You can already do that, but you have to place a save-point before the
insert, so you can continue your transaction even if the insert fails.
Without knowledge of postgres internals, the simplest would be to be
able to do the "continue transaction if insert fails" with the cheapest
prise to pay. This would mean wrap up existing code, except that
"continue transaction after failure of insert" part.

All this might be completely bull*it of course, I don't know too much
about postgres internals.

[snip]

Cheers,
Csaba.

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Josh Berkus (#2)
Re: someone working to add merge?

Josh Berkus wrote:

Funny, we were just discussing this at OpenDBCon. Seems that you
can't do a full implementation of MERGE without Predicate Locking
(the ability to say "lock this table against inserts or updates of
any row with key=5"). However, Peter suggested that we could do a
proof-of-concept implementation, working out syntax and trigger
issues, based on a full table lock and do the hard work once it was
proved to be feasable.

Yes, I've started to work on this. Realizing that the current way to
manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a
table lock anyway, a MERGE implementation using a table lock would at
least give some convenience benefit to users. (And possibly some
performance, too, if the decision logic is currently run in the
client.)

A predicate locking implementation for MERGE might actually not be all
that complicated, because you only need to look on pk = constant, not
on arbitrary expressions. Nevertheless, I think it's best to write the
MERGE command first and then optimize the locking.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#2)
Re: someone working to add merge?

Josh Berkus <josh@agliodbs.com> writes:

Funny, we were just discussing this at OpenDBCon. Seems that you can't do a
full implementation of MERGE without Predicate Locking (the ability to say
"lock this table against inserts or updates of any row with key=5").
However, Peter suggested that we could do a proof-of-concept implementation,
working out syntax and trigger issues, based on a full table lock and do the
hard work once it was proved to be feasable.

If you don't have any better idea how to do it than a full table lock,
you might as well not do it at all. A "proof of concept" that doesn't
solve the hard part of the problem is no proof :-(

My first guess about a real implementation would involve extending the
index AM API to offer a function "insert this key, or return the
existing match if there already is one". This might tie into
refactoring the existing implementation of unique indexes, in which all
the problem is put on the AM's head (which is why only btree copes at
the moment).

regards, tom lane

#8Csaba Nagy
nagy@ecircle-ag.com
In reply to: Peter Eisentraut (#6)
Re: someone working to add merge?

OK, I'm relatively new on this list, and I might have missed a few
discussions on this topic.
I wonder if doing it this way would not be better than using a table
lock:

- set a save point;
- insert the row;
- on error:
- roll back to the save point;
- update the row;
- on success release the save point;

This would provide less contention while paying the prise for the save
point. In low contention scenarios the table lock would be better, and I
wonder for high contention scenarios which is better, the table lock, or
the save point version...

Of course the table lock version is the future if predicate locking is
going to be implemented later.

Cheers,
Csaba.

Show quoted text

On Fri, 2005-11-11 at 18:37, Peter Eisentraut wrote:

Josh Berkus wrote:

Funny, we were just discussing this at OpenDBCon. Seems that you
can't do a full implementation of MERGE without Predicate Locking
(the ability to say "lock this table against inserts or updates of
any row with key=5"). However, Peter suggested that we could do a
proof-of-concept implementation, working out syntax and trigger
issues, based on a full table lock and do the hard work once it was
proved to be feasable.

Yes, I've started to work on this. Realizing that the current way to
manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a
table lock anyway, a MERGE implementation using a table lock would at
least give some convenience benefit to users. (And possibly some
performance, too, if the decision logic is currently run in the
client.)

A predicate locking implementation for MERGE might actually not be all
that complicated, because you only need to look on pk = constant, not
on arbitrary expressions. Nevertheless, I think it's best to write the
MERGE command first and then optimize the locking.

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#7)
Re: someone working to add merge?

Tom Lane wrote:

If you don't have any better idea how to do it than a full table
lock, you might as well not do it at all. A "proof of concept" that
doesn't solve the hard part of the problem is no proof :-(

But the problem here is not to break any kind of performance barrier,
but to give people migrating from MySQL and alternative for REPLACE
command.

My first guess about a real implementation would involve extending
the index AM API to offer a function "insert this key, or return the
existing match if there already is one".

This assumes that there are indexes defined for the columns involved in
the merge condition, which is not required anywhere.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#9)
Re: someone working to add merge?

Peter Eisentraut <peter_e@gmx.net> writes:

This assumes that there are indexes defined for the columns involved in
the merge condition, which is not required anywhere.

Surely they require a unique constraint --- else the behavior isn't even
well defined, is it?

regards, tom lane

#11John Hansen
john@geeknet.com.au
In reply to: Tom Lane (#10)
Re: someone working to add merge?

Tom Lane Wrote:

Surely they require a unique constraint --- else the behavior
isn't even well defined, is it?

From the mysql manual:

'REPLACE works exactly like INSERT, except that if an old record in the
table has the same value as a new record for a PRIMARY KEY or a UNIQUE
index, the old record is deleted before the new record is inserted. See
Section 13.2.4, "INSERT Syntax".'

... John

#12John Hansen
john@geeknet.com.au
In reply to: John Hansen (#11)
Re: someone working to add merge?

I Wrote:

From the mysql manual:

'REPLACE works exactly like INSERT, except that if an old
record in the table has the same value as a new record for a
PRIMARY KEY or a UNIQUE index, the old record is deleted
before the new record is inserted. See Section 13.2.4,
"INSERT Syntax".'

It also says:

Note that unless the table has a PRIMARY KEY or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another.

... John

#13Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Peter Eisentraut (#9)
Re: someone working to add merge?

On 11/11/05, Peter Eisentraut <peter_e@gmx.net> wrote:

Tom Lane wrote:

If you don't have any better idea how to do it than a full table
lock, you might as well not do it at all. A "proof of concept" that
doesn't solve the hard part of the problem is no proof :-(

But the problem here is not to break any kind of performance barrier,
but to give people migrating from MySQL and alternative for REPLACE
command.

But MERGE isn't REPLACE...

REPLACE will delete old records to insert new ones; MERGE try to
insert and if the record exists then can UPDATE just a few values,
maybe incrementing them with a value (all the calculation are doing by
the MERGE)

My first guess about a real implementation would involve extending
the index AM API to offer a function "insert this key, or return the
existing match if there already is one".

This assumes that there are indexes defined for the columns involved in
the merge condition, which is not required anywhere.

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#10)
Re: someone working to add merge?

Tom Lane wrote:

Surely they require a unique constraint --- else the behavior isn't
even well defined, is it?

They require that the merge condition does not match for more than one
row, but since the merge condition can do just about anything, there is
no guarantee that a unique constraint encompasses it.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#15Peter Eisentraut
peter_e@gmx.net
In reply to: Jaime Casanova (#13)
Re: someone working to add merge?

Jaime Casanova wrote:

REPLACE will delete old records to insert new ones; MERGE try to
insert and if the record exists then can UPDATE just a few values,
maybe incrementing them with a value (all the calculation are doing
by the MERGE)

I'm not the expert on REPLACE, but it would seem that REPLACE is a
special case of MERGE.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#16John Hansen
john@geeknet.com.au
In reply to: Peter Eisentraut (#15)
Re: someone working to add merge?

Jaime Casanova Wrote:

But MERGE isn't REPLACE...

REPLACE will delete old records to insert new ones; MERGE try
to insert and if the record exists then can UPDATE just a few
values, maybe incrementing them with a value (all the
calculation are doing by the MERGE)

That sounds like MySQL's 'INSERT INTO ... ON DUPLICATE KEY UPDATE',
which they recommend over REPLACE anyways.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#14)
Re: someone working to add merge?

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane wrote:

Surely they require a unique constraint --- else the behavior isn't
even well defined, is it?

They require that the merge condition does not match for more than one
row, but since the merge condition can do just about anything, there is
no guarantee that a unique constraint encompasses it.

ISTM to be a reasonable implementation restriction that there be a
constraint by which the system can prove that there is at most one
matching row. Per other comments in this thread, we'd not be the only
implementation making such a restriction.

(Certainly, if I were a DBA and were told that the performance of MERGE
would go to hell in a handbasket if I had no such constraint, I'd make
sure there was one. I don't think there is very much of a use-case for
the general scenario.)

regards, tom lane

#18Bruno Wolff III
bruno@wolff.to
In reply to: Csaba Nagy (#8)
Re: someone working to add merge?

On Fri, Nov 11, 2005 at 18:48:33 +0100,
Csaba Nagy <nagy@ecircle-ag.com> wrote:

OK, I'm relatively new on this list, and I might have missed a few
discussions on this topic.
I wonder if doing it this way would not be better than using a table
lock:

- set a save point;
- insert the row;
- on error:
- roll back to the save point;
- update the row;
- on success release the save point;

This would provide less contention while paying the prise for the save
point. In low contention scenarios the table lock would be better, and I
wonder for high contention scenarios which is better, the table lock, or
the save point version...

You may not be able to update the row after the insert fails. If there is
insert occurring in another transaction, the row may not be visible to
the current transaction. In which case you can neither insert or update the
row. You need to wait for the other transaction to commit or rollback.

#19Csaba Nagy
nagy@ecircle-ag.com
In reply to: Bruno Wolff III (#18)
Re: someone working to add merge?

On Fri, 2005-11-11 at 20:22, Bruno Wolff III wrote:

On Fri, Nov 11, 2005 at 18:48:33 +0100,
Csaba Nagy <nagy@ecircle-ag.com> wrote:

OK, I'm relatively new on this list, and I might have missed a few
discussions on this topic.
I wonder if doing it this way would not be better than using a table
lock:

- set a save point;
- insert the row;
- on error:
- roll back to the save point;
- update the row;
- on success release the save point;

This would provide less contention while paying the prise for the save
point. In low contention scenarios the table lock would be better, and I
wonder for high contention scenarios which is better, the table lock, or
the save point version...

You may not be able to update the row after the insert fails. If there is
insert occurring in another transaction, the row may not be visible to
the current transaction. In which case you can neither insert or update the
row. You need to wait for the other transaction to commit or rollback.

Are you sure ? From what I understand, the insert will only fail when
the other transaction commits, and actively wait for the commit or
rollback. Look at this:

session_1=> create table test (col smallint primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
session_1=> begin;
BEGIN
cnagy=> insert into test values (1);
INSERT 165068987 1

session_2=> begin;
BEGIN
session_2=> insert into test values (1);

[session_2 is now waiting]

session_1=> commit;
COMMIT

[session_2 wakes up]

ERROR: duplicate key violates unique constraint "test_pkey"

So it looks like predicate locking is already in place for primary key
conditions...

Cheers,
Csaba.

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Csaba Nagy (#19)
Re: someone working to add merge?

Csaba Nagy wrote:

session_1=> create table test (col smallint primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
session_1=> begin;
BEGIN
cnagy=> insert into test values (1);
INSERT 165068987 1

session_2=> begin;
BEGIN
session_2=> insert into test values (1);

[session_2 is now waiting]

This only happens because of the unique index. There's no predicate
locking involved. The btree code goes some lengths to make this work;
it would be probably simple to modify this to support MERGE or REPLACE
on the limited cases where there's a UNIQUE index. Tom has already said
this twice (on this thread only; he has already said it before IIRC.)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21Csaba Nagy
nagy@ecircle-ag.com
In reply to: Alvaro Herrera (#20)
#22Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Alvaro Herrera (#20)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#20)
#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#24)
#26Martijn van Oosterhout
kleptog@svana.org
In reply to: Peter Eisentraut (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#25)
#28Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Martijn van Oosterhout (#26)
#29Martijn van Oosterhout
kleptog@svana.org
In reply to: Jaime Casanova (#1)
#30Jan Wieck
JanWieck@Yahoo.com
In reply to: Martijn van Oosterhout (#29)
#31Martijn van Oosterhout
kleptog@svana.org
In reply to: Jan Wieck (#30)
#32Jan Wieck
JanWieck@Yahoo.com
In reply to: Martijn van Oosterhout (#31)
#33Martijn van Oosterhout
kleptog@svana.org
In reply to: Jan Wieck (#32)