ask for review of MERGE
Dear All,
I have just generated a new patch for the MERGE command.
One main change in this version is the removal of the RAISE ERROR action from
MERGE, because its semantics are not yet well defined.
I also rewrote the regression test file merge.sql to make it consistent with
the examples I used on my wiki page.
A few small (error and warning) bugs are fixed.
In this patch, all the main features of MERGE (sublinks, explain, rules and
triggers, inheritance) are unchanged, as is the DO NOTHING action.
I do hope the MERGE command can be added to PostgreSQL 9.1, and I wonder what
improvements should be made to the current version.
Could you please review this patch, if you have the time and interest?
Your feedback will be highly appreciated.
Thanks
Yours Boxuan
Attachments:
merge_v202.patch (application/octet-stream, +2431/-49)
On 23 September 2010 11:31, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
A few corrections:
in src/backend/executor/nodeModifyTable.c:
s/excute/execute/
s/orignial/original/
in src/backend/optimizer/plan/planner.c
s/expreesions/expressions/
s/So,we/So, we/
s/comand/command/
s/fileds/fields/
in src/backend/optimizer/prep/preptlist.c:
s/aggresive/aggressive/
in src/backend/optimizer/util/var.c
s/targe/target/ -- this appears twice
s/sourse/source/
in src/backend/parser/analyze.c
s/takend/taken/
s/relaion/relation/
s/targe/target/ -- this appears twice
s/consitency/consistency/
s/commond/command/
s/seperate/separate/
in src/backend/rewrite/rewriteHandler.c
s/acton/action/
in src/include/nodes/execnodes.h
s/meger/merge/
in src/include/nodes/parsenodes.h
s/proecess/process/
s/allwo/allow/
s/elments/elements/
in src/test/regress/expected/merge.out
s/qulifications/qualifications/ -- this appears twice
s/suceeds/succeeds/ -- this appears twice
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
I have just generate a new patch of MERGE command.
I haven't followed the discussion very closely, but this part in the
regression tests caught my attention:
+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the second
It doesn't seem like a very good idea to go against the standard here.
The "second" row is not well defined in this case so the results are
unpredictable.
The patch is also missing a (trivial) change to explain.c.
Regards,
Marko Tiikkaja
On Thu, Sep 23, 2010 at 7:55 PM, Marko Tiikkaja <
marko.tiikkaja@cs.helsinki.fi> wrote:
On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
I have just generate a new patch of MERGE command.
I haven't followed the discussion very closely, but this part in the
regression tests caught my attention:
+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the second
It doesn't seem like a very good idea to go against the standard here. The
"second" row is not well defined in this case so the results are
unpredictable.
Yes, the result is uncertain. It depends on which row is scanned first,
which is almost entirely out of the user's control.
But, in postgres, this is what the system does for UPDATE.
For example, consider a simple update query like the following:
CREATE TABLE target (id int, val int);
INSERT INTO target VALUES (1, 10);
CREATE TABLE source (id int, add int);
INSERT INTO source VALUES (1, 100);
INSERT INTO source VALUES (1, 100000);
-- Do the update query with a source table that has multiple matching rows
UPDATE target SET val = val + add FROM source
WHERE source.id = target.id;
t=# SELECT * FROM target;
id | val
----+-----
1 | 110
(1 row)
The target tuple has two matching source tuples, but it is only updated once.
And yet this query is not forbidden by postgres, even though the result is
also uncertain.
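The multiplicity described above can be checked mechanically. Below is a small sketch using Python's sqlite3 as a stand-in for postgres (the source column is renamed `add_` because ADD is a reserved word there); it shows that the target row joins to two source rows even though UPDATE touches it only once:

```python
import sqlite3

# Stand-in demonstration (sqlite3, not postgres); the "add" column is
# renamed "add_" because ADD is a reserved word in SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE target (id int, val int)")
cur.execute("INSERT INTO target VALUES (1, 10)")
cur.execute("CREATE TABLE source (id int, add_ int)")
cur.executemany("INSERT INTO source VALUES (?, ?)", [(1, 100), (1, 100000)])

# The target row with id=1 matches two source rows; which one an
# UPDATE ... FROM applies is implementation-dependent.
matches = cur.execute(
    "SELECT count(*) FROM target JOIN source ON source.id = target.id"
).fetchone()[0]
print(matches)  # -> 2
```

Two join rows, one update: whichever source row the scan reaches first wins.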
The patch is also missing a (trivial) change to explain.c.
Sorry, I messed up the files. Here comes the new patch file, with EXPLAIN in
it.
Attachments:
merge_v203.patch (application/octet-stream, +2524/-49)
Finding time for a review as large as this one is a bit tough, but I've
managed to set aside a couple of days for it over the next week. I'm
delivering a large project tonight and intend to start in on the review
work tomorrow once that's cleared up. If you're ever not sure who is
working on your patch and what state they feel it's in, check
https://commitfest.postgresql.org/action/commitfest_view?id=7 for an
update; that's where we keep track of all that information.
Did you ever end up keeping a current version of this patch in an
alternate repository location, such as github? I thought I saw a
suggestion from you about that, but after looking through the history
here all I see are the diff patches you've been sending to the list.
That's fine, just trying to confirm where everything is at.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
Started looking at the latest MERGE patch from Boxuan here tonight. The
regression tests pass for me here, good starting sign. I expect to move
onto trying to break it more creatively next, then onto performance
tests. Nothing much more exciting than that to report so far.
It had suffered some bit rot, I think because of the security label
changes. Attached is a rebased version against the new git HEAD so
nobody else has to duplicate that to apply the patch. Also, to provide
an alternate interface for anyone who wants to do testing/browsing of
this patch, I've made a Github fork with a merge branch in it. I plan to
commit intermediate stuff to there that keeps up to date with review
changes: http://github.com/greg2ndQuadrant/postgres/tree/merge
Probably easier to read
http://github.com/greg2ndQuadrant/postgres/compare/merge than most local
patch viewers, so I gzip'ed the attached updated patch to save some bytes.
One compiler warning I noticed that needs to get resolved:
src/backend/commands/explain.c:
explain.c: In function ‘ExplainMergeActions’:
explain.c:1528: warning: comparison of distinct pointer types lacks a cast
That is complaining about this section:
if (mt_planstate->operation != CMD_MERGE ||
mt_planstate->mt_mergeActPstates == NIL)
return;
So presumably that comparison with NIL needs a cast. Once I get more
familiar with the code I'll fix that myself if Boxuan doesn't offer a
suggestion first.
The rest of the compiler warnings I saw didn't look related to his code,
maybe stuff my picky Ubuntu compiler is noticing that was done recently
to HEAD. I haven't checked HEAD without this patch yet to confirm, and
am done for the night now. Here's the list if anyone is interested:
Warning in src/backend/parser/scan.c:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c
-MMD -MP -MF .deps/index.Po
In file included from gram.y:12172:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16256: warning: unused variable ‘yyg’
Warning in src/backend/utils/error/elog.c:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o ts_cache.o
ts_cache.c -MMD -MP -MF .deps/ts_cache.Po
elog.c: In function ‘write_console’:
elog.c:1698: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
elog.c: In function ‘write_pipe_chunks’:
elog.c:2388: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
elog.c:2397: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
Warning in src/bin/scripts/common.c:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I. -I. -I../../../src/interfaces/libpq
-I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -c -o
input.o input.c -MMD -MP -MF .deps/input.Po
common.c: In function ‘handle_sigint’:
common.c:247: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
common.c:250: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
common.c:251: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
The rest of the compiler warnings I saw didn't look related to his code,
maybe stuff my picky Ubuntu compiler is noticing that was done recently to
HEAD. I haven't checked HEAD without this patch yet to confirm, and am done
for the night now. Here's the list if anyone is interested:
Warning in src/backend/parser/scan.c:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF
.deps/index.Po
In file included from gram.y:12172:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16256: warning: unused variable ‘yyg’
Known problem: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00657.php
I'm pretty sure I've seen the warn_unused_result warnings on HEAD as well.
Josh
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
One compiler warning I noticed that needs to get resolved:
src/backend/commands/explain.c:
explain.c: In function ‘ExplainMergeActions’:
explain.c:1528: warning: comparison of distinct pointer types lacks a cast
That is complaining about this section:
if (mt_planstate->operation != CMD_MERGE ||
mt_planstate->mt_mergeActPstates == NIL)
return;
So presumably that comparison with NIL needs a cast. Once I get more
familiar with the code I'll fix that myself if Boxuan doesn't offer a
suggestion first.
Possibly NULL was meant instead of NIL. NIL is specifically for a List.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
One compiler warning I noticed that needs to get resolved:
src/backend/commands/explain.c:
explain.c: In function ‘ExplainMergeActions’:
explain.c:1528: warning: comparison of distinct pointer types lacks a cast
That is complaining about this section:
if (mt_planstate->operation != CMD_MERGE ||
mt_planstate->mt_mergeActPstates == NIL)
return;
So presumably that comparison with NIL needs a cast. Once I get more
familiar with the code I'll fix that myself if Boxuan doesn't offer a
suggestion first.
Possibly NULL was meant instead of NIL. NIL is specifically for a List.
Yes, it should be NULL instead of NIL.
At first, I designed this field as a List, but I changed it into an array in
the later editions. This is why I have a mismatched comparison here. Sorry
for that.
On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Starting looking at the latest MERGE patch from Boxuan here tonight. The
regression tests pass for me here, good starting sign. I expect to move onto
trying to break it more creatively next, then onto performance tests.
Nothing much more exciting than that to report so far.
Greg, are you still working on a review of this patch?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
Greg, are you still working on a review of this patch?
Yes, just had more distractions while coming up to speed on this area
than I'd hoped. I'll get a second round of looking at this done by the
weekend.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Boxuan Zhai wrote:
Yes, it should be NULL instead of NIL.
OK, I applied that patch to my local copy and pushed it out to github:
http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0
That represents what I tested against. However, when I tried to merge
against HEAD once I was finished, I discovered this patch has bit-rotted
significantly. If you have a local copy that works for you, I would not
recommend pulling in the PostgreSQL repo updates done in the last couple
of weeks yet. Friday's "Allow WITH clauses to be attached to INSERT,
UPDATE, DELETE statements" commit in particular conflicts quite a bit
with your changes. Attached is a rebased patch that applies to HEAD now
after a round of fixes to resolve those. But it doesn't work yet,
because of recent change to the ExecUpdate and ExecDelete functions
you're calling from src/backend/executor/nodeModifyTable.c inside
ExecMerge. If you can continue working on the patch without merging
recent repo work, I can hack away at fixing that once I figure out what
got changed there recently. It's taken some painful git work to sort
out what I've done so far, there's more left to do, and I know that's
not an area you specialize in.
Back to the feature review...I dove into how I expected this to work,
relative to what it actually does at the moment. That didn't really go
too well so far, but I don't know that this represents any fundamental
issue with the patch. Just situations the existing code didn't really
anticipate we have to flush out. As a general community FYI here, while
it's taken me a while to get up to speed on this whole feature, I expect
to keep chugging away on this regardless of the CommitFest boundaries.
This feature is both too big and too important to just stop working on
it because a date has passed.
Onto the test cases. The examples that Boxuan has been working with,
and that the regression tests included with the patch exercise, all
involve two tables being joined together using MERGE. The use case I
decided to test instead was when you're trying to simulate an UPSERT
where only a single table is involved. I couldn't get to this to work
correctly. Maybe I'm just using MERGE wrong here, but I tried so many
different variations without success (including one that's basically
copied from Simon's original regression test set suggestions) that I
suspect there may be a subtle problem with the implementation instead.
To replicate the most straightforward variations of what I ran into, you
can start with the same data that's populated by the regression test set:
CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock;
item_id | balance
---------+---------
10 | 2200
20 | 1900
If you now execute the following:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
This works fine, and updates the matching row:
item_id | balance
---------+---------
20 | 1900
10 | 2201
But if I give it a key that doesn't exist instead:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
This doesn't execute the NOT MATCHED case and INSERT the way I expected
it to. It just gives back "MERGE 0".
Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more
like the working regression test examples. I expected this to do the
same thing as the first example:
MERGE INTO Stock t
USING Stock s
ON s.item_id=10 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
But it gives back this:
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(10) already exists.
Can't tell from that whether it's hitting the MATCHED or NOT MATCHED
side of things to generate that. But it doesn't work any better if you
give it an example that doesn't exist:
MERGE INTO Stock t
USING Stock s
ON s.item_id=30 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(30) already exists.
Now that one is really weird, because that key value certainly doesn't
exist yet in the table. There seem to be a couple of issues in the area
of joining a table with itself here. Feel free to tell me I just don't
know how to use MERGE if that's the case in any of these.
The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function. I was hoping I could use this to update the pgbench tables:
CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
RETURNS VOID AS
$$
BEGIN
MERGE INTO pgbench_accounts t USING (SELECT * FROM pgbench_accounts
WHERE aid = key) AS s ON t.aid=s.aid WHEN MATCHED THEN UPDATE SET
abalance = s.abalance + delta WHEN NOT MATCHED THEN INSERT VALUES
(key,1+(key / 100000)::integer,delta,'');
END;
$$
LANGUAGE plpgsql;
But I just get this instead:
ERROR: "pgbench_accounts" is not a known variable
LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p...
The other way I wrote the MERGE statement above (not using a subquery)
does the same thing. I know that error messages is coming from the
changes introduced in
http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but
I'm not familiar enough with the whole grammar implementation to know
what that means yet.
That's what I found so far in my second pass over this. Once the first
problem here is sorted out, I've already worked out how to test the
performance of the code using pgbench. Have all the scripts ready to go
once the correct MERGE statement is plugged into them, just ran into
this same class of problems when I tried them. So far I was only able
to see how fast the UPDATE path worked though, which isn't very helpful
yet. My hope here is to test the MERGE implementation vs. the classic
pl/pgsql implementation of UPSERT, calling both within a function so
it's a fair comparison, and see how that goes. This may flush out
concurrency bugs that are in the MERGE code as well.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Attachments:
merge-v203-rebase.patch (text/x-patch, +2550/-39)
I think that MERGE is supposed to trigger one rule for each row in the
source data. So:
On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
This works fine, and updates the matching row:
item_id | balance
---------+---------
20 | 1900
10 | 2201
Here you have one row of source data, and you got one action (the WHEN
MATCHED case).
But if I give it a key that doesn't exist instead:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
This doesn't execute the NOT MATCHED case and INSERT the way I expected it
to. It just gives back "MERGE 0".
Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.
Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more like
the working regression test examples. I expected this to do the same thing
as the first example:
MERGE INTO Stock t
USING Stock s
ON s.item_id=10 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
But it gives back this:
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(10) already exists.
Here you have two rows of source data. The ON clause represents the
join condition. The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.
Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
things to generate that. But it doesn't work any better if you give it an
example that doesn't exist:
MERGE INTO Stock t
USING Stock s
ON s.item_id=30 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(30) already exists.
In this case neither row of the source data matches the join condition
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.
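Robert's row counting can be reproduced outside the server. As a sketch (sqlite3 standing in for postgres, names lowercased), the left join that MERGE evaluates for the s.item_id=30 form pairs every source row with a NULL target row, so both rows take the NOT MATCHED branch:

```python
import sqlite3

# Sketch only: sqlite3 stands in for postgres to show the join MERGE sees.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stock (item_id int, balance int)")
cur.executemany("INSERT INTO stock VALUES (?, ?)", [(10, 2200), (20, 1900)])

# ON s.item_id=30 AND s.item_id=t.item_id is false for every pair,
# so each source row falls out of the left join unmatched.
rows = cur.execute("""
    SELECT s.item_id, t.item_id
    FROM stock s LEFT JOIN stock t
      ON s.item_id = 30 AND s.item_id = t.item_id
""").fetchall()
print(sorted(rows))  # -> [(10, None), (20, None)]
```

Two unmatched rows means two attempted INSERTs of (30, 1), which is exactly the duplicate-key error Greg saw.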
The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function. I was hoping I could use this to update the pgbench tables:
Good catch. Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that MERGE is supposed to trigger one rule for each row in the
source data. So:
On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
This works fine, and updates the matching row:
item_id | balance
---------+---------
20 | 1900
10 | 2201
Here you have one row of source data, and you got one action (the WHEN
MATCHED case).
But if I give it a key that doesn't exist instead:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
This doesn't execute the NOT MATCHED case and INSERT the way I expected it
to. It just gives back "MERGE 0".
Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.
Yes.
The MERGE process is based on a left join between the source table and
target table.
Since here the source table is empty, no join is carried out, and thus no
MERGE action is taken.
But, is it logically correct? I mean, should we insert some rows in the
above example rather than do nothing?
Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more like
the working regression test examples. I expected this to do the same thing
as the first example:
MERGE INTO Stock t
USING Stock s
ON s.item_id=10 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
But it gives back this:
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(10) already exists.
Here you have two rows of source data. The ON clause represents the
join condition. The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.
Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
things to generate that. But it doesn't work any better if you give it an
example that doesn't exist:
MERGE INTO Stock t
USING Stock s
ON s.item_id=30 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(30) already exists.
In this case neither row of the source data matches the join condition
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.
Yes, that is what happened in the above two examples.
The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function. I was hoping I could use this to update the pgbench tables:
Good catch. Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.
On Mon, Oct 18, 2010 at 10:09 AM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think that MERGE is supposed to trigger one rule for each row in the
source data. So:
On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;
This works fine, and updates the matching row:
item_id | balance
---------+---------
20 | 1900
10 | 2201
Here you have one row of source data, and you got one action (the WHEN
MATCHED case).
But if I give it a key that doesn't exist instead:
MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;
This doesn't execute the NOT MATCHED case and INSERT the way I expected it
to. It just gives back "MERGE 0".
Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.
Yes.
The MERGE process is based on a left join between the source table and
target table.
Since here the source table is empty, no join is carried, and thus no MERGE
action is taken.
But, is it correct logically? I mean, should we insert some rows in the
above example rather than do nothing?
I don't think so. I think the right way to write UPSERT is something
along the lines of:
MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...
(untested)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
I considered the empty source table situation again. I think it is correct
to upsert nothing in this case.
Back to the original logic of the MERGE command: its main purpose is to add
the supplementary data from the source table into the target table. Thus, an
empty source table means no input data is available, so no upsert is needed
on the target table.
Regards,
Boxuan
Robert Haas wrote:
I think the right way to write UPSERT is something
along the lines of:
MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...
That led in the right direction; after a bit more fiddling I was finally
able to get something that does what I wanted: a single table UPSERT
implemented with this MERGE implementation. Here's a log of a test
session, suitable for eventual inclusion in the regression tests:
CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2200
20 | 1900
MERGE INTO Stock t
USING (VALUES(10,100)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;
MERGE 1
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900
MERGE INTO Stock t
USING (VALUES(30,2000)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;
MERGE 1
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900
30 | 2000
I'm still a little uncertain as to whether any of my other examples
should have worked under the spec but just didn't work here, but I'll
worry about that later.
Here's what the query plan looks like on a MATCH:
Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166
rows=0 loops=1)
Action 1: Update When Matched
Action 2: Insert When Not Mactched
MainPlan:
-> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
time=0.050..0.061 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
(actual time=0.009..0.010 rows=1 loops=1)
-> Index Scan using stock_item_id_key on stock t
(cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.370 ms
And here's a miss:
Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145
rows=0 loops=1)
Action 1: Update When Matched
Action 2: Insert When Not Mactched
MainPlan:
-> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
time=0.028..0.033 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
(actual time=0.004..0.005 rows=1 loops=1)
-> Index Scan using stock_item_id_key on stock t
(cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.255 ms
Next steps here:
1) Performance/concurrency tests against trigger-based UPSERT approach.
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try and find more unexpected
behavior edge cases and general bugs.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
On 10/21/2010 08:36 PM, Greg Smith wrote:
Robert Haas wrote:
I think the right way to write UPSERT is something
along the lines of:
MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...
[...]
Here's what the query plan looks like on a MATCH:
Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0
loops=1)
Action 1: Update When Matched
Action 2: Insert When Not Mactched
"Mactched"? - is this a c&p error or the actual output of EXPLAIN? :)
lg
Stefan
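The MERGE statement quoted above is cut off in the quote, so for readers following along, a guess at how the elided action clauses would continue (the column names and arithmetic here are assumptions, not the actual statement from the thread):

```sql
-- hypothetical completion of the truncated UPSERT-style MERGE above
MERGE INTO Stock t
    USING (VALUES (10, 1)) s(item_id, balance)
    ON s.item_id = t.item_id
    WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance
    WHEN NOT MATCHED THEN INSERT VALUES (s.item_id, s.balance);
```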
There are now two branches of MERGE code in review progress available.
http://github.com/greg2ndQuadrant/postgres/tree/merge-unstable has the
bit-rotted version that doesn't quite work against HEAD yet, while
http://github.com/greg2ndQuadrant/postgres/tree/merge is what I'm still
testing against until I get that sorted out.
Attached is a tar file containing an initial concurrent MERGE test
case. What it does is create a pgbench database with a scale of 1.
Then it runs a custom test that does an UPSERT using MERGE against
pgbench_accounts, telling pgbench the database scale is actually 2.
This means that approximately half of the statements executed will hit
the MATCHED side of the merge and update existing rows, while half will
hit the NOT MATCHED one and create new account records. I did some small
tests using pgbench's debug mode, where you see all the statements it
executes, and all the output looked correct. Successive test runs are
not deterministically perfect performance comparisons, but with enough
transactions I hope that averages out.
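The custom script in the attached tar file isn't reproduced in this message, but a minimal pgbench custom file for this kind of UPSERT-via-MERGE test might look roughly like the following sketch (the account-ID range and the patch's exact MERGE syntax are assumptions):

```sql
-- hypothetical pgbench custom script; :aid is drawn from twice the
-- populated account range, so about half the MERGEs should hit the
-- NOT MATCHED branch and insert new rows
\setrandom aid 1 200000
MERGE INTO pgbench_accounts t
    USING (VALUES (:aid, 1)) s(aid, delta)
    ON s.aid = t.aid
    WHEN MATCHED THEN UPDATE SET abalance = t.abalance + s.delta
    WHEN NOT MATCHED THEN INSERT VALUES (s.aid, 1, s.delta, '');
```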
For comparison's sake, there's an almost identical test case in there
that does the same thing using the pl/pgsql UPSERT example function from
the PostgreSQL manual instead. You just run test-merge.sh
or test-function.sh and it runs the whole test, presuming you built and
installed pgbench and don't mind that it will blow away any database
named pgbench you already have.
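The pl/pgsql function referred to is the manual's well-known merge_db() example; adapted to the pgbench schema (so the table, column, and function names here are assumptions), it looks roughly like:

```sql
-- sketch modeled on the PostgreSQL manual's merge_db() UPSERT example
CREATE FUNCTION upsert_balance(key int, delta int) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update an existing row
        UPDATE pgbench_accounts SET abalance = abalance + delta
            WHERE aid = key;
        IF found THEN
            RETURN;
        END IF;
        -- no row there: try to insert; if another session inserted
        -- the same key concurrently, trap the error and retry
        BEGIN
            INSERT INTO pgbench_accounts(aid, bid, abalance, filler)
                VALUES (key, 1, delta, '');
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```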
Since the current MERGE implementation is known not to be optimized for
concurrency, my main goal here wasn't to see how fast it was. That
number is interesting though. With the sole postgresql.conf change of:
checkpoint_segments=32
And a debug/assertion build using 8 concurrent clients, I got 1607 TPS
of UPSERT out of the trigger approach @ 6MB/s of writes to disk, while
the MERGE one delivered 988 TPS @ 4.5MB/s of writes. Will explore this
more later.
This did seem to find a bug in the implementation after running for a while:
TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
"execMain.c", Line: 1732)
Line number there is relative to what you can see at
http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
and that's a null pointer check at the beginning of
EvalPlanQualFetchRowMarks. Haven't explored why this happened or how
repeatable this is, but since Boxuan was looking for some bugs to chase
I wanted to deliver him one to chew on.
While the performance doesn't need to be great in V1, there needs to be
at least some minimal protection against concurrency issues before this
is commit quality. I'll continue to shake this code out looking for
such issues now that I have some basic testing that works for doing so.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
On 2010-10-23 8:34 AM +0300, Greg Smith wrote:
While the performance doesn't need to be great in V1, there needs to be
at least some minimal protection against concurrency issues before this
is commit quality.
What's been bothering me is that so far there has not been an agreement
on whether we need to protect against concurrency issues or not. In
fact, there has been almost no discussion about the concurrency issues
which AIUI have been the biggest single reason we don't have MERGE
already. Right now, this patch fails in even the simplest scenario:
=# create table foo(a int unique);
NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_a_key"
for table "foo"
CREATE TABLE
T1=# begin;
BEGIN
T1=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
then update set a=a+1 when not matched then insert values (s.i);
MERGE 1
T2=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
then update set a=a+1 when not matched then insert values (s.i);
-- blocks
T1=# commit;
COMMIT
... and T2 gets a unique constraint violation.
As far as I'm concerned, the reason people want MERGE is to avoid these
problems; the nicer syntax is just a bonus. Having to LOCK the target
table makes this feature a lot less useful, even though there are a few
use cases where locking the table would be OK.
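For completeness, the table-locking workaround alluded to above would look something like the sketch below; the lock mode chosen is an assumption, the requirement being only that it conflicts with itself so that two concurrent MERGEs serialize:

```sql
BEGIN;
-- serialize concurrent MERGEs on foo: SHARE ROW EXCLUSIVE conflicts
-- with itself while still allowing plain reads of the table
LOCK TABLE foo IN SHARE ROW EXCLUSIVE MODE;
MERGE INTO foo USING (VALUES (1)) s(i) ON s.i = foo.a
    WHEN MATCHED THEN UPDATE SET a = a + 1
    WHEN NOT MATCHED THEN INSERT VALUES (s.i);
COMMIT;
```

This defeats much of the point of MERGE for high-concurrency workloads, which is exactly the objection being raised.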
Regards,
Marko Tiikkaja