MERGE Specification
The following two files specify the behaviour of the MERGE statement and
how it will work in the world of PostgreSQL. In places, this supercedes
my recent commentary on MERGE, particularly with regard to triggers.
Neither of these files is intended for CVS.
The HTML file was generated from SGML source, though the latter is not
included here for clarity.
The test file shows how I expect a successful test run to look when a
regression test is executed with a working version of final MERGE patch
applied. It has behavioural comments in it also, to make it slightly
more readable.
If anybody has any questions, ask them now please, before I begin
detailed implementation.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Hello Simon,
would you support RETURNING clause?
Regards
Pavel Stehule
Show quoted text
On 21/04/2008, Simon Riggs <simon@2ndquadrant.com> wrote:
The following two files specify the behaviour of the MERGE statement and
how it will work in the world of PostgreSQL. In places, this supercedes
my recent commentary on MERGE, particularly with regard to triggers.Neither of these files is intended for CVS.
The HTML file was generated from SGML source, though the latter is not
included here for clarity.The test file shows how I expect a successful test run to look when a
regression test is executed with a working version of final MERGE patch
applied. It has behavioural comments in it also, to make it slightly
more readable.If anybody has any questions, ask them now please, before I begin
detailed implementation.--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Apr 21, 2008, at 4:08 PM, Simon Riggs wrote:
The following two files specify the behaviour of the MERGE statement
and
how it will work in the world of PostgreSQL. In places, this
supercedes
my recent commentary on MERGE, particularly with regard to triggers.Neither of these files is intended for CVS.
The HTML file was generated from SGML source, though the latter is not
included here for clarity.The test file shows how I expect a successful test run to look when a
regression test is executed with a working version of final MERGE
patch
applied. It has behavioural comments in it also, to make it slightly
more readable.If anybody has any questions, ask them now please, before I begin
detailed implementation.
"MERGE will not invoke Rules." Does this imply that MERGE cannot be
used on views or that the resulting INSERTs or UPDATEs do not work on
views?
Cheers,
M
From books at ejurka.com Mon Apr 21 20:38:15 2008
From: books at ejurka.com (Kris Jurka)
Date: Mon, 21 Apr 2008 16:38:15 -0400 (EDT)
Subject: [Pljava-dev] stack depth limit exceeded - patch possible?
In-Reply-To: <480CBE23.8050006@par.univie.ac.at>
References: <48010499.30000@par.univie.ac.at>
<Pine.BSO.4.64.0804131023000.9928@leary.csoft.net>
<58782.88.116.137.78.1208177811.squirrel@www.par.univie.ac.at>
<Pine.BSO.4.64.0804141112330.5378@leary.csoft.net>
<48060EE6.9040009@par.univie.ac.at>
<4806264F.1060800@ejurka.com> <48074F27.9050606@par.univie.ac.at>
<Pine.BSO.4.64.0804171204310.553@leary.csoft.net>
<480CBE23.8050006@par.univie.ac.at>
Message-ID: <Pine.BSO.4.64.0804211636140.14425@leary.csoft.net>
On Mon, 21 Apr 2008, Alexander W?hrer wrote:
Dear Kris,
thank you very much for building a patched version for me -
unfortunatelly I ran into the following problem after replacing the
pljava.jar and pljava.dll with the patched ones from your archive:java.lang.NoSuchMethodError: _fetch
There's an explicit binding between the C and Java parts of pljava that I
forgot to update when I changed the function signature for move and fetch
to include the thread id. I've done that and put up a new test release:
http://ejurka.com/pgsql/pljava/wohrer
Kris Jurka
On Mon, 2008-04-21 at 22:18 +0200, Pavel Stehule wrote:
would you support RETURNING clause?
I wouldn't rule it out completely, but not in the first implementation
because
- its more work
- its not in the SQL Standard
- neither Oracle nor DB2 support it either, so its only going to provide
incompatibility
- there are some wrinkles with MERGE that means I don't want to
over-complicate it because it looks to me like it will change in future
versions of the standard
- not sure what the use case for that will be
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
"MERGE will not invoke Rules." Does this imply that MERGE cannot be
used on views or that the resulting INSERTs or UPDATEs do not work on
views?
Yes, that's right. Just like COPY. That seems fine to me because you're
likely to be doing a MERGE immediately after a COPY anyway, so the
restriction just continues.
Rules for Insert, Update and Delete are almost identical to the way
MERGE works anyway, so there's no particular loss of functionality. That
was why I co-opted the ability to DO NOTHING in a WHEN clause from the
way PostgreSQL Rules work.
I'm not taking any explicit decisions to exclude them permanently. I do
think its possible that we could support them and possibly very cheaply,
but I wouldn't make any promises initially.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes:
On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
"MERGE will not invoke Rules." Does this imply that MERGE cannot be
used on views or that the resulting INSERTs or UPDATEs do not work on
views?
Yes, that's right. Just like COPY.
I find this to be pretty ugly. COPY is a special case because
(a) it is a utility statement not a plannable one, and (b) its only
reason to exist is to transfer data as fast as possible, so bypassing
rules isn't an unreasonable restriction. MERGE has neither of those
properties, and thus arguing that it can or should be like COPY is an
entirely unconvincing proposition. (In fact, I don't even want to think
about what kind of crock you're going to need in order to get it through
the planner without also going through the rewriter.)
Please think a bit harder.
regards, tom lane
On Mon, 2008-04-21 at 20:28 -0400, Tom Lane wrote:
In fact, I don't even want to think
about what kind of crock you're going to need in order to get it through
the planner without also going through the rewriter.
Hmmm, I hadn't thought I might be adding work rather than avoiding it.
I'll give it a go.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Mon, 2008-04-21 at 20:28 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
On Mon, 2008-04-21 at 16:38 -0400, A.M. wrote:
"MERGE will not invoke Rules." Does this imply that MERGE cannot be
used on views or that the resulting INSERTs or UPDATEs do not work on
views?Yes, that's right. Just like COPY.
I find this to be pretty ugly. COPY is a special case because
(a) it is a utility statement not a plannable one, and (b) its only
reason to exist is to transfer data as fast as possible, so bypassing
rules isn't an unreasonable restriction. MERGE has neither of those
properties, and thus arguing that it can or should be like COPY is an
entirely unconvincing proposition.
Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.
But that's how the committee designed it, yes?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Simon Riggs" <simon@2ndquadrant.com> writes:
Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.
If there are race conditions what advantage does it offer over writing plpgsql
or client code to do it?
I thought the whole advantage of having a built-in command is that it could do
the kind of locking our unique constraints do to avoid race conditions.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On Mon, 2008-04-21 at 21:57 -0400, Alvaro Herrera wrote:
Simon Riggs wrote:
Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.But that's how the committee designed it, yes?
Yes. Not sure if I see your point there, but yes, that's how its been
designed.
Both DB2 and Oracle have additional items to get around the shortcomings
of the command.
The way MERGE works we first test to see if it matches or not, then if
not matched we would activate the NOT MATCHED action, which standard
says must be an insert. The gap between the two actions allows a race
condition to exist.
We could close the gap by taking a lock on the row when we perform the
is-matched test, but that would be expensive for bulk operations. ISTM
the lock should be optional. Not sure what the default should be. Input
welcome.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
The way MERGE works we first test to see if it matches or not, then if
not matched we would activate the NOT MATCHED action, which standard
says must be an insert. The gap between the two actions allows a race
condition to exist.We could close the gap by taking a lock on the row when we perform the
is-matched test, but that would be expensive for bulk operations. ISTM
the lock should be optional. Not sure what the default should be. Input
welcome.
ISTM that if the original select does a SELECT FOR UPDATE then it
should work fine for UPDATEs since any update with overwrite the xmax
field anyway.
What you can't do is prevent multiple inserts. Though if its a unique
index you should be able to do the same trick as normal inserts: create
the row, try to insert into the index and if that fails fall back to
doing an update.
What you really need for this though is a non-fatal _bt_check_unique so
you can recover without having a savepoint for every row.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
On Tue, 2008-04-22 at 10:02 +0200, Martijn van Oosterhout wrote:
On Tue, Apr 22, 2008 at 08:24:58AM +0100, Simon Riggs wrote:
The way MERGE works we first test to see if it matches or not, then if
not matched we would activate the NOT MATCHED action, which standard
says must be an insert. The gap between the two actions allows a race
condition to exist.We could close the gap by taking a lock on the row when we perform the
is-matched test, but that would be expensive for bulk operations. ISTM
the lock should be optional. Not sure what the default should be. Input
welcome.ISTM that if the original select does a SELECT FOR UPDATE then it
should work fine for UPDATEs since any update with overwrite the xmax
field anyway.
Yes, agreed, that's what I meant by the lock on the row.
Incidentally, this is essentially the same problem that occurs with
SERIALIZABLE updates.
It should be easy enough to put an optional "LOCK MATCHED ROW" clause
into the MERGE statement, as an extension. The Standard doesn't specify
the lock timing.
What you can't do is prevent multiple inserts. Though if its a unique
index you should be able to do the same trick as normal inserts: create
the row, try to insert into the index and if that fails fall back to
doing an update.
The Standard doesn't really allow that. It's either matched or its not.
MERGE is specifically
1. Match
2. Update or Insert as per step (1), following complex logic
rather than
1. Update
2. if not matched Insert
which is exactly what the MySQL and Teradata upsert statements do, but
only for single row operations, unlike MERGE.
For MERGE, there is no "lets try one of these and if not, I'll switch".
You decide which it is going to be and then do it. Which can fail...
I guess we could just spin through, re-testing the match each time and
re-initiating an action, but I see problems there also, not least of
which is it violates the standard. That may not be that clever, but
there may be reasons we can't see yet, or reasons that would affect
other implementors. Guidance, please, if anybody sees clearly?
What you really need for this though is a non-fatal _bt_check_unique so
you can recover without having a savepoint for every row.
Oracle simply fails in the event of a uniqueness violation, even though
it logs other errors. DB2 fails unconditionally if there is even a
single error. The MySQL and Teradata syntax don't seem to offer any
protection from concurrent inserts either. Teradata and DB2 both use
locking, so they would lock the value prior to the update anyway, so the
update, insert issue would not happen for them at least.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Mon, 2008-04-21 at 22:27 -0400, Gregory Stark wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
Unrelated to rule processing, you did read the bit about MERGE and race
conditions? ISTM that MERGE as it stands isn't very useful for anything
other than large data loads since its going to cause problems if used
concurrently.If there are race conditions what advantage does it offer over writing plpgsql
or client code to do it?
That's an excellent question. I'm not trying to sell you anything here.
MERGE is a SQL Standard command, supported by Oracle, DB2, SQLServer
etc, so there is reason enough to implement it.
There may be also reasons to implement other syntaxes, other behaviours,
which would be non-standard. If people want the latter first/second/not
at all then please speak, its not an either-or situation.
I would expect MERGE to be slightly faster than a well coded PL/pgSQL
function, but there won't be too much in it. It will allow the statement
to be more easily parallelised in the form it currently takes, I would
note.
I thought the whole advantage of having a built-in command is that it could do
the kind of locking our unique constraints do to avoid race conditions.
As I've said elsewhere, we could have it lock each row, its just more
overhead if we do and not necessary at all for bulk data merging.
I'll presume we want locking as an option, unless people say otherwise.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
"Simon Riggs" <simon@2ndquadrant.com> writes:
As I've said elsewhere, we could have it lock each row, its just more
overhead if we do and not necessary at all for bulk data merging.I'll presume we want locking as an option, unless people say otherwise.
It's not so simple. If you look for a row to merge into and don't find one
there's no row to lock. What unique constraints do is hold the lock on the
index page where the entry would have to be added.
That's the trick that plpgsql cannot implement. That's why users are forced to
loop and retry until they manage to do an update successfully or insert
successfully.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
On Apr 22, 2008, at 1:47 PM, Simon Riggs wrote:
On Mon, 2008-04-21 at 22:27 -0400, Gregory Stark wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
Unrelated to rule processing, you did read the bit about MERGE and
race
conditions? ISTM that MERGE as it stands isn't very useful for
anything
other than large data loads since its going to cause problems if
used
concurrently.If there are race conditions what advantage does it offer over
writing plpgsql
or client code to do it?That's an excellent question. I'm not trying to sell you anything
here.
MERGE is a SQL Standard command, supported by Oracle, DB2, SQLServer
etc, so there is reason enough to implement it.There may be also reasons to implement other syntaxes, other
behaviours,
which would be non-standard. If people want the latter first/second/
not
at all then please speak, its not an either-or situation.I would expect MERGE to be slightly faster than a well coded PL/pgSQL
function, but there won't be too much in it. It will allow the
statement
to be more easily parallelised in the form it currently takes, I would
note.I thought the whole advantage of having a built-in command is that
it could do
the kind of locking our unique constraints do to avoid race
conditions.As I've said elsewhere, we could have it lock each row, its just more
overhead if we do and not necessary at all for bulk data merging.
I was hoping to use MERGE alongside the other standard DML. Its
purpose is to set the "truth" regardless of past states.
Why should it be relegated to the bulk-loading basement alongside COPY?
Cheers,
M
On Apr 22, 2008, at 1:17 PM, Gregory Stark wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
As I've said elsewhere, we could have it lock each row, its just more
overhead if we do and not necessary at all for bulk data merging.I'll presume we want locking as an option, unless people say
otherwise.It's not so simple. If you look for a row to merge into and don't
find one
there's no row to lock. What unique constraints do is hold the lock
on the
index page where the entry would have to be added.That's the trick that plpgsql cannot implement. That's why users
are forced to
loop and retry until they manage to do an update successfully or
insert
successfully.
Yeah, hopefully there's a better way to do this other than row locks.
But no matter how this is done, I think we need to handle the race
conditions, and handle them by default. If people *really* know what
they're doing, they can disable the row locking (perhaps one way to
do this would be to grab an explicit lock on the table and have merge
check for that...).
On a different note, if you intend for the SGML to become the doc
page for MERGE, I'd really like to see some more complex examples
showing both delete and more than 2 WHEN cases. Something like the
"multiple actions on single target row" test case would work.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Attachments:
On Tue, Apr 22, 2008 at 02:19:24PM -0500, Decibel! wrote:
But no matter how this is done, I think we need to handle the race
conditions, and handle them by default. If people *really* know what
they're doing, they can disable the row locking (perhaps one way to
do this would be to grab an explicit lock on the table and have merge
check for that...).
I disagree. The spec doesn't require it and MERGE is useful without it.
For a first cut I would say implement as the spec says, race conditions
and all. Later we can think on whether it's worth handling them
directly.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.
On Apr 22, 2008, at 3:20 PM, Martijn van Oosterhout wrote:
On Tue, Apr 22, 2008 at 02:19:24PM -0500, Decibel! wrote:
But no matter how this is done, I think we need to handle the race
conditions, and handle them by default. If people *really* know what
they're doing, they can disable the row locking (perhaps one way to
do this would be to grab an explicit lock on the table and have merge
check for that...).I disagree. The spec doesn't require it and MERGE is useful without
it.
For a first cut I would say implement as the spec says, race
conditions
and all. Later we can think on whether it's worth handling them
directly.
That really strikes me as taking the "MySQL route". If push comes to
shove, I'll take a MERGE with race conditions over no merge at all,
but I think it's very important that it does the right thing. Just
because the spec doesn't say anything about it doesn't mean it's ok.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Attachments:
Decibel! <decibel@decibel.org> writes:
That really strikes me as taking the "MySQL route". If push comes to
shove, I'll take a MERGE with race conditions over no merge at all,
but I think it's very important that it does the right thing. Just
because the spec doesn't say anything about it doesn't mean it's ok.
Agreed. It seems to me that in the last set of discussions, we rejected
implementing MERGE precisely because it failed to provide a solution to
the race-condition problem. I'm not satisfied with a version that
doesn't handle that, because I think that is *exactly* what most people
will try to use it for. The non-concurrent bulk update case that Simon
is arguing for is the uncommon usage.
regards, tom lane