UPSERT wiki page, and SQL MERGE syntax

Started by Peter Geogheganover 11 years ago57 messageshackers
Jump to latest

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to "value locking", I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT

Right now, I would like to address the less contentious but still
unresolved question of whether or not we should use the SQL-standard
MERGE syntax instead of the non-standard INSERT ... ON CONFLICT UPDATE
syntax that my patch proposes.

Note that I'm trying to direct the conversation along certain lines:
Is MERGE the right syntax for this particular effort ("UPSERT")? And,
in particular, not: Is SQL MERGE useful? I think that it is useful,
but is a largely unrelated feature.

Please correct the Wiki page if I have failed to credit SQL MERGE with
some advantage that someone stated at some point. I don't recall any
arguments for it, other than that it is in the SQL standard, but maybe
I missed one.

In general, add to this page, and edit it as you see fit. It'll be
useful to centralize the references, discussion and state of the patch
in one agreed upon place, as the patch continues to evolve.
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Geoghegan (#1)
Re: UPSERT wiki page, and SQL MERGE syntax

On 10/02/2014 07:52 AM, Peter Geoghegan wrote:

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to "value locking", I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks. That'll help keep things moving forward rather than around in
circles.

In general, add to this page, and edit it as you see fit. It'll be
useful to centralize the references, discussion and state of the patch
in one agreed upon place, as the patch continues to evolve.

I added a summary of the status quo of upsert in Pg as it stands, and a
brief discussion of the state in other RDBMSes.

I'd love it if someone who knows MySQL better could add info on MySQL's
ON DUPLICATE KEY feature - advantages/problems, etc.

I've added a few points to the goals section:

- Any new upsert approach must be a usability improvement on the status
quo; we don't want to introduce subtle behaviour or unnecessary foot-guns.

- if possible, upsert of multiple values is desirable. We currently have
to loop on a per-value basis.

... and some miscellaneous edits/formatting changes.

I've also added sections for the other options:

* A custom syntax
https://wiki.postgresql.org/wiki/UPSERT#Custom_syntax

and

* Adopting an existing non-standard syntax
https://wiki.postgresql.org/wiki/UPSERT#Adopting_an_existing_non-standard_syntax

which I'd appreciate it if you could fill out based on your existing
research and notes. I don't think it makes sense for me to write those
when you've already done the required study/note taking and just need to
transfer it over.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#1)
Re: UPSERT wiki page, and SQL MERGE syntax

On 10/02/2014 02:52 AM, Peter Geoghegan wrote:

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to "value locking", I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks!

Could you write down all of the discussed syntaxes, using a similar
notation we use in the manual, with examples on how to use them? And
some examples on what is possible with some syntaxes, and not with
others? That would make it a lot easier to compare them.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Heikki Linnakangas (#3)
Re: UPSERT wiki page, and SQL MERGE syntax

On Thu, Oct 2, 2014 at 12:07 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

Could you write down all of the discussed syntaxes, using a similar notation
we use in the manual, with examples on how to use them? And some examples on
what is possible with some syntaxes, and not with others? That would make it
a lot easier to compare them.

I've started off by adding varied examples of the use of the existing
proposed syntax. I'll expand on this soon.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#4)
Re: UPSERT wiki page, and SQL MERGE syntax

On Thu, Oct 2, 2014 at 12:54 AM, Peter Geoghegan <pg@heroku.com> wrote:

I've started off by adding varied examples of the use of the existing
proposed syntax. I'll expand on this soon.

I spent some time today expanding on the details, and commenting on
the issues around the custom syntax (exactly what it does, issues
around ambiguous conflict-on unique indexes, etc).

Also, "Challenges, issues" has been updated - I've added some
"secondary concerns". These are non-contentious items that I think are
of concern, and would like to highlight now.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#1)
Re: UPSERT wiki page, and SQL MERGE syntax

Peter Geoghegan <pg@heroku.com> wrote:

The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks!

I have two questions I hope you can clarify. I'm having trouble
parsing what this statement means:

... the SQL standard does not require that MERGE be atomic in the
sense of atomically providing either an INSERT or UPDATE, ...

My understanding is that the standard logically requires (without
concern for implementation details) that the second specified table
(via table name or subquery -- which could be a VALUES statement)
is scanned, and for each row it attempts to match a row in the
target table. That will either match or not, according to the
boolean expression in the ON clause. You can have one WHEN MATCHED
THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
If both clauses are present, I believe that it is guaranteed that
one or the other (but not both) will fire for each row in the
second table. The usual guarantees for each isolation level must
not be violated (although an implementation is not required to
generate anomalies which could not happen with serial execution).
So as usual for a transaction involving multiple tables,
serialization anomalies are possible if the transaction isolation
level is REPEATABLE READ or less. Is that what you're getting at,
or something else?

Regarding this example:

-- Predicate within UPDATE auxiliary statement
-- (row is still locked when the UPDATE predicate
-- isn't satisfied):
INSERT INTO upsert(key, val) VALUES(1, 'insert')
-- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers
ON CONFLICT UPDATE SET val = CONFLICTING(val)
-- Predicate has interesting semantics visibility-wise:
WHERE val != 'delete';

Can you explain what the WHERE clause there does?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Kevin Grittner (#6)
Re: UPSERT wiki page, and SQL MERGE syntax

On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

I have two questions I hope you can clarify. I'm having trouble
parsing what this statement means:

... the SQL standard does not require that MERGE be atomic in the
sense of atomically providing either an INSERT or UPDATE, ...

My understanding is that the standard logically requires (without
concern for implementation details) that the second specified table
(via table name or subquery -- which could be a VALUES statement)
is scanned, and for each row it attempts to match a row in the
target table. That will either match or not, according to the
boolean expression in the ON clause. You can have one WHEN MATCHED
THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
If both clauses are present, I believe that it is guaranteed that
one or the other (but not both) will fire for each row in the
second table. The usual guarantees for each isolation level must
not be violated (although an implementation is not required to
generate anomalies which could not happen with serial execution).

That seems like a statement that isn't getting to the heart of the
matter. Which is: simple UPSERT cases will get things like duplicate
key violations under concurrency with MERGE, in SQL Server, Oracle,
and DB2. I think serialization failures are inevitable and appropriate
when not using READ COMMITTED mode with MVCC, but these duplicate
violations are sort of like READ COMMITTED serialization failures in
disguise. Or they would be, if MERGE (as described by the standard, or
in practice) promised to care about atomicity in that sense. It
doesn't, so they're off the hook. I think we should care about
atomicity (in the sense of always getting an insert or update, never
an error at RC) for the simple reason that that's what people want.

It seems like your remarks here don't acknowledge the reality: That
there are slightly different ideas of "each row" that are in play
here.

So as usual for a transaction involving multiple tables,
serialization anomalies are possible if the transaction isolation
level is REPEATABLE READ or less. Is that what you're getting at,
or something else?

My complaint is quite straightforward, really. I don't want to have to
tell users to do this: http://stackoverflow.com/a/22777749

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something like
selectively using different types of snapshots (MVCC or dirty - the
two different ideas of "each row" that are in tension) based on the
exact clauses used. That seems like asking for trouble, TBH.

-- Predicate within UPDATE auxiliary statement
-- (row is still locked when the UPDATE predicate
-- isn't satisfied):
INSERT INTO upsert(key, val) VALUES(1, 'insert')
-- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers
ON CONFLICT UPDATE SET val = CONFLICTING(val)
-- Predicate has interesting semantics visibility-wise:
WHERE val != 'delete';

Can you explain what the WHERE clause there does?

Sure. Having already locked the tuple on the basis on finding a
conflict TID, but before an UPDATE, the qual is evaluated using
EvalPlanQual rescanning. The predicate must be satisfied in order for
the UPDATE to actually proceed. If, in this instance, the locked tuple
happened to have a val of 'delete', then we would not UPDATE at all.
It would still be locked, though (possibly without being visible to
our MVCC snapshot, just like when we might do that when we do UPDATE).

There are some interesting implications visibility-wise here that must
be considered. These are discussed on the Wiki page:
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29

Obviously higher isolation levels just throw an error here instead.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#7)
Re: UPSERT wiki page, and SQL MERGE syntax

Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

I'm having trouble parsing what this statement means:

... the SQL standard does not require that MERGE be atomic in
the sense of atomically providing either an INSERT or UPDATE, ...

... always getting an insert or update, never an error ...

I've never seen "atomic" used to mean "you never get an error"
before. Perhaps it would be clearer to replace "atomic" with
"error-free" or some such. It certainly would be less confusing
for me. "Atomic" in most cases is a property that can be enforced
by generating an error where it would otherwise be violated.

For example: http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29

"Although implementations vary depending on factors such as
concurrency issues, the principle of atomicity — i.e. complete
success or complete failure — remain."

When you are saying "atomic" you mean something quite different.

My complaint is quite straightforward, really. I don't want to
have to tell users to do this: http://stackoverflow.com/a/22777749

I think you are confusing syntax with semantics. I grant that a
reasonable person could have concerns about using the MERGE syntax
to implement the semantics you want in the special case that an
appropriate unique index exists, but pretending that the semantics
can't be achieved with that syntax is just silly.

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something
like selectively using different types of snapshots (MVCC or
dirty - the two different ideas of "each row" that are in
tension) based on the exact clauses used. That seems like asking
for trouble, TBH.

Now *that* is getting more to a real issue. We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired), but this would be the first
place I can think of that the primary DML statement behaves that
way. You and a couple others have expressed concern about that,
but it seems pretty vague and hand-wavey. If a different execution
node is used for the special behavior, and that node is generated
based on the unique index being used, I'm really having problems
seeing where the problem lies.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Kevin Grittner (#8)
Re: UPSERT wiki page, and SQL MERGE syntax

On Fri, Oct 3, 2014 at 2:44 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

I've never seen "atomic" used to mean "you never get an error"
before.

When you are saying "atomic" you mean something quite different.

Perhaps I should have been more careful on that point. Just to be
crystal clear: I don't intend that you'll never get an error (some
cases clearly *should* error). But I want confidence that reasonable,
simple UPSERTs will never error due to a duplicate violation that they
thought the statement was supposed to avoid. That's the point of
UPSERT. Guarantees matter. I would like to make guarantees that are at
least as good as the upsert looping subxact pattern.

Teradata refers to an "atomic UPSERT". It also has a MERGE command.

My complaint is quite straightforward, really. I don't want to
have to tell users to do this: http://stackoverflow.com/a/22777749

I think you are confusing syntax with semantics. I grant that a
reasonable person could have concerns about using the MERGE syntax
to implement the semantics you want in the special case that an
appropriate unique index exists, but pretending that the semantics
can't be achieved with that syntax is just silly.

I am not pretending that it can't be done - just, as I said, that to
do so would have bad consequences when time came to generalize the
syntax to support more advanced cases.

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something
like selectively using different types of snapshots (MVCC or
dirty - the two different ideas of "each row" that are in
tension) based on the exact clauses used. That seems like asking
for trouble, TBH.

Now *that* is getting more to a real issue.

Yes, it is. I am opposed to using the MERGE syntax for this *because*
MERGE is useful (independently useful, when done properly), not
because it is not useful.

We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired), but this would be the first
place I can think of that the primary DML statement behaves that
way. You and a couple others have expressed concern about that,
but it seems pretty vague and hand-wavey. If a different execution
node is used for the special behavior, and that node is generated
based on the unique index being used, I'm really having problems
seeing where the problem lies.

We need to avoid duplicate violations. The authoritative source of
truth here is a unique index, because the presence or absence of a
conclusively-visible conflict tuple controls whether our insert fails
or succeeds respectively. We need a dirty snapshot to see tuples not
in our MVCC snapshot, since they need to be handled too. Otherwise,
when we fail to handle them, the MERGE's INSERT has a dup violation
(which is not what I want, for practical reasons).

If we're seq scanning a table, even with a dirty snapshot, there is no
authoritative source of truth. Heap tuples will be inserted
concurrently, and we'll have no idea what to do about it without
reference to a unique index as a choke point/single source of final
truth about what is and isn't going to be duplicated.

As implemented, my UPSERT patch will have UPSERTs not really use their
MVCC snapshot for simple cases, just like INSERTs in general. UPDATEs
always use their MVCC snapshot, as the ModifyTable node pulls up
tuples from an ordinary relation scan. UPSERT needs a DirtySnapshot
scan of the unique index to find duplicates, because that's always how
we find duplicates. At the same time, MERGE is not at liberty to not
work just because of the absence of a unique index (plus even row
locking must be prepared for conflicts). And since the MERGE is driven
by an outer join - *not* an INSERT-like search for duplicates - we'll
find ourselves providing one behavior when detecting one case, and
another when detecting the other, with subtle and confusing
differences between each case. If we "fake" an outer join by doing an
INSERT-like search for duplicates in a unique index (to make sure we
see duplicates), then we get into trouble elsewhere. Like, we cannot
let users not ask for an INSERT in their MERGE, because we'd be
conclusively deciding to *not* (say) UPDATE on the basis of the
absence of a value in a unique index, as opposed to the presence.

Suddenly, we're in the business of proving a negative....or are we?.
My head hurts.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#9)
Re: UPSERT wiki page, and SQL MERGE syntax

On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan <pg@heroku.com> wrote:

Yes, it is. I am opposed to using the MERGE syntax for this *because*
MERGE is useful (independently useful, when done properly), not
because it is not useful.

As I've mentioned, there is also the practical argument: No one else
does this. That suggests to me that it's hard.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#9)
Re: UPSERT wiki page, and SQL MERGE syntax

On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan <pg@heroku.com> wrote:

We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired)

Apart from FK snapshots, we also use dirty snapshots for unique index
enforcement, obviously. That doesn't mean that it's the "command/xact
snapshot", though. It's just a special constraint enforcement
mechanism.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#6)
Re: UPSERT wiki page, and SQL MERGE syntax

On Fri, Oct 3, 2014 at 4:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

... the SQL standard does not require that MERGE be atomic in the
sense of atomically providing either an INSERT or UPDATE, ...

My understanding is that the standard logically requires (without
concern for implementation details) that the second specified table
(via table name or subquery -- which could be a VALUES statement)
is scanned, and for each row it attempts to match a row in the
target table. That will either match or not, according to the
boolean expression in the ON clause. You can have one WHEN MATCHED
THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
If both clauses are present, I believe that it is guaranteed that
one or the other (but not both) will fire for each row in the
second table. The usual guarantees for each isolation level must
not be violated (although an implementation is not required to
generate anomalies which could not happen with serial execution).
So as usual for a transaction involving multiple tables,
serialization anomalies are possible if the transaction isolation
level is REPEATABLE READ or less. Is that what you're getting at,
or something else?

I think the problem is that it's not possible to respect the "usual
guarantees" even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled). You may find that there's a tuple
with the same PK which is committed but not visible to the snapshot
you took at the beginning of the statement. An INSERT would fail; an
UPDATE would see no rows and do nothing. IOW, *neither* operation
succeeds according to its classic semantics; to succeed, the INSERT OR
UPDATE has to do something altogether novel.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#12)
Re: UPSERT wiki page, and SQL MERGE syntax

On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the problem is that it's not possible to respect the "usual
guarantees" even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled).

That's definitely the main problem. Also, there could be garden
variety race conditions.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#12)
Re: UPSERT wiki page, and SQL MERGE syntax

On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the problem is that it's not possible to respect the "usual
guarantees" even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled). You may find that there's a tuple
with the same PK which is committed but not visible to the snapshot
you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on
an agreement on syntax here (without necessarily working out the exact
details right away, including for example the exact spelling of what
I'm calling WITHIN). Since Simon has changed his mind on MERGE (while
still wanting to retain a superficial flavor of MERGE, a flavor that
does not include the MERGE keyword), I think that leaves you as the
only advocate for the MERGE syntax.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#14)
Re: UPSERT wiki page, and SQL MERGE syntax

Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the problem is that it's not possible to respect the "usual
guarantees" even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled). You may find that there's a tuple
with the same PK which is committed but not visible to the snapshot
you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on
an agreement on syntax here

Robert said "however spelled" -- which I take to mean that he at
least sees that the MERGE-like UPSERT syntax can be turned into the
desired semantics. I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one
point of difference on the semantics. You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE. That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command. That doesn't
seem as clean to me as saying (for example):

UPSERT targettable t
USING (VALUES (123, 100)) x(id, quantity)
ON t.id = x.id
WHEN NOT MATCHED
INSERT (id, quantity, inserted_at) VALUES (x.id, x.quantity, now())
WHEN MATCHED
UPDATE SET quantity = t.quantity + x.quantity, updated_at = now();

As I understand it you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
VALUES(123, quantity, now())
ON CONFLICT WITHIN targettable_pkey
UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Kevin Grittner (#15)
Re: UPSERT wiki page, and SQL MERGE syntax

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics. You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE. That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.

As I understand it you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
VALUES(123, quantity, now())
ON CONFLICT WITHIN targettable_pkey
UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

That's right, yes.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Marti Raudsepp
marti@juffo.org
In reply to: Peter Geoghegan (#16)
Re: UPSERT wiki page, and SQL MERGE syntax

On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics. You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE. That seems like a bad idea to me

Indeed, the current behavior breaks even the canonical "keep track of
how many posts are in a thread" trigger example because INSERT
triggers are fired for an effective row UPDATE. I can't see how that's
acceptable.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that? Even a "clarify
this", "this couldn't possibly work" or "you're not a major
contributor so your arguments are invalid" response. ;)

I admit I initially misunderstood some details about the current
behavior. I can dig into the code and write a clearer and/or more
detailed spec if I had even *some* feedback.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Marti Raudsepp (#17)
Re: UPSERT wiki page, and SQL MERGE syntax

On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp <marti@juffo.org> wrote:

On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics. You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE. That seems like a bad idea to me

Indeed, the current behavior breaks even the canonical "keep track of
how many posts are in a thread" trigger example because INSERT
triggers are fired for an effective row UPDATE. I can't see how that's
acceptable.

DB2 had the foresight to add the following restrictions to BEFORE ROW
triggers - they cannot:

"""
Contain any INSERT, DELETE, or UPDATE operations, nor invoke any
routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
Contain any DELETE or UPDATE operations on the trigger subject table,
nor invoke any routine containing such operations, if it is a compound
SQL.
Reference a materialized query table defined with REFRESH IMMEDIATE
Reference a generated column other than the identity column in the NEW
transition variable.
"""

To get a sense of how doing fancy things in before triggers leads to
trouble, considering the hardening Kevin added in commit 6868ed74. In
short, use an AFTER trigger for this kind of thing, and all of these
issues go away.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that?

I looked at that. You're talking about making the case where before
row triggers clearly are appropriate (when you just want to change the
tuple being inserted) fail, in order to make an arguably inappropriate
case for before row triggers work (when you don't change the tuple to
be inserted, but you do want to do some other thing). That seems
backwards. My proposed behavior seems far less surprising.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Marti Raudsepp
marti@juffo.org
In reply to: Peter Geoghegan (#18)
Re: UPSERT wiki page, and SQL MERGE syntax

On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan <pg@heroku.com> wrote:

Indeed, the current behavior breaks even the canonical "keep track of
how many posts are in a thread" trigger example

use an AFTER trigger for this kind of thing, and all of these
issues go away.

In the latest patches from CommitFest, the UPDATE case invokes
BEFORE&AFTER INSERT triggers, not AFTER UPDATE. Is that going to be
changed? If so, I concede that.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that?

I looked at that.

Thanks!

You're talking about making the case where before
row triggers clearly are appropriate (when you just want to change the
tuple being inserted) fail

Only in case the trigger changes *key* columns necessary for atomicity
(i.e. from the WITHIN index). Other columns are fair game. The
restriction seems justifiable to me: it's unreasonable to be atomic
with respect to values that change mid-way.

* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers,
which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case,
like nextval() for unrelated serial columns => reduced overhead
* The semantics match better with MERGE-like syntax that seems to come
up again and again.
* And it appears to solve (part?) of the problem you had with naming
key *colums* in the WITHIN clause, rather than using index name.

If you don't see any reasons why it can't be done, these benefits seem
clear to me. I think the tradeoffs at least warrant wider discussion.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Marti Raudsepp (#19)
Re: UPSERT wiki page, and SQL MERGE syntax

On Wed, Oct 8, 2014 at 5:37 PM, Marti Raudsepp <marti@juffo.org> wrote:

Only in case the trigger changes *key* columns necessary for atomicity
(i.e. from the WITHIN index). Other columns are fair game. The
restriction seems justifiable to me: it's unreasonable to be atomic
with respect to values that change mid-way.

If you don't see any reasons why it can't be done, these benefits seem
clear to me. I think the tradeoffs at least warrant wider discussion.

I don't. That's very surprising. One day, it will fail unexpectedly.
As proposed, the way BEFORE INSERT triggers fire almost forces users
to consider the issues up-front.

Note that the CONFLICTING() behavior with respect to BEFORE INSERT
triggers work's the same as MySQL's "INSERT ON DUPLICATE KEY UPDATE
foo = VALUES(foo)" thing. There was agreement that that was the right
behavior, it seemed.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Marti Raudsepp
marti@juffo.org
In reply to: Peter Geoghegan (#20)
In reply to: Marti Raudsepp (#21)
#23Marti Raudsepp
marti@juffo.org
In reply to: Peter Geoghegan (#22)
#24Marti Raudsepp
marti@juffo.org
In reply to: Marti Raudsepp (#23)
In reply to: Marti Raudsepp (#24)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Geoghegan (#16)
#27Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Jim Nasby (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#15)
In reply to: Robert Haas (#28)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#28)
In reply to: Kevin Grittner (#30)
#32Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#29)
In reply to: Peter Geoghegan (#31)
In reply to: Kevin Grittner (#32)
#35Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#31)
#36Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#33)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#29)
#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#34)
In reply to: Kevin Grittner (#35)
In reply to: Kevin Grittner (#36)
In reply to: Robert Haas (#37)
#42Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#41)
#43Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Gavin Flower (#27)
In reply to: Jim Nasby (#43)
In reply to: Kevin Grittner (#42)
#46Matthew Woodcraft
matthew@woodcraft.me.uk
In reply to: Kevin Grittner (#32)
#47Marko Tiikkaja
marko@joh.to
In reply to: Matthew Woodcraft (#46)
#48Matthew Woodcraft
matthew@woodcraft.me.uk
In reply to: Marko Tiikkaja (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#41)
In reply to: Robert Haas (#49)
#51Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#50)
In reply to: Robert Haas (#51)
In reply to: Peter Geoghegan (#52)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#52)
In reply to: Robert Haas (#54)
#56Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#55)
In reply to: Bruce Momjian (#56)