MERGE SQL Statement for PG11
I'm working on re-submitting MERGE for PG11
Earlier thoughts on how this could/could not be done were sometimes
imprecise or inaccurate, so I have gone through the command per
SQL:2011 spec and produced a definitive spec in the form of an SGML
ref page. This is what I intend to deliver for PG11.
MERGE will use the same mechanisms as INSERT ON CONFLICT, so
concurrent behavior does not require further infrastructure changes,
just detailed work on the statement itself.
I'm building up the code from scratch based upon the spec, rather than
trying to thwack earlier attempts into shape. This looks more likely
to yield a commitable patch.
Major spanners or objections, please throw them in now cos I don't see any.
Questions?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
sql-merge.htmltext/html; charset=UTF-8; name=sql-merge.htmlDownload
Hey,
It looks quite nice. Personally I'd like to also have the returning
statement, and have the number of deleted and inserted rows as separate
numbers in the output message.
regards
Szymon Lipiński
pt., 27.10.2017, 10:56 użytkownik Simon Riggs <simon@2ndquadrant.com>
napisał:
Show quoted text
I'm working on re-submitting MERGE for PG11
Earlier thoughts on how this could/could not be done were sometimes
imprecise or inaccurate, so I have gone through the command per
SQL:2011 spec and produced a definitive spec in the form of an SGML
ref page. This is what I intend to deliver for PG11.MERGE will use the same mechanisms as INSERT ON CONFLICT, so
concurrent behavior does not require further infrastructure changes,
just detailed work on the statement itself.I'm building up the code from scratch based upon the spec, rather than
trying to thwack earlier attempts into shape. This looks more likely
to yield a commitable patch.Major spanners or objections, please throw them in now cos I don't see any.
Questions?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon Riggs wrote:
Earlier thoughts on how this could/could not be done were sometimes
imprecise or inaccurate, so I have gone through the command per
SQL:2011 spec and produced a definitive spec in the form of an SGML
ref page. This is what I intend to deliver for PG11.
Nice work. I didn't verify the SQL spec, just read your HTML page;
some very minor comments based on that:
* use "and" not "where" as initial words in "when_clause" and
"merge_update" clause definitions
* missing word here: "the DELETE privilege on the if you specify"
* I think the word "match." is leftover from some editing in the phrase
" that specifies which rows in the data_source match rows in the
target_table_name. match." In the same paragraph, it is not clear
whether all columns must be matched or it can be a partial match.
* In the when_clause note, it is not clear whether you can have multiple
WHEN MATCHED and WHEN NOT MATCHED clauses. Obviously you can have one
of each, but I think your doc says it is possible to have more than one of
each, with different conditions (WHEN MATCHED AND foo THEN bar WHEN
MATCHED AND baz THEN qux). No example shows more than one.
On the same point: Is there short-circuiting of such conditions, i.e.
will the execution will stop looking for further WHEN matches if some
rule matches, or will it rather check all rules and raise an error if
more than one WHEN rules match each given row?
* Your last example uses ELSE but that appears nowhere in the synopsys.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Questions?
I think one of the reasons why Peter Geoghegan decided to pursue
INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
syntax, he felt free to mandate a non-standard SQL requirement, namely
the presence of a unique index on the arbiter columns. If MERGE's
join clause happens to involve equality conditions on precisely the
same set of columns as some unique index on the target table, then I
think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure
and I suspect there won't be too many problems. However, if it
doesn't, then what? You could decree that such cases will fail, but
that might not meet your use case for developing the feature. Or you
could try to soldier on without the INSERT .. ON CONFLICT UPDATE
machinery, but that means, I think, that sometimes you will get
serialization anomalies - at least, I think, you will sometimes obtain
results that couldn't have been obtained under any serial order of
execution, and maybe it would end up being possible to fail with
serialization errors or unique index violations.
In the past, there have been objections to implementations of MERGE
which would give rise to such serialization anomalies, but I'm not
sure we should feel bound by those discussions. One thing that's
different is that the common and actually-useful case can now be made
to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
if less useful cases are vulnerable to some weirdness, maybe it's OK
to just document the problems.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27 October 2017 at 15:24, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Questions?
I think one of the reasons why Peter Geoghegan decided to pursue
INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
syntax, he felt free to mandate a non-standard SQL requirement, namely
the presence of a unique index on the arbiter columns. If MERGE's
join clause happens to involve equality conditions on precisely the
same set of columns as some unique index on the target table, then I
think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure
and I suspect there won't be too many problems.
Agreed
However, if it
doesn't, then what? You could decree that such cases will fail, but
that might not meet your use case for developing the feature. Or you
could try to soldier on without the INSERT .. ON CONFLICT UPDATE
machinery, but that means, I think, that sometimes you will get
serialization anomalies - at least, I think, you will sometimes obtain
results that couldn't have been obtained under any serial order of
execution, and maybe it would end up being possible to fail with
serialization errors or unique index violations.In the past, there have been objections to implementations of MERGE
which would give rise to such serialization anomalies, but I'm not
sure we should feel bound by those discussions. One thing that's
different is that the common and actually-useful case can now be made
to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
if less useful cases are vulnerable to some weirdness, maybe it's OK
to just document the problems.
Good points.
I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.
It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 7:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Good points.
I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.
You'd also throw an error when there was no "upsert compatible" join
quals, I take it?
I don't see the point in that. That's just mapping one syntax on to another.
It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.
I would avoid mixing up ON CONFLICT DO UPDATE and MERGE. The
"anomalies" you describe in MERGE are not really anomalies IMV.
They're simply how the feature is supposed to operate, and how it's
possible to make MERGE use alternative join algorithms based only on
the underlying cost. You might use a merge join for a bulk load
use-case, for example.
I think an SQL MERGE feature would be compelling, but I don't think
that it should take much from ON CONFLICT. As I've said many times
[1]: /messages/by-id/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com -- Peter Geoghegan
similar to each, for example). I suggest that you come up with
something that has the semantics that the standard requires, and
therefore makes none of the ON CONFLICT guarantees about an outcome
under concurrency (INSERT or UPDATE). Those guarantees are basically
incompatible with how MERGE needs to work.
In case it matters, I think that the idea of varying relation
heavyweight lock strength based on subtle semantics within a DML
statement is a bad one. Frankly, I think that that's going to be a
nonstarter.
[1]: /messages/by-id/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 6:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I think one of the reasons why Peter Geoghegan decided to pursue
INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
syntax, he felt free to mandate a non-standard SQL requirement, namely
the presence of a unique index on the arbiter columns.
That's true, but what I was really insistent on, more than anything
else, was that the user would get a practical guarantee about an
insert-or-update outcome under concurrency. There could be no
"unprincipled deadlocks", nor could there be spurious unique
violations.
This is the kind of thing that the SQL standard doesn't really concern
itself with, and yet it's of significant practical importance to
users. Both Oracle and SQL Server allow these things that I
specifically set out to avoid. I think that that's mostly a good
thing, though; they do a really bad job of explaining what's what, and
don't provide for a very real need ("upsert") in some other way, but
their MERGE semantics do make sense to me.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.
Like Peter, I think taking such a strong lock for a DML statement
doesn't sound like a very desirable way forward. It means, for
example, that you can only have one MERGE in progress on a table at
the same time, which is quite limiting. It could easily be the case
that you have multiple MERGE statements running at once but they touch
disjoint groups of rows and therefore everything works. I think the
code should be able to cope with concurrent changes, if nothing else
by throwing an ERROR, and then if the user wants to ensure that
doesn't happen by taking ShareRowExclusiveLock they can do that via an
explicit LOCK TABLE statement -- or else they can prevent concurrency
by any other means they see fit.
Other problems with taking ShareRowExclusiveLock include (1) probable
lock upgrade hazards and (2) do you really want MERGE to kick
autovacuum off of your giant table? Probably not.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon,
Nice writeup.
While the standard may not require a unique index for the ON clause I have
never seen a MERGE statement that did not have this property. So IMHO this
is a reasonable restrictions.
In fact I have only ever seen two flavors of usage:
* Single row source (most often simply a VALUES clause) in OLTP
In that case there was lots of concurrency
* Massive source which affects a significant portion of the target table in
DW.
In this case there were no concurrent MERGEs
I believe support for returning rows at a later stage would prove to be very
powerful, especially in combination with chaining MERGE statements in CTEs.
To do that would require language extensions to pass the coloring of the
source row through, especially for rows that fell into "do nothing".
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 2:13 PM, srielau <serge@rielau.com> wrote:
While the standard may not require a unique index for the ON clause I have
never seen a MERGE statement that did not have this property. So IMHO this
is a reasonable restrictions.
The Oracle docs on MERGE say nothing about unique indexes or
constraints. They don't even mention them in passing. They do say
"This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."
SQL Server's MERGE docs do mention unique indexes, but only in
passing, saying something about unique violations, and that unique
violations *cannot* be suppressed in MERGE, even though that's
possible with other DML statements (with something called
IGNORE_DUP_KEY).
What other systems *do* have this restriction? I've never seen one that did.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.8.79&pv=10.12.6&source=email_footer_2]
On Fri, Oct 27, 2017 at 2:42 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Oct 27, 2017 at 2:13 PM, srielau <serge@rielau.com> wrote:
While the standard may not require a unique index for the ON clause I have
never seen a MERGE statement that did not have this property. So IMHO this
is a reasonable restrictions.
The Oracle docs on MERGE say nothing about unique indexes or
constraints. They don't even mention them in passing. They do say
"This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."
SQL Server's MERGE docs do mention unique indexes, but only in
passing, saying something about unique violations, and that unique
violations *cannot* be suppressed in MERGE, even though that's
possible with other DML statements (with something called
IGNORE_DUP_KEY).
What other systems *do* have this restriction? I've never seen one that did. Not clear what you are leading up to here. When I did MERGE in DB2 there was also no limitation: " Each row in the target can only be operated on once. A row in the target can only be identified as MATCHED with one row in the result table of the table-reference” What there was however was a significant amount of code I had to write and test to enforce the above second sentence. IIRC it involved, in the absence of a proof that the join could not expand, adding a row_number() over() AS rn over the target leg of the join and then a row_number() over(partition by rn) > 1 THEN RAISE_ERROR() to catch violators. Maybe in PG there is a trivial way to detect an expanding join and block it at runtime.
So the whole point I’m trying to make is that I haven’t seen the need for the extra work I had to do once the feature appeared in the wild.
Cheers Serge
On Fri, Oct 27, 2017 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.Like Peter, I think taking such a strong lock for a DML statement
doesn't sound like a very desirable way forward. It means, for
example, that you can only have one MERGE in progress on a table at
the same time, which is quite limiting. It could easily be the case
that you have multiple MERGE statements running at once but they touch
disjoint groups of rows and therefore everything works. I think the
code should be able to cope with concurrent changes, if nothing else
by throwing an ERROR, and then if the user wants to ensure that
doesn't happen by taking ShareRowExclusiveLock they can do that via an
explicit LOCK TABLE statement -- or else they can prevent concurrency
by any other means they see fit.
+1, I would suspect users to run this query in parallel of the same
table for multiple data sets.
Peter has taken some time to explain me a bit his arguments today, and
I agree that it does not sound much appealing to have constraint
limitations for MERGE. Particularly using the existing ON CONFLICT
structure gets the feeling of having twice a grammar for what's
basically the same feature, with pretty much the same restrictions.
By the way, this page sums up nicely the situation about many
implementations of UPSERT taken for all systems:
https://en.wikipedia.org/wiki/Merge_(SQL)
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 02:13:27PM -0700, srielau wrote:
While the standard may not require a unique index for the ON clause I have
never seen a MERGE statement that did not have this property. So IMHO this
is a reasonable restrictions.
I don't understand how one could have a conflict upon which to turn
INSERT into UPDATE without having a UNIQUE constraint violated...
The only question is whether one should have control over -or have to
specify- which constraint violations lead to UPDATE vs. which ones lead
to failure vs. which ones lead to doing nothing.
The row to update is the one that the to-be-inserted row conflicted with
-- there can only have been one if the constraint violated was a PRIMARY
KEY constraint, or if there is a PRIMARY KEY at all, but if there's no
PRIMARY KEY, then there can have been more conflicting rows because of
NULL columns in the to-be-inserted row. If the to-be-inserted row
conflicts with multiple rows, then just fail, or don't allow MERGE on
tables that have no PK (as you know, many think it makes no sense to not
have a PK on a table in SQL).
In the common case one does not care about which UNIQUE constraint is
violated because there's only one that could have been violated, or
because if the UPDATE should itself cause some other UNIQUE constraint
to be violated, then the whole statement should fail.
PG's UPSERT is fantastic -- it allows very fine-grained control, but it
isn't as pithy as it could be when the author doesn't care to specify
all that detail.
Also, something like SQLite3's INSERT OR REPLACE is very convenient:
pithy, INSERT syntax, upsert-like semantics[*].
I'd like to have this in PG:
INSERT INTO ..
ON CONFLICT DO UPDATE; -- I.e., update all columns of the existing
-- row to match the ones from the row that
-- would have been inserted had there not been
-- a conflict.
--
-- If an INSERTed row conflicts and then the
-- UPDATE it devolves to also conflicts, then
-- fail.
and
INSERT INTO ..
ON CONFLICT DO UPDATE -- I.e., update all columns of the existing
-- row to match the ones from the row that
-- would have been inserted had there not been
-- a conflict.
--
ON CONFLICT DO NOTHING; -- If an INSERTed row conflicts and then the
-- UPDATE it devolves to also conflicts, then
-- DO NOTHING.
[*] SQLite3's INSERT OR REPLACE is NOT an insert-or-update, but an
insert-or-delete-and-insert, and any deletions that occur in the
process do fire triggers. INSERT OR UPDATE would be much more
useful.
Nico
--
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 28 October 2017 at 00:31, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Oct 27, 2017 at 9:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.Like Peter, I think taking such a strong lock for a DML statement
doesn't sound like a very desirable way forward. It means, for
example, that you can only have one MERGE in progress on a table at
the same time, which is quite limiting. It could easily be the case
that you have multiple MERGE statements running at once but they touch
disjoint groups of rows and therefore everything works. I think the
code should be able to cope with concurrent changes, if nothing else
by throwing an ERROR, and then if the user wants to ensure that
doesn't happen by taking ShareRowExclusiveLock they can do that via an
explicit LOCK TABLE statement -- or else they can prevent concurrency
by any other means they see fit.+1, I would suspect users to run this query in parallel of the same
table for multiple data sets.Peter has taken some time to explain me a bit his arguments today, and
I agree that it does not sound much appealing to have constraint
limitations for MERGE. Particularly using the existing ON CONFLICT
structure gets the feeling of having twice a grammar for what's
basically the same feature, with pretty much the same restrictions.By the way, this page sums up nicely the situation about many
implementations of UPSERT taken for all systems:
https://en.wikipedia.org/wiki/Merge_(SQL)
That Wikipedia article is badly out of date and regrettably does NOT
sum up the current situation nicely any more since MERGE has changed
in definition in SQL:2011 since its introduction in SQL:2003.
I'm proposing a MERGE statement for PG11 that
i) takes a RowExclusiveLock on rows, so can be run concurrently
ii) uses the ON CONFLICT infrastructure to do that
and so requires a unique constraint.
The above is useful behaviour that will be of great benefit to
PostgreSQL users. There are no anomalies remaining.
SQL:2011 specifically states "The extent to which an
SQL-implementation may disallow independent changes that are not
significant is implementation-defined”, so in my reading the above
behaviour would make us fully spec compliant. Thank you to Peter for
providing the infrastructure on which this is now possible for PG11.
Serge puts this very nicely by identifying two different use cases for MERGE.
Now, I accept that you might also want a MERGE statement that
continues to work even if there is no unique constraint, but it would
need to have different properties to the above. I do not in any way
argue against adding that. I also agree that adding RETURNING at a
later stage would be fine as well. I am proposing that those and any
other additional properties people come up with can be added in later
releases once we have the main functionality in core in PG11.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
SQL:2011 specifically states "The extent to which an
SQL-implementation may disallow independent changes that are not
significant is implementation-defined”, so in my reading the above
behaviour would make us fully spec compliant. Thank you to Peter for
providing the infrastructure on which this is now possible for PG11.Serge puts this very nicely by identifying two different use cases for MERGE.
MERGE benefits from having a join that is more or less implemented in
the same way as any other join. It can be a merge join, hash join, or
nestloop join. ON CONFLICT doesn't work using a join.
Should I to take it that you won't be supporting any of these
alternative join algorithms? If not, then you'll have something that
really isn't comparable to MERGE as implemented in Oracle, SQL Server,
or DB2. They *all* do this.
Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case
with every existing MERGE implementation? If so, what actually happens
under the hood when WHEN NOT MATCHED is omitted? For example, would
you actually use a regular "UPDATE FROM" style join, as opposed to the
ON CONFLICT infrastructure? And, if that is so, can you justify the
semantic difference for rows that are updated in each scenario
(omitted vs. not omitted) in READ COMMITTED mode? Note that this could
be the difference between updating a row when *no* version is visible
to our MVCC snapshot, as opposed to doing the EPQ stuff and updating
the latest row version if possible. That's a huge, surprising
difference. On top of all this, you risk live-lock if INSERT isn't a
possible outcome (this is also why ON CONFLICT can never accept a
predicate on its INSERT portion -- again, quite unlike MERGE).
Why not just follow what other systems do? It's actually easier to go
that way, and you get a better outcome. ON CONFLICT involves what you
could call a sleight of hand, and I fear that you don't appreciate
just how specialized the internal infrastructure is.
Now, I accept that you might also want a MERGE statement that
continues to work even if there is no unique constraint, but it would
need to have different properties to the above. I do not in any way
argue against adding that.
Maybe you *should* be arguing against it, though, and arguing against
ever supporting anything but equijoins, because these things will
*become* impossible if you go down that road. By starting with the ON
CONFLICT infrastructure, while framing no-unique-index-support as work
for some unspecified future release, you're leaving it up to someone
else to resolve the problems. Someone else must square the circle of
mixing ON CONFLICT semantics with fully generalized MERGE semantics.
But who?
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 28 October 2017 at 20:39, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
SQL:2011 specifically states "The extent to which an
SQL-implementation may disallow independent changes that are not
significant is implementation-defined”, so in my reading the above
behaviour would make us fully spec compliant. Thank you to Peter for
providing the infrastructure on which this is now possible for PG11.Serge puts this very nicely by identifying two different use cases for MERGE.
MERGE benefits from having a join that is more or less implemented in
the same way as any other join. It can be a merge join, hash join, or
nestloop join. ON CONFLICT doesn't work using a join.Should I to take it that you won't be supporting any of these
alternative join algorithms? If not, then you'll have something that
really isn't comparable to MERGE as implemented in Oracle, SQL Server,
or DB2. They *all* do this.Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case
with every existing MERGE implementation? If so, what actually happens
under the hood when WHEN NOT MATCHED is omitted? For example, would
you actually use a regular "UPDATE FROM" style join, as opposed to the
ON CONFLICT infrastructure? And, if that is so, can you justify the
semantic difference for rows that are updated in each scenario
(omitted vs. not omitted) in READ COMMITTED mode? Note that this could
be the difference between updating a row when *no* version is visible
to our MVCC snapshot, as opposed to doing the EPQ stuff and updating
the latest row version if possible. That's a huge, surprising
difference. On top of all this, you risk live-lock if INSERT isn't a
possible outcome (this is also why ON CONFLICT can never accept a
predicate on its INSERT portion -- again, quite unlike MERGE).Why not just follow what other systems do? It's actually easier to go
that way, and you get a better outcome. ON CONFLICT involves what you
could call a sleight of hand, and I fear that you don't appreciate
just how specialized the internal infrastructure is.Now, I accept that you might also want a MERGE statement that
continues to work even if there is no unique constraint, but it would
need to have different properties to the above. I do not in any way
argue against adding that.Maybe you *should* be arguing against it, though, and arguing against
ever supporting anything but equijoins, because these things will
*become* impossible if you go down that road. By starting with the ON
CONFLICT infrastructure, while framing no-unique-index-support as work
for some unspecified future release, you're leaving it up to someone
else to resolve the problems. Someone else must square the circle of
mixing ON CONFLICT semantics with fully generalized MERGE semantics.
But who?
Nothing I am proposing blocks later work.
Everything you say makes it clear that a fully generalized solution is
going to be many years in the making, assuming we agree.
"The extent to which an SQL-implementation may disallow independent
changes that are not significant is implementation-defined”.
So we get to choose. I recommend that we choose something practical.
We're approaching the 10 year anniversary of my first serious attempt
to do MERGE. I say that its time to move forwards with useful
solutions, rather than wait another 10 years for the perfect one, even
assuming it exists.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 27, 2017 at 3:00 PM, Serge Rielau <serge@rielau.com> wrote:
What other systems *do* have this restriction? I've never seen one that did.
Not clear what you are leading up to here.
When I did MERGE in DB2 there was also no limitation:
"Each row in the target can only be operated on once. A row in the target can only be identified as MATCHED with one row in the result table of the table-reference”
What there was however was a significant amount of code I had to write and test to enforce the above second sentence.
Then it seems that we were talking about two different things all along.
Maybe in PG there is a trivial way to detect an expanding join and block it at runtime.
There is for ON CONFLICT. See the cardinality violation logic within
ExecOnConflictUpdate(). (There are esoteric cases where this error can
be raised due to a wCTE that does an insert "from afar", which is
theoretically undesirable but not actually a problem.)
The MERGE implementation that I have in mind would probably do almost
the same thing, and make the "HeapTupleSelfUpdated" case within
ExecUpdate() raise an error when the caller happened to be a MERGE,
rather than following the historic UPDATE behavior. (The behavior is
to silently suppress a second or subsequent UPDATE attempt from the
same command, a behavior that Simon's mock MERGE documentation
references.)
So the whole point I’m trying to make is that I haven’t seen the need for the extra work I had to do once the feature appeared in the wild.
That seems pretty reasonable to me.
My whole point is that I think it's a mistake to do things like lock
rows ahead of evaluating any UPDATE predicate, in the style of ON
CONFLICT, in order to replicate the ON CONFLICT guarantees [1]https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation.
I'm arguing for implementation simplicity, too. Trying to implement
MERGE in a way that extends ON CONFLICT seems like a big mistake to
me, because ON CONFLICT updates rows on the basis of a would-be
duplicate violation, along with all the baggage that that carries.
This is actually enormously different to an equi-join that is fed by a
scan using an MVCC snapshot. The main difference is that there
actually is no MVCC snapshot in play in most cases [2]https://www.postgresql.org/docs/devel/static/transaction-iso.html#xact-read-committed -- Peter Geoghegan. If *no* row
with the PK value of 5 is visible to our MVCC snapshot, but an xact
committed having inserted such a row, that still counts as a CONFLICT
with READ COMMITTED.
[1]: https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation
[2]: https://www.postgresql.org/docs/devel/static/transaction-iso.html#xact-read-committed -- Peter Geoghegan
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Nothing I am proposing blocks later work.
Actually, many things will block future work if you go down that road.
You didn't respond to the specific points I raised, but that doesn't
mean that they're not real.
Everything you say makes it clear that a fully generalized solution is
going to be many years in the making, assuming we agree.
I think that it's formally impossible as long as you preserve the ON
CONFLICT guarantees, unless you somehow define the problems out of
existence. Those are guarantees which no other MERGE implementation
has ever made, and which the SQL standard says nothing about. And for
good reasons.
"The extent to which an SQL-implementation may disallow independent
changes that are not significant is implementation-defined”.So we get to choose. I recommend that we choose something practical.
We're approaching the 10 year anniversary of my first serious attempt
to do MERGE. I say that its time to move forwards with useful
solutions, rather than wait another 10 years for the perfect one, even
assuming it exists.
As far as I'm concerned, you're the one arguing for an unobtainable
solution over a good one, not me. I *don't* think you should solve the
problems that I raise -- you should instead implement MERGE without
any of the ON CONFLICT guarantees, just like everyone else has.
Building MERGE on top of the ON CONFLICT guarantees, and ultimately
arriving at something that is comparable to other implementations over
many releases might be okay if anyone had the slightest idea of what
that would look like. You haven't even _described the semantics_,
which you could do by addressing the specific points that I raised.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 28 October 2017 at 22:04, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Nothing I am proposing blocks later work.
Actually, many things will block future work if you go down that road.
You didn't respond to the specific points I raised, but that doesn't
mean that they're not real.Everything you say makes it clear that a fully generalized solution is
going to be many years in the making, assuming we agree.I think that it's formally impossible as long as you preserve the ON
CONFLICT guarantees, unless you somehow define the problems out of
existence. Those are guarantees which no other MERGE implementation
has ever made, and which the SQL standard says nothing about. And for
good reasons."The extent to which an SQL-implementation may disallow independent
changes that are not significant is implementation-defined”.So we get to choose. I recommend that we choose something practical.
We're approaching the 10 year anniversary of my first serious attempt
to do MERGE. I say that its time to move forwards with useful
solutions, rather than wait another 10 years for the perfect one, even
assuming it exists.As far as I'm concerned, you're the one arguing for an unobtainable
solution over a good one, not me. I *don't* think you should solve the
problems that I raise -- you should instead implement MERGE without
any of the ON CONFLICT guarantees, just like everyone else has.
Building MERGE on top of the ON CONFLICT guarantees, and ultimately
arriving at something that is comparable to other implementations over
many releases might be okay if anyone had the slightest idea of what
that would look like. You haven't even _described the semantics_,
which you could do by addressing the specific points that I raised.
I have no objection to you writing a better version than me and if my
work inspires you to complete that in a reasonable timescale then we
all win. I'm also very happy to work together on writing the version
described - you have already done much work in this area.
I don't see any major problems in points you've raised so far, though
obviously there is much detail.
All of this needs to be written and then committed, so I'll get on and
write my proposal. We can then see whether that is an 80% solution or
something less. There are more obvious barriers to completion at this
point, like time and getting on with it.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Oct 29, 2017 at 4:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
I have no objection to you writing a better version than me and if my
work inspires you to complete that in a reasonable timescale then we
all win.
My whole point is that the way that you seem determined to go on this
is a dead end. I don't think that *anyone* can go improve on what you
come up with if that's based heavily on ON CONFLICT, for the simple
reason that the final user visible design is totally unclear. There is
an easy way to make me shut up - come up with a design for MERGE that
more or less builds on how UPDATE FROM works, rather than building
MERGE on ON CONFLICT. (You might base things like RLS handling on ON
CONFLICT, but in the main MERGE should be like an UPDATE FROM with an
outer join, that can do INSERTs and DELETEs, too.)
The original effort to add MERGE didn't do anything upsert-like, which
Heikki (the GSOC mentor of the project) was perfectly comfortable
with. I'm too lazy to go search the archives right now, but it's
there. Heikki cites the SQL standard.
This is what MERGE *actually is*, which you can clearly see from the
Oracle/SQL Server/DB2 docs. It says this in the first paragraph of
their MERGE documentation. It's crystal clear from their docs -- "This
statement is a convenient way to combine multiple operations. It lets
you avoid multiple INSERT, UPDATE, and DELETE DML statements."
I'm also very happy to work together on writing the version
described - you have already done much work in this area.
You seem to want to preserve the ON CONFLICT guarantees at great cost.
But you haven't even defended that based on a high level goal, or a
use case, or something that makes sense to users (describing how it is
possible is another matter). You haven't even tried to convince me.
I don't see any major problems in points you've raised so far, though
obviously there is much detail.
Did you even read them? They are not mere details. They're fundamental
to the semantics of the feature (if you base it on ON CONFLICT). It's
not actually important that you understand them all; the important
message is that generalizing ON CONFLICT has all kinds of terrible
problems.
All of this needs to be written and then committed, so I'll get on and
write my proposal. We can then see whether that is an 80% solution or
something less. There are more obvious barriers to completion at this
point, like time and getting on with it.
Getting on with *what*, exactly?
In general, I have nothing against an 80% solution, or even a 50%
solution, provided there is a plausible path to a 100% solution. I
don't think that you have such a path, but only because you're tacitly
inserting requirements that no other MERGE implementation has to live
with, that I doubt any implementation *could* live with. Again, I'm
not the one making this complicated, or adding requirements that will
be difficult for you to get in to your v1 -- you're the one doing
that.
The semantics that I suggest (the SQL standard's semantics) will
require less code, and will be far simpler. Right now, I simply don't
understand why you're insisting on using ON CONFLICT without even
saying why. I can only surmise that you think that doing so will
simplify the implementation, but I can guarantee you that it won't.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers