Making joins involving ctid work for the benefit of UPSERT
I'm working on UPSERT again. I think that in order to make useful
progress, I'll have to find a better way of overcoming the visibility
issues (i.e. the problem of what to do about a
still-in-progress-to-our-snapshot row being locked at READ COMMITTED
isolation level [1][2]).
I've made some tentative additions to my earlier patch to change the syntax:
postgres=# explain analyze
with c as (
insert into ints(key, val) values (1, 'a'), (2, 'b')
on conflict
select * for update)
update ints set val = c.val from c conflicts;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Update on ints  (cost=0.03..62.34 rows=2 width=104) (actual time=0.095..0.095 rows=0 loops=1)
   CTE c
     ->  Insert on ints ints_1  (cost=0.00..0.03 rows=2 width=36) (actual time=0.039..0.049 rows=2 loops=1)
           ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..62.31 rows=2 width=104) (actual time=0.063..0.080 rows=2 loops=1)
         Join Filter: (ints.ctid = c.ctid)
         Rows Removed by Join Filter: 6
         ->  CTE Scan on c  (cost=0.00..0.04 rows=2 width=100) (actual time=0.044..0.055 rows=2 loops=1)
         ->  Materialize  (cost=0.00..28.45 rows=1230 width=10) (actual time=0.005..0.006 rows=4 loops=2)
               ->  Seq Scan on ints  (cost=0.00..22.30 rows=1230 width=10) (actual time=0.009..0.009 rows=4 loops=1)
 Planning time: 0.132 ms
 Execution time: 0.159 ms
(12 rows)
This works, as far as it goes. Parse analysis adds the ctid join to
the top-level UPDATE based on the fact that a magical UPDATE statement
(magical due to having a CONFLICTS clause) references a CTE in its
FromList (this is mandated by CONFLICTS during parse analysis), a CTE
containing an INSERT ON CONFLICT, and itself projecting a magical set
of rejected rows sufficient to get a locked c.ctid implicitly. (FWIW,
DELETEs may also accept a CONFLICTS clause and be used like this).
The optimizer does not support joins involving ctid, which is why
there is a seqscan plan - even setting "enable_seqscan = off" does not
alter the plan right now.
Apart from the obvious fact that we're doing an unnecessary Seq Scan
on the target table, which is unacceptable on performance grounds
(since we do in fact already know exactly what list of tids to pass to
the top-level ModifyTable node to UPDATE), there is another problem:
Since I cannot use a tid scan, I cannot play tricks with visibility
within tidscans for the benefit of solving the basic problem of doing
the right thing for READ COMMITTED. Clearly I need to add a new "MVCC
violation" in the spirit of EvalPlanQual() to avoid this problem
(since READ COMMITTED serialization failures are unacceptable to
users), but I'd like this new MVCC violation to be as selective as
possible.
Let me take a step back, though. Presently UPDATE WHERE CURRENT OF
does something that isn't a million miles from what I have in mind to
address the problem. That feature exists for "sensitive cursors",
where we lock rows (since FOR UPDATE was specified with the DECLARE
CURSOR statement) as they're fetched from the cursor. The upshot is
that in general, rows can change "out from under us" since they're not
yet locked, including in ways that might result in a new row version
not satisfying the original SELECT FOR UPDATE query predicate -
clearly users don't want to have to deal with that when they
subsequently go to UPDATE WHERE CURRENT OF. Special handling is
required. Purely because UPDATE WHERE CURRENT OF was specified, as
part of this special handling that update must use a tid scan. There
is a little bit of special machinery within the optimizer (within
cost_tidscan()) to force this for the isCurrentOf case.
The tid scan code within the executor also has special UPDATE WHERE
CURRENT OF handling. TidListCreate() specially handles the case by
asking execCurrentOf() to look up the cursor's portal, and to get a
tid from its QueryDesc for the current cursor position. TidNext()
specially fetches the most recent version visible to our estate's
Snapshot when we UPDATE WHERE CURRENT OF, without considering the
original SELECT FOR UPDATE predicate (and whether or not it would in
any sense continue to be satisfied). Each tid is then fetched from the
heap and returned. I think I ought to be able to do something similar
with this new CONFLICTS clause, by similarly marshaling some list of
tids ultimately originating from the ModifyTable node that locked a
potentially-not-yet-visible row (or maybe the optimizer recognizes the
internal special case and creates the tid scan node "visibility
credulous"). I'd then return heap tuples by using a dirty snapshot.
Since this can only happen for the UPDATE's tid scan, and there must
be a tid scan, it should be correct (on its own fairly reasonable
terms, for READ COMMITTED). After all, at this juncture for the
purposes of UPSERT it will already certainly be the case that the
tuple is locked.
I would like to:
* Get feedback on the hazards of taking this direction, and indeed the
advisability of the whole approach in general.
For example, I think we'd always want to use a DirtySnapshot, and
never try to just get the latest version visible to our estate
snapshot, but I might be wrong about that. Since my magical UPDATE
could have additional boolean conditions in its predicate (that is,
additions to the implicit ctid join, in contrast to the existing
magical WHERE CURRENT OF variant of UPDATE where that's always the
only thing in the predicate), the "morality" of *which* tid should get
to the UPDATE should certainly be considered. Of particular concern is
the fact that having Postgres do one or the other could make the
difference between an UPDATE qual being satisfied or not being
satisfied. Maybe this is an argument against being able to tack on
additional boolean conditions in the following fashion (which I'd
prefer to support without also supporting more than one join (the
magical implicit one), even though as I mentioned WHERE CURRENT OF
does not support boolean conditions):
with c as (
insert into ints(key, val) values (1, 'a'), (2, 'b')
on conflict
select * for update)
update ints set val = c.val from c conflicts where c.key != 1;
* Get some direction on what it would take to make joins involving
ctid work, since all of this hinges upon that infrastructure becoming
available. This appears to be a Simple Matter of Programming (at least
for someone that happens to already have a good understanding of the
optimizer), and is anticipated by this comment within tidpath.c:
* There is currently no special support for joins involving CTID; in
* particular nothing corresponding to best_inner_indexscan(). Since it's
* not very useful to store TIDs of one table in another table, there
* doesn't seem to be enough use-case to justify adding a lot of code
* for that.
[1]: /messages/by-id/AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com
[2]: http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf (slide 25)
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jul 17, 2014 at 6:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
This appears to be a Simple Matter of Programming (at least
for someone that happens to already have a good understanding of the
optimizer), and is anticipated by this comment within tidpath.c:
* There is currently no special support for joins involving CTID; in
* particular nothing corresponding to best_inner_indexscan(). Since it's
* not very useful to store TIDs of one table in another table, there
* doesn't seem to be enough use-case to justify adding a lot of code
* for that.
By the way, this comment is obsolete since best_inner_indexscan() was
removed in 2012 by the parameterized paths patch.
--
Peter Geoghegan
Hi,
On 2014-07-17 18:18:41 -0700, Peter Geoghegan wrote:
I'm working on UPSERT again. I think that in order to make useful
progress, I'll have to find a better way of overcoming the visibility
issues (i.e. the problem of what to do about a
still-in-progress-to-our-snapshot row being locked at READ COMMITTED
isolation level [1][2]).
Agreed.
I've made some tentative additions to my earlier patch to change the syntax:
postgres=# explain analyze
with c as (
insert into ints(key, val) values (1, 'a'), (2, 'b')
on conflict
select * for update)
update ints set val = c.val from c conflicts;
I still don't agree that this is a sane syntax for upsert. Avoiding it
would reduce the scope of the problems you have measurably. We should
provide a syntax that does the UPSERT itself, instead of delegating that
to the user as you're proposing above. Afaics nearly none of the
problems you're debating in your email would exist if upsert were
entirely done internally.
I think the things that use "weird" visibility semantics are pretty much
all doing that internally (things being EvalPlanQual stuff for
INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
see sufficient reason why we should break away from that here. Yes,
there's stuff that cannot be done when it's done internally, but we can
live with those not being possible.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I think the things that use "weird" visibility semantics are pretty much
all doing that internally (things being EvalPlanQual stuff for
INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
see sufficient reason why we should break away from that here. Yes,
there's stuff that cannot be done when it's done internally, but we can
live with those not being possible.
The whole point of what I was proposing was that those semantics would
only apply to a special tid scan node. Perhaps I missed something, but
I'm not sure why you'd consider that I was breaking away from that
here at all.
--
Peter Geoghegan
On 2014-07-18 10:53:36 -0700, Peter Geoghegan wrote:
On Fri, Jul 18, 2014 at 10:46 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I think the things that use "weird" visibility semantics are pretty much
all doing that internally (things being EvalPlanQual stuff for
INSERT/UPDATE/DELETE and the referential integrity triggers). I don't
see sufficient reason why we should break away from that here. Yes,
there's stuff that cannot be done when it's done internally, but we can
live with those not being possible.

The whole point of what I was proposing was that those semantics would
only apply to a special tid scan node. Perhaps I missed something, but
I'm not sure why you'd consider that I was breaking away from that
here at all.
I don't see why you'd need such a node at all if we had a fully builtin
UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
barely anybody will understand, let alone use correctly in queries they
write themselves.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't see why you'd need such a node at all if we had a fully builtin
UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
barely anybody will understand, let alone use correctly in queries they
write themselves.
I accept that there will be a need for certain restrictions. Most
obviously, if you update the target table referencing a CTE like this,
not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
error. And as I mentioned, you may only join the projected duplicates
to the UPDATE ModifyTable - an attempt to join any more relations is
an error. In short, this *is* a fully built-in upsert.
--
Peter Geoghegan
On 2014-07-18 11:14:34 -0700, Peter Geoghegan wrote:
On Fri, Jul 18, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't see why you'd need such a node at all if we had a fully builtin
UPSERT. The whole stuff with ON CONFLICT SELECT FOR UPDATE and then
UPDATE ... FROM c CONFLICTS is too complicated and exposes stuff that
barely anybody will understand, let alone use correctly in queries they
write themselves.

I accept that there will be a need for certain restrictions. Most
obviously, if you update the target table referencing a CTE like this,
not using the special CONFLICTS clause in the UPDATE (or DELETE) is an
error. And as I mentioned, you may only join the projected duplicates
to the UPDATE ModifyTable - an attempt to join any more relations is
an error. In short, this *is* a fully built-in upsert.
Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such. I think you're attempting a sort of genericity
that's making your (important!) goal much harder to reach.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such. I think you're attempting a sort of genericity
that's making your (important!) goal much harder to reach.
Well, maybe. If the genericity of this syntax isn't what people want,
I may go with something else. At the suggestion of Robert I did start
a dedicated thread on that question months back (explicitly putting
aside the nitty-gritty details of the implementation), but that didn't
go anywhere.
--
Peter Geoghegan
On Fri, Jul 18, 2014 at 11:32 AM, Peter Geoghegan <pg@heroku.com> wrote:
Well, maybe. If the genericity of this syntax isn't what people want,
I may go with something else.
By the way, I didn't mention that there is now also the optional
ability to specify a "merge on" unique index directly in DML. It would
be much nicer to specify a sort key instead, but that might be tricky
in the general case. I imagine that other systems solve the problem by
being very restrictive in what will be (implicitly) accepted as a
merge-on index. Seemingly there are problems with all major SQL MERGE
implementations, so I don't necessarily expect to be able to draw
lessons from them in any way here.
--
Peter Geoghegan
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such.
Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed
to match the qual in the query. Unless you're willing to give up on
the idea of having a qual on the update (other than something like an
ON CONFLICTS qual, obviously) I think you'll probably end up with some
kind of pseudo-scan node anyway, if only for consistency with how
things already work, to give you somewhere to show the Filter in
explain output and so on. ExecScan() probably needs to ExecQual().
Inserts don't have scan nodes, but updates do, and so it seems pretty
odd to make the "Insert" node (a ModifyTable node) pullup from a
result node (as always), and the "Update" node (also a ModifyTable
node) treat the first ModifyTable node as its own pseudo-scan node,
when in fact we are scanning the heap in a manner not unlike a
conventional update. Or do you envision a second result node where an
update qual might be evaluated? I am not enthused about either
possibility.
The idea of not having the ability to put a predicate on the update at
all, much like the MySQL thing, isn't completely outrageous, but it
certainly isn't going to go down well with those that have already
expressed concerns about how much of a foot gun this could be.
--
Peter Geoghegan
On Sat, Jul 19, 2014 at 10:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Meh. An understandable syntax wouldn't require the pullups with a special
scan node and such.

Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed
to match the qual in the query. Unless you're willing to give up on
the idea of having a qual on the update (other than something like an
ON CONFLICTS qual, obviously) I think you'll probably end up with some
kind of pseudo-scan node anyway, if only for consistency with how
things already work, to give you somewhere to show the Filter in
explain output and so on. ExecScan() probably needs to ExecQual().
Inserts don't have scan nodes, but updates do, and so it seems pretty
odd to make the "Insert" node (a ModifyTable node) pullup from a
result node (as always), and the "Update" node (also a ModifyTable
node) treat the first ModifyTable node as its own pseudo-scan node,
when in fact we are scanning the heap in a manner not unlike a
conventional update. Or do you envision a second result node where an
update qual might be evaluated? I am not enthused about either
possibility.

The idea of not having the ability to put a predicate on the update at
all, much like the MySQL thing, isn't completely outrageous, but it
certainly isn't going to go down well with those that have already
expressed concerns about how much of a foot gun this could be.
This all seems completely to one side of Andres's point. I think what
he's saying is: don't implement an SQL syntax of the form INSERT ON
CONFLICT and let people use that to implement upsert. Instead,
directly implement an SQL command called UPSERT or similar. That's
long been my intuition as well. I think generality here is not your
friend.
I'd suggest something like:
UPSERT table SET col = value [, ...] WHERE predicate;
with semantics like this:
1. Search the table, using any type of scan you like, for a row
matching the given predicate.
2. If you find more than one tuple that is visible to your scan, error.
3. If you find exactly one tuple that is visible to your scan, update it. Stop.
4. If you find no tuple that is visible to your scan, but you do find
at least one tuple whose xmin has committed and whose xmax has not
committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4c) update it, in violation of normal MVCC visibility rules.
5. Construct a new tuple using the col/value pairs from the SET clause
and try to insert it. If this would fail with a unique index
violation, back out and go back to step 1.
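The five steps above amount to a small decision procedure. As a toy sketch (in Python, with a hypothetical tuple representation; nothing here is PostgreSQL code), the branching might look like this:

```python
# Toy model of the proposed UPSERT decision procedure (steps 1-5).
# A "tuple" is a dict carrying just the visibility facts the steps
# consult; field names are invented for this illustration.

def upsert_action(matches, isolation="read committed"):
    """Given the tuples a scan found for the predicate, report what
    the proposed UPSERT would do next."""
    visible = [t for t in matches if t["visible_to_snapshot"]]
    if len(visible) > 1:
        # Step 2: the predicate must identify at most one visible row.
        return "error: predicate matches more than one visible row"
    if len(visible) == 1:
        # Step 3: the ordinary case.
        return "update visible row"
    # Step 4: no visible tuple; look for committed-but-invisible ones
    # (xmin committed, xmax not committed).
    pending = [t for t in matches
               if t["xmin_committed"] and not t["xmax_committed"]]
    if pending:
        if isolation != "read committed":
            return "error: serialization failure"        # 4a
        if len(pending) > 1:
            return "error: more than one in-doubt row"   # more-than-one case
        return "update in-doubt row (MVCC violation)"    # the MVCC exception
    # Step 5: nothing to update; insert, retrying from step 1 on a
    # unique violation.
    return "insert new row; on unique violation, retry from step 1"
```

For instance, an empty scan result falls through to the insert arm, while a single in-doubt tuple under READ COMMITTED takes the deliberate MVCC-violating update arm.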
Having said all that, I believe the INSERT ON CONFLICT syntax is more
easily comprehensible than previous proposals. But I still tend to
agree with Andres that an explicit UPSERT syntax or something like it,
that captures all of the MVCC games inside itself, is likely
preferable from a user standpoint, whatever the implementation ends up
looking like.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
This all seems completely to one side of Andres's point. I think what
he's saying is: don't implement an SQL syntax of the form INSERT ON
CONFLICT and let people use that to implement upsert. Instead,
directly implement an SQL command called UPSERT or similar. That's
long been my intuition as well. I think generality here is not your
friend.
Sure, I was just making one fairly narrow point in relation to Andres'
concern about executor structuring of the UPSERT.
I'd suggest something like:
UPSERT table SET col = value [, ...] WHERE predicate;
Without dismissing the overall concern shared by you and Andres, that
particular update-driven syntax doesn't strike me as something that
should be pursued. I would like to be able to insert or update more
than a single row at a time, for one thing. For another, what exactly
appears in the predicate? Is it more or less the same as an UPDATE's
predicate?
with semantics like this:
1. Search the table, using any type of scan you like, for a row
matching the given predicate.
Perhaps I've misunderstood, but this is fundamentally different to
what I'd always thought would need to happen. I always understood that
the UPSERT should be insert-driven, and so not really have an initial
scan at all in the same sense that every Insert lacks one. Moreover,
I've always thought that everyone agreed on that. We go to insert, and
then in the course of inserting index tuples do something with dirty
snapshots. That's how we get information on conflicts.
For one thing we cannot use any kind of scan unless there is a new
mechanism for seeing not-yet-visible tuples, like some kind of
non-MVCC snapshot that those scan nodes can selectively use. Now, I
guess that you intend that in this scenario you go straight to 5, and
then your insert finds the not-yet-visible conflict. But it's not
clear that when 5 fails, and we pick up from 1 that we then get a new
MVCC Snapshot or something else that gives us a hope of succeeding
this time around. And it seems quite wasteful to not use knowledge of
conflicts from the insert at all - that's one thing I'm trying to
address here; removing unnecessary index scans when we actually know
what our conflict TIDs are.
2. If you find more than one tuple that is visible to your scan, error.
This point seems to concern making the UPSERT UPDATE only affect zero
or one rows. Is that right? If so, why do you think that's a useful
constraint to impose?
3. If you find exactly one tuple that is visible to your scan, update it. Stop.
4. If you find no tuple that is visible to your scan, but you do find
at least one tuple whose xmin has committed and whose xmax has not
committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4c) update it, in violation of normal MVCC visibility rules.
Point 4b ("if there is more than one such tuple...") seems like it
needs some practical or theoretical justification - do you just not
want to follow an update chain? If so, why? What would the error
message actually be? And, to repeat myself: Why should an MVCC
snapshot see more than one such tuple in the ordinary course of
scanning a relation, since there is by definition a unique constraint
that prevents this? Or maybe I'm wrong; maybe you don't even require
that. That isn't clear.
As you know, I've always opposed any type of READ COMMITTED
serialization failure. If you're worried about lock starvation hazards
- although I don't think strong concerns here are justified - we can
always put in a fail-safe number of retries before throwing an error.
I'm comfortable with that because I think based on experimentation
that it's going to be virtually impossible to hit. Apparently other
snapshot isolation databases acquire a new snapshot, and re-do the
command rather than using an EvalPlanQual()-like mechanism and thereby
violating snapshot isolation. Those other systems have just such a
hard limit on retries before erroring, and it seems to work out okay
for them.
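The retry-with-a-cap scheme described here (take a new snapshot, redo the command, error only after a fail-safe number of attempts) can be sketched as a simple loop. Everything below is a hypothetical stand-in, not PostgreSQL code:

```python
# Sketch of "retake a snapshot and redo the command, with a capped
# number of retries". attempt_upsert stands in for one execution of
# the command under one snapshot; the cap is an invented constant.

MAX_RETRIES = 100  # fail-safe; expected to be practically unreachable

def upsert_with_retries(attempt_upsert):
    """Re-run the command under a fresh snapshot until it succeeds,
    erroring out only after MAX_RETRIES conflicting attempts."""
    for attempt in range(MAX_RETRIES):
        snapshot = attempt  # stand-in for taking a new MVCC snapshot
        if attempt_upsert(snapshot):
            return attempt + 1  # number of attempts actually used
    raise RuntimeError("too many conflicting concurrent updates")
```

The design choice being debated is exactly the terminal branch: the loop above surfaces an error only at the cap, rather than raising a READ COMMITTED serialization failure on the first conflict.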
5. Construct a new tuple using the col/value pairs from the SET clause
and try to insert it. If this would fail with a unique index
violation, back out and go back to step 1.
Again, I can't see why you'd do this step last and not first, since
this is naturally the place to initially "see into the future" with a
dirty snapshot.
Having said all that, I believe the INSERT ON CONFLICT syntax is more
easily comprehensible than previous proposals. But I still tend to
agree with Andres that an explicit UPSERT syntax or something like it,
that captures all of the MVCC games inside itself, is likely
preferable from a user standpoint, whatever the implementation ends up
looking like.
Okay then. If you both feel that way, I will come up with something
closer to what you sketch. But for now I still strongly feel it ought
to be driven by an insert. Perhaps I've misunderstood you entirely,
though.
--
Peter Geoghegan
On Wed, Jul 23, 2014 at 5:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I'd suggest something like:
UPSERT table SET col = value [, ...] WHERE predicate;
I don't think this is actually good enough. Typical use cases are
things like "increment this counter or insert a new one starting at
0".
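The counter case is worth spelling out: the UPDATE arm must read the existing value, while the INSERT arm must supply an initial one, and a single SET clause is asked to do both. A hypothetical illustration (this syntax is invented for the example):

```sql
-- Hypothetical syntax, for illustration only.  When the row exists,
-- the counter upsert needs "val + 1"; when it does not, "val + 1" has
-- no existing "val" to read.
UPSERT counters SET val = val + 1 WHERE key = 'page_hits';
-- The proposal's step 5 builds the inserted tuple from the SET clause,
-- so the command would also need some way to supply the initial row,
-- e.g. starting the counter at 0.
```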
--
greg
Peter Geoghegan <pg@heroku.com> wrote:
I still strongly feel it ought to be driven by an insert
Could you clarify that? Does this mean that you feel that we
should write to the heap before reading the index to see if the row
will be a duplicate? If so, I think that is a bad idea, since this
will sometimes be used to apply a new data set which hasn't changed
much from the old, and that approach will perform poorly for this
use case, causing a lot of bloat. It certainly would work well for
the case that most of the rows are expected to be INSERTs rather
than DELETEs, but I'm not sure that's justification for causing
extreme bloat in the other cases.
Also, just a reminder that I'm going to squawk loudly if the
implementation does not do something fairly predictable and sane
for the case that the table has more than one UNIQUE index and you
attempt to UPSERT a row that is a duplicate of one row on one of
the indexes and a different row on a different index. The example
discussed during your PGCon talk was something like a city table
with two columns, each with a UNIQUE constraint, containing:
 city_id | city_name
---------+-----------
       1 | Toronto
       2 | Ottawa
... and an UPSERT comes through for (1, 'Ottawa'). We would all
like for that never to happen, but it will. There must be sane and
documented behavior in that case.
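The scenario is easy to reproduce. A minimal setup, following the example (table and column names as above):

```sql
-- Two independent UNIQUE constraints on the same table.
CREATE TABLE city (
    city_id   integer UNIQUE,
    city_name text    UNIQUE
);
INSERT INTO city VALUES (1, 'Toronto'), (2, 'Ottawa');

-- An upsert of (1, 'Ottawa') now conflicts with a *different* row on
-- each unique index: row 1 via city_id, row 2 via city_name.  Any
-- proposed syntax has to define which conflict governs, or fail with
-- a clear error.
```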
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
This all seems completely to one side of Andres's point. I think what
he's saying is: don't implement an SQL syntax of the form INSERT ON
CONFLICT and let people use that to implement upsert. Instead,
directly implement an SQL command called UPSERT or similar. That's
long been my intuition as well. I think generality here is not your
friend.

Sure, I was just making one fairly narrow point in relation to Andres'
concern about executor structuring of the UPSERT.

I'd suggest something like:

UPSERT table SET col = value [, ...] WHERE predicate;
Without dismissing the overall concern shared by you and Andres, that
particular update-driven syntax doesn't strike me as something that
should be pursued. I would like to be able to insert or update more
than a single row at a time, for one thing. For another, what exactly
appears in the predicate? Is it more or less the same as an UPDATE's
predicate?
To the last question, yes. To the first point, I'm not bent on this
particular syntax, but I am +1 for the idea that the command is
something specifically upsert-like rather than something more generic
like an ON CONFLICT SELECT that lets you do arbitrary things with the
returned rows.
with semantics like this:
1. Search the table, using any type of scan you like, for a row
matching the given predicate.

Perhaps I've misunderstood, but this is fundamentally different to
what I'd always thought would need to happen. I always understood that
the UPSERT should be insert-driven, and so not really have an initial
scan at all in the same sense that every Insert lacks one. Moreover,
I've always thought that everyone agreed on that. We go to insert, and
then in the course of inserting index tuples do something with dirty
snapshots. That's how we get information on conflicts.
It's certainly arguable whether you should INSERT and then turn failures
into an update or try to UPDATE and then turn failures into an INSERT;
we might even want to have both options available, though that smells
a little like airing too much of our dirty laundry. But I think I
generally favor optimizing for the UPDATE case for more or less the
same reasons Kevin articulated.
For one thing we cannot use any kind of scan unless there is a new
mechanism for seeing not-yet-visible tuples, like some kind of
non-MVCC snapshot that those scan nodes can selectively use. Now, I
guess that you intend that in this scenario you go straight to 5, and
then your insert finds the not-yet-visible conflict. But it's not
clear that when 5 fails, and we pick up from 1 that we then get a new
MVCC Snapshot or something else that gives us a hope of succeeding
this time around. And it seems quite wasteful to not use knowledge of
conflicts from the insert at all - that's one thing I'm trying to
address here; removing unnecessary index scans when we actually know
what our conflict TIDs are.
Here you seem to be suggesting that I intended to propose your existing
design rather than something else, which I didn't. In this design, you
find the conflict (at most one) by scanning for the tuple to be
updated.
2. If you find more than one tuple that is visible to your scan, error.
This point seems to concern making the UPSERT UPDATE only affect zero
or one rows. Is that right? If so, why do you think that's a useful
constraint to impose?
Because nobody wants an operation to either insert 1 tuple or update
n>=1 tuples. The intention is that the predicate should probably be
something like WHERE unique_key = 'some_value', but you can use
something else. So it's kinda like saying which index you care about
for a particular operation, except without having to explicitly name
an index. But in any event you should use a predicate that uniquely
identifies the tuple you want to update.
3. If you find exactly one tuple that is visible to your scan, update it. Stop.
4. If you find no tuple that is visible to your scan, but you do find
at least one tuple whose xmin has committed and whose xmax has not
committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4c) update it, in violation of normal MVCC visibility rules.
Point 4b ("if there is more than one such tuple...") seems like it
needs some practical or theoretical justification - do you just not
want to follow an update chain? If so, why? What would the error
message actually be? And, to repeat myself: Why should an MVCC
snapshot see more than one such tuple in the ordinary course of
scanning a relation, since there is by definition a unique constraint
that prevents this? Or maybe I'm wrong; maybe you don't even require
that. That isn't clear.
In case (4b), like in case (2), you've done something like UPSERT tab
SET ... WHERE non_unique_index = 'value_appearing_more_than_once'.
You shouldn't do that, because you have only one replacement tuple
(embodied in the SET clause).
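To make the distinction concrete, here is a sketch in the proposed
(hypothetical, never-implemented) syntax; `tab`, `unique_key`, and
`non_unique_col` are invented names:

```sql
-- Fine: the qual names a uniquely-indexed column, so at most one
-- row can match, and there is exactly one replacement tuple.
UPSERT tab SET val = 'x' WHERE unique_key = 42;

-- Problematic, per points (2) and (4b): a non-unique qual can match
-- several rows, but the SET clause embodies only one replacement.
UPSERT tab SET val = 'x' WHERE non_unique_col = 'dup';  -- error
```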
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote
with semantics like this:
1. Search the table, using any type of scan you like, for a row
matching the given predicate.
Perhaps I've misunderstood, but this is fundamentally different to
what I'd always thought would need to happen. I always understood that
the UPSERT should be insert-driven, and so not really have an initial
scan at all in the same sense that every Insert lacks one. Moreover,
I've always thought that everyone agreed on that. We go to insert, and
then in the course of inserting index tuples do something with dirty
snapshots. That's how we get information on conflicts.
SQL standard notwithstanding, there really is no way to pick one
implementation and not have it be sub-optimal in some situations. Unless
there is a high barrier why not introduce syntax:
SCAN FIRST; INSERT FIRST
that allows the user to specify the behavior that they expect would be most
efficient given their existing/new data ratio?
Having said all that, I believe the INSERT ON CONFLICT syntax is more
easily comprehensible than previous proposals. But I still tend to
agree with Andres that an explicit UPSERT syntax or something like it,
that captures all of the MVCC games inside itself, is likely
preferable from a user standpoint, whatever the implementation ends up
looking like.
Okay then. If you both feel that way, I will come up with something
closer to what you sketch. But for now I still strongly feel it ought
to be driven by an insert. Perhaps I've misunderstood you entirely,
though.
Getting a little syntax crazy here but having all of:
UPSERT [SCAN|INSERT] FIRST
INSERT ON CONFLICT UPDATE - same as INSERT FIRST
UPDATE ON MISSING INSERT - same as SCAN FIRST
with the corresponding 2 implementations would make the user interface
slightly more complicated but able to be conformed to the actual data that
the user has.
You could basically perform a two-phase pass where you run the
user-requested algorithm and then for all failures attempt the alternate
algorithm and then error if both fail.
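A hypothetical sketch of how those spellings might look side by side
(none of this syntax exists; `tab` is an invented table):

```sql
UPSERT INTO tab (key, val) VALUES (1, 'a') SCAN FIRST;   -- try UPDATE, INSERT on miss
UPSERT INTO tab (key, val) VALUES (1, 'a') INSERT FIRST; -- try INSERT, UPDATE on conflict
INSERT INTO tab (key, val) VALUES (1, 'a') ON CONFLICT UPDATE;  -- same as INSERT FIRST
UPDATE tab SET val = 'a' WHERE key = 1 ON MISSING INSERT;       -- same as SCAN FIRST
```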
I am not at all fluent on the concurrency issues here, and the MVCC
violations and re-tries that might be considered, but at a high-level there
is disagreement here simply because both answers are "correct" and ideally
both can be provided to the user.
David J.
On Wed, Jul 23, 2014 at 3:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Could you clarify that? Does this mean that you feel that we
should write to the heap before reading the index to see if the row
will be a duplicate? If so, I think that is a bad idea, since this
will sometimes be used to apply a new data set which hasn't changed
much from the old, and that approach will perform poorly for this
use case, causing a lot of bloat. It certainly would work well for
the case that most of the rows are expected to be INSERTs rather
than DELETEs, but I'm not sure that's justification for causing
extreme bloat in the other cases.
No, I think we should stagger ordinary index insertion: we lock the
indexes first (using the existing heavyweight page-level index locks),
then, if successful, insert a heap tuple before finally inserting
index tuples. My design doesn't cause bloat under any circumstances.
Heikki's design, which he sketched with an actual POC implementation,
involved possible bloat in the event of a
conflict. He also had to go and delete the promise tuple (from within
ExecInsert()) in the event of the conflict before row locking in order
to prevent unprincipled deadlocking. Andres wanted to do something
else similarly involving "promise tuples", where the xid on the
inserter was stored in indexes with a special flag. That could also
cause bloat. I think that could be particularly bad when conflicts
necessitate visiting indexes one by one to kill promise tuples, as
opposed to just killing one heap tuple as in the case of Heikki's
design.
Anyway, both of those designs, and my own design are insert-driven.
The main difference between the design that Heikki sketched and my own
is that mine does not cause bloat, but is more invasive to the nbtree
code (but less invasive to a lot of other places to make the
deadlocking-ultimately-conflicting tuple killing work). But I believe
that Heikki's design is identical to my own in terms of user-visible
semantics. That said, his design was just a sketch and it wouldn't
be fair to hold him to it.
Also, just a reminder that I'm going to squawk loudly if the
implementation does not do something fairly predictable and sane
for the case that the table has more than one UNIQUE index and you
attempt to UPSERT a row that is a duplicate of one row on one of
the indexes and a different row on a different index.
Duly noted. :-)
I think that it's going to have to support that one way or the other.
It might be the case that I'll want to make the choice of unique index
optionally "implicit", but it's clear that we want to be able to
specify a specific unique index in one form or another. Actually, I've
already added that. It's just optional right now. I haven't found a
better way than by just specifying the name of the unique index in
DML, which is ugly, which is the main reason I want to make it
optional. Perhaps we can overcome this.
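For illustration only, the "ugly" explicit form might look something
like this (hypothetical syntax; `tab_key_idx` is an invented index
name):

```sql
-- Explicitly naming the arbitrating unique index in DML:
INSERT INTO tab (key, val) VALUES (1, 'a')
ON CONFLICT WITHIN tab_key_idx
UPDATE SET val = 'a';
```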
--
Peter Geoghegan
Robert Haas wrote:
On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
2. If you find more than one tuple that is visible to your scan, error.
This point seems to concern making the UPSERT UPDATE only affect zero
or one rows. Is that right? If so, why do you think that's a useful
constraint to impose?
Because nobody wants an operation to either insert 1 tuple or update
n>=1 tuples. The intention is that the predicate should probably be
something like WHERE unique_key = 'some_value', but you can use
something else. So it's kinda like saying which index you care about
for a particular operation, except without having to explicitly name
an index. But in any event you should use a predicate that uniquely
identifies the tuple you want to update.
This seemed a nice idea when I first read it earlier today, but now I'm
not so sure. Are you saying that it wouldn't be allowed to use an
UPSERT with some sort of join, such that each joined row would produce
either one insert or one update? To clarify: suppose I import some
external data into a temp table, then run UPSERT "USING" that table so
that the rows end up in a permanent table; some of the rows might be
already in the permanent table, some others might not. I would hope
that by the time UPSERT is done, all the rows are in the permanent
table. Would that raise an error, with your proposed design?
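A sketch of the scenario (the `UPSERT ... USING` spelling here is
purely hypothetical, as are the table and column names):

```sql
-- Stage externally-sourced rows, then merge them into the permanent table.
CREATE TEMP TABLE staging (LIKE permanent);
COPY staging FROM '/tmp/import.csv' WITH (FORMAT csv);

-- Hypothetical: each staged row yields either one INSERT or one UPDATE.
UPSERT INTO permanent AS p
USING staging AS s
SET val = s.val
WHERE p.key = s.key;
```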
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 23, 2014 at 3:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
To the last question, yes. To the first point, I'm not bent on this
particular syntax, but I am +1 for the idea that the command is
something specifically upsert-like rather than something more generic
like an ON CONFLICT SELECT that lets you do arbitrary things with the
returned rows.
FWIW I wasn't proposing that you'd be able to do arbitrary things.
There'd be a few restrictions, such as forbidding unexpected DML
commands, and requiring that only a special update reference the
special rejected-projecting CTE. They just wouldn't be arbitrary
restrictions. But that's not that interesting to me anymore; you and
Andres want a dedicated DML command with the update part built in,
that isn't as flexible. Okay, fine. I don't think it makes the
implementation any easier, but that's what I'll do.
It's certainly arguable whether you should INSERT and then turn failures
into an update or try to UPDATE and then turn failures into an INSERT;
we might even want to have both options available, though that smells
a little like airing too much of our dirty laundry. But I think I
generally favor optimizing for the UPDATE case for more or less the
same reasons Kevin articulated.
I don't see the connection between this and Kevin's remarks. And FWIW,
I don't see a reason to favor inserts or updates. Fortunately, what I
have balances both cases very well, and doesn't cause bloat. The work
of descending the index to lock it isn't wasted if an update is
required. My implementation decides to either insert or update at
literally the latest possible moment.
For one thing we cannot use any kind of scan unless there is a new
mechanism for seeing not-yet-visible tuples, like some kind of
non-MVCC snapshot that those scan nodes can selectively use. Now, I
guess that you intend that in this scenario you go straight to 5, and
then your insert finds the not-yet-visible conflict. But it's not
clear that when 5 fails, and we pick up from 1 that we then get a new
MVCC Snapshot or something else that gives us a hope of succeeding
this time around. And it seems quite wasteful to not use knowledge of
conflicts from the insert at all - that's one thing I'm trying to
address here; removing unnecessary index scans when we actually know
what our conflict TIDs are.
Here you seem to be suggesting that I intended to propose your
existing design rather than something else, which I didn't. In this
design, you find the conflict (at most one) by scanning for the tuple
to be
updated.
Yes, but what if you don't see a conflict because it isn't visible to
your snapshot, and then you insert, and only then (step 5), presumably
with a dirty snapshot, you find a conflict? How does the loop
terminate if that brings you back to step 1 with the same MVCC
snapshot feeding the update?
2. If you find more than one tuple that is visible to your scan, error.
This point seems to concern making the UPSERT UPDATE only affect zero
or one rows. Is that right? If so, why do you think that's a useful
constraint to impose?
Because nobody wants an operation to either insert 1 tuple or update
n>=1 tuples. The intention is that the predicate should probably be
something like WHERE unique_key = 'some_value', but you can use
something else. So it's kinda like saying which index you care about
for a particular operation, except without having to explicitly name
an index. But in any event you should use a predicate that uniquely
identifies the tuple you want to update.
I agree that you want to uniquely identify each tuple. What I meant
was, why should we not be able to upsert multiple rows in a single
command? What's wrong with that?
Point 4b ("if there is more than one such tuple...") seems like it
needs some practical or theoretical justification - do you just not
want to follow an update chain? If so, why? What would the error
message actually be? And, to repeat myself: Why should an MVCC
snapshot see more than one such tuple in the ordinary course of
scanning a relation, since there is by definition a unique constraint
that prevents this? Or maybe I'm wrong; maybe you don't even require
that. That isn't clear.
In case (4b), like in case (2), you've done something like UPSERT tab
SET ... WHERE non_unique_index = 'value_appearing_more_than_once'.
You shouldn't do that, because you have only one replacement tuple
(embodied in the SET clause).
Oh, I totally agree that we should throw an error if any one row is
affected more than once by the same command. Indeed, SQL MERGE
requires this, since to do otherwise would leave the final value of
the row unspecified. It seemed to me from your original explanation of
point 4b that you were saying something much stronger. But if that's
all you meant, then I agree.
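The cardinality violation in question can be sketched like this
(hypothetical ON CONFLICT UPDATE spelling; `tab` is an invented
table):

```sql
-- Both source rows arbitrate on key = 1. Whichever is processed second
-- would update the row the first just inserted, leaving the final value
-- of that row dependent on processing order - so, as with SQL MERGE,
-- the command should raise an error instead.
INSERT INTO tab (key, val) VALUES (1, 'a'), (1, 'b')
ON CONFLICT UPDATE SET val = 'z';  -- should error: key 1 affected twice
```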
--
Peter Geoghegan
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I'd suggest something like:
UPSERT table SET col = value [, ...] WHERE predicate;
I think I see another problem with this. The UPDATE must provide a
WHERE clause Var on a unique indexed column (let's say it's
constrained to provide a "(Var [unique-indexed opclass support
function 3 op] Const)" qual during parse analysis because you asked
for UPSERT). But it can also affect 0 rows for other reasons, since
this UPDATE qual can have arbitrary other expressions. So in general
you have any number of reasons for not updating, which implies that
you must insert, which might not be possible because there might even
still be duplicate key violation in a not-yet-visible-to-you row (even
just in the unique index you intended to merge on). Whereas, when
inserting, there is exactly one reason (per row) to go and update - a
conflict (i.e. a would-be duplicate violation). And this is a conflict
that is meaningful to every transaction, regardless of their snapshot,
since it represents an objective fact about the physical presence of
an index tuple.
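The asymmetry can be sketched as follows (hypothetical syntax;
`updated_at` is an invented column):

```sql
-- The first qual arbitrates: it picks the unique index to merge on.
-- The second is an arbitrary extra condition, so the UPDATE can
-- affect 0 rows either because no row with key 42 exists (insert!)
-- or because the row exists but fails the extra test (then what?).
UPSERT tab SET val = 'new'
WHERE unique_key = 42
AND updated_at < now() - interval '1 day';
```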
So, do you make the UPDATE differentiate between different reasons for
the UPDATE failing, only inserting when an UPSERT would have happened
had you omitted the extra stuff in your UPSERT predicate? Can you
really differentiate anyway? And isn't the choice to do the insert
based on what your MVCC snapshot happens to see - rather than
something that there is necessarily objective truth on, the physical
presence of a duplicate tuple in an index - rather arbitrary? It's a
different situation to my implementation, where you do an insert-like
thing, and then find a conflict row to lock and then update, because
at that point the row is successfully locked, and the WHERE clause may
be evaluated against rejects and the *most recent version* of the
locked, conflict-causing row. The most recent, locked version is not
arbitrary at all. That's the version you ought to evaluate a predicate
against before finally deciding to UPDATE.
I assume you agree with my view that UPSERT should always insert or
update. This kind of business sounds closer to SQL MERGE, where an
insert may not drive things, and you accept these kinds of anomalies
in exchange for a lot more flexibility in not having inserts drive
things. Did you happen to read my post on that?
--
Peter Geoghegan