INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
this development justifies a new thread, though.)
This features a new way of organizing the code. Heikki and I are now
in agreement that the best way of incrementally committing the work is
to do ON CONFLICT IGNORE first (perhaps someone else can review ON
CONFLICT UPDATE). This patch series is organized so that the first
commit adds tests, documentation and code that only relates to the
IGNORE variant. RLS support is still split out as a separate commit,
which is not intended to actually be committed separately (from the
main ON CONFLICT UPDATE commit). As before, I've just done that to
highlight that part for the benefit of RLS subject matter experts. The
RTE permissions split patch is also broken out, but I believe that
there is consensus that that could sensibly be committed on its own.
There are some minor changes to the code/documentation itself:
* Ran the code through pgindent.
* Documented ON CONFLICT UPDATE as a MERGE alternative under
"unsupported SQL features" (sql_features.txt).
* Minor tweaks of comments here and there.
* Logical decoding fixes.
* Call ExecCheckPlanOutput() for an ON CONFLICT UPDATE auxiliary query, too.
I should point out that I'm aware of the following open issues around
the patch series (most of these apply to the base IGNORE commit, so
Heikki should of course look out for them):
* Andres and I are still hashing out details of whether or not certain
assumptions made around handling super deletion within logical
decoding are safe [1]. Consider the decoding stuff suspect for now.
* There is still an unresolved semantics issue with unique index
inference and non-default opclasses. I think it's sufficient that the
available/defined unique indexes dictate our idea of a unique
violation (that necessitates taking the alternative path). Even in a
world where there exists a non-default opclass with an "equals"
operator that does not match that of the default opclass (that is not
really the world we live in, because the only counter-example known is
made that way specifically to *discourage* its use by users), this
seems okay to me. It seems okay to me because surely the relevant
definition of equality is the one actually chosen for the available
unique index. If there exists an ambiguity for some strange reason
(i.e. there are two unique indexes, on the same attribute(s), but with
different "equals" operators), then its a costing issue, so the
behavior given is essentially non-predictable (it could end up being
either...but hey, those are the semantics). I have a very hard time
imagining how that could ever be the case, even when we have (say)
case insensitive opclasses for the text type. A case insensitive
opclass is stricter than a case sensitive opclass. Why would a user
ever want both on the same attribute(s) of the same table? Is the user
really more or less expecting to never get a unique violation on the
non-arbitrating unique index, despite all this?
If reviewers are absolutely insistent that this theoretical ambiguity
is a problem, we can add an optional CREATE INDEX style opclass
specification (I'm already using the IndexElems representation from
CREATE INDEX for the inference specification, actually, so that would
be easy enough). I really have a hard time believing that the ugliness
is a problem for those hypothetical users that eventually consider
"equals" operator ambiguity among opclasses among available unique
indexes to be a problem. I haven't just gone and implemented this
already because I didn't want to document that an opclass
specification will be accepted. Since there is literally no reason why
anyone would care today, I see no reason to add what IMV would really
just be cruft.
* I'm flying blind with the SEPostgres changes. Unfortunately, it's
not really possible to test SEPostgres on my machine. (Does not apply
to IGNORE variant.)
IGNORE variant.)
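To make the opclass ambiguity point above concrete, this is roughly the schema that would have to exist for it to matter at all (a sketch only; "text_ci_ops" is a made-up case-insensitive opclass, since nothing like it actually ships):

```sql
CREATE TABLE tab (t text);

-- Two unique indexes on the same attribute with different ideas of
-- "equals" (text_ci_ops is a hypothetical case-insensitive opclass):
CREATE UNIQUE INDEX tab_cs ON tab (t);              -- default opclass
CREATE UNIQUE INDEX tab_ci ON tab (t text_ci_ops);  -- hypothetical

-- The inference specification names only the attribute, so either
-- index could be chosen as arbiter, purely on cost grounds:
INSERT INTO tab VALUES ('foo') ON CONFLICT (t) IGNORE;
```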
As always, the jjanes_upsert tool [2] has proven invaluable with
testing this patch (Thanks Jeff!). Reviewers should look to that tool
for ideas on how the patch might be broken. It caught a number of race
conditions with exclusion constraints in the past, for example. We're
in good shape with those now (and have been since V2.3 - see "livelock
insurance" comments within the IGNORE commit).
Thoughts?
[1]: /messages/by-id/20150303111342.GF2579@alap3.anarazel.de
[2]: https://github.com/petergeoghegan/jjanes_upsert
--
Peter Geoghegan
Attachments:
0004-RLS-support-for-ON-CONFLICT-UPDATE.patch (+357, -33)
0003-Support-INSERT-.-ON-CONFLICT-UPDATE.patch (+2399, -177)
0002-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch (+177, -116)
0001-Support-INSERT-.-ON-CONFLICT-IGNORE.patch (+2311, -137)
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
this development justifies a new thread, though.)
Bruce Momjian kindly made available a server for stress-testing [1].
I'm using jjanes_upsert for this task (I stopped doing this for a
little while, and was in need of a new server).
At the very highest client count I'm testing (128), I see unprincipled
deadlocks for the exclusion constraint case very infrequently:
"""""
2015-03-05 14:09:36 EST [ 64987901 ]: ERROR: deadlock detected
2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL: Process 7044 waits for
ShareLock on promise tuple with token 1 of transaction 64987589;
blocked by process 7200.
Process 7200 waits for ShareLock on transaction 64987901;
blocked by process 7044.
Process 7044: insert into upsert_race_test (index, count)
values ('541','-1') on conflict
update set count=TARGET.count + EXCLUDED.count
where TARGET.index = EXCLUDED.index
returning count
Process 7200: insert into upsert_race_test (index, count)
values ('541','-1') on conflict
update set count=TARGET.count + EXCLUDED.count
where TARGET.index = EXCLUDED.index
returning count
2015-03-05 14:09:36 EST [ 64987901 ]: HINT: See server log for query details.
2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT: insert into
upsert_race_test (index, count) values ('541','-1') on conflict
update set count=TARGET.count + EXCLUDED.count
where TARGET.index = EXCLUDED.index
returning count
"""""
(Reminder: Exclusion constraints doing UPSERTs, and not just IGNOREs,
are artificial; this has been enabled within the parser solely for the
benefit of this stress-testing. Also, the B-Tree AM does not have nor
require "livelock insurance").
This only happens after just over 30 minutes, while consistently doing
128 client exclusion constraint runs. This is pretty close to the most
stressful thing that you could throw at the implementation, so that's
really not too bad.
I believe that this regression occurred as a direct result of adding
"livelock insurance". Basically, we cannot be 100% sure that the
interleaving of WAL-logged things within and across transactions won't
be such that the "only", "oldest" session that gets to wait in the
second stage (the second possible call to
check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples())
will really be the "oldest" XID. Another *older* xact could just get
in ahead of us, waiting on our promise tuple as we wait on its xact
end (maybe it updates some third tuple that we didn't see, that has
already committed...not 100% sure). Obviously XID assignment order
does not guarantee that things like heap insertion and index tuple
insertion occur in serial XID order, especially with confounding
factors like super deletion. And so, every once in a long while we
deadlock.
Now, the very fact that this happens at all actually demonstrates the
need for "livelock insurance", IMV. The fact that we reliably
terminate is *reassuring*, because livelocks are in general a
terrifying possibility. We cannot *truly* solve the unprincipled
deadlock problem without adding livelocks, I think. But what we have
here is very close to eliminating unprincipled deadlocks, while not
also adding any livelocks, AFAICT. I'd argue that that's good enough.
Of course, when I remove "livelock insurance", the problem ostensibly
goes away (I've waited on such a stress test session for a couple of
hours now, so this conclusion seems very likely to be correct). I
think that we should do nothing about this, other than document it as
possible in our comments on "livelock insurance". Again, it's very
unlikely to be a problem in the real world, especially since B-Trees
are unaffected.
Also, I should point out that one of those waiters doesn't look like
an insertion-related wait at all: "7200 waits for ShareLock on
transaction 64987901; blocked by process 7044". Looks like row locking
is an essential part of this deadlock, and ordinarily that isn't even
possible for exclusion constraints (unless the patch is hacked to make
the parser accept exclusion constraints for an UPSERT, as it has been
here). Not quite sure what exact XactLockTableWait() call did this,
but don't think it was any of them within
check_exclusion_or_unique_constraint(), based on the lack of details
(TID and so on are not shown).
[1]: http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5 March 2015 at 23:44, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
this development justifies a new thread, though.)
Hi,
I had a play with this, mainly looking at the interaction with RLS.
(Note there is some bitrot in gram.y that prevents the first patch
from applying cleanly to HEAD)
I tested using the attached script, and one test didn't behave as I
expected. I believe the following should have been a valid upsert
(following the update path) but actually it failed:
INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2
= 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong
because the "b > 0" check (policy p3) should only be applied to the
post-update tuple.
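For anyone not running the attached script, the relevant setup is roughly of this shape (a sketch reconstructed from the behavior described above; the exact policy definitions are in test.sql):

```sql
-- Sketch of the policies involved (names/quals beyond p3's "b > 0"
-- and the "a % 2 = 0" check are assumptions):
CREATE POLICY p1 ON t1 FOR INSERT WITH CHECK (a % 2 = 0);
CREATE POLICY p3 ON t1 FOR UPDATE WITH CHECK (b > 0);

-- Existing row is (4, 0). The post-update tuple (4, 1) satisfies both
-- quals, so I'd expect the update path to succeed:
INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
```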
Possibly I'm missing something though.
Regards,
Dean
Attachments:
test.sql
On 03/05/2015 03:18 AM, Peter Geoghegan wrote:
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
this development justifies a new thread, though.)
I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the
heap tuple seems more straightforward. And we could put the speculative
insertion token in t_ctid, instead of stashing it in the PGPROC array.
That would again seem more straightforward.
A couple of quick random comments:
/*
* plan_speculative_use_index
* Use the planner to decide speculative insertion arbiter index
*
* Among indexes on target of INSERT ... ON CONFLICT, decide which index to
* use to arbitrate taking alternative path. This should be called
* infrequently in practice, because it's unusual for more than one index to
* be available that can satisfy a user-specified unique index inference
* specification.
*
* Note: caller had better already hold some type of lock on the table.
*/
Oid
plan_speculative_use_index(PlannerInfo *root, List *indexList)
{
...
/* Locate cheapest IndexOptInfo for the target index */
If I'm reading this correctly, if there are multiple indexes that match
the unique index inference specification, we pick the cheapest one.
Isn't that unstable? Two backends running the same INSERT ON CONFLICT
statement might pick different indexes, and the decision might change
over time as the table is analyzed. I think we should have a more robust
rule. Could we easily just use all matching indexes?
... Deferred unique constraints are not supported as
+ arbiters of whether an alternative <literal>ON CONFLICT</> path
+ should be taken.
We really need to find a shorter term for "arbiter of whether an
alternative path should be taken". Different variations of that term are
used a lot, and it's tedious to read.
* There is still an unresolved semantics issue with unique index
inference and non-default opclasses. I think it's sufficient that the
available/defined unique indexes dictate our idea of a unique
violation (that necessitates taking the alternative path). Even in a
world where there exists a non-default opclass with an "equals"
operator that does not match that of the default opclass (that is not
really the world we live in, because the only counter-example known is
made that way specifically to *discourage* its use by users), this
seems okay to me. It seems okay to me because surely the relevant
definition of equality is the one actually chosen for the available
unique index. If there exists an ambiguity for some strange reason
(i.e. there are two unique indexes, on the same attribute(s), but with
different "equals" operators), then its a costing issue, so the
behavior given is essentially non-predictable (it could end up being
either...but hey, those are the semantics). I have a very hard time
imagining how that could ever be the case, even when we have (say)
case insensitive opclasses for the text type. A case insensitive
opclass is stricter than a case sensitive opclass. Why would a user
ever want both on the same attribute(s) of the same table? Is the user
really more or less expecting to never get a unique violation on the
non-arbitrating unique index, despite all this?

If reviewers are absolutely insistent that this theoretical ambiguity
is a problem, we can add an optional CREATE INDEX style opclass
specification (I'm already using the IndexElems representation from
CREATE INDEX for the inference specification, actually, so that would
be easy enough). I really have a hard time believing that the ugliness
is a problem for those hypothetical users that eventually consider
"equals" operator ambiguity among opclasses among available unique
indexes to be a problem. I haven't just gone and implemented this
already because I didn't want to document that an opclass
specification will be accepted. Since there is literally no reason why
anyone would care today, I see no reason to add what IMV would really
just be cruft.
I've been thinking that it would be nice to be able to specify a
constraint name. Naming an index directly feels wrong, as in relational
and SQL philosophy, indexes are just an implementation detail, but
naming a constraint is a fair game. It would also be nice to be able to
specify "use the primary key".
Attached patch contains a few more things I saw at a quick read.
- Heikki
Attachments:
misc-upsert-issues.patch (+20, -7)
On Thu, Mar 5, 2015 at 3:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
Bruce Momjian kindly made available a server for stress-testing [1].
I'm using jjanes_upsert for this task (I stopped doing this for a
little while, and was in need of a new server).
BTW, this was run for about another week on Bruce's server, and no
further issues arose affecting either the B-Tree or exclusion
constraint implementations. I've stopped with it for now, because it
feels unlikely that I'm going to find any more bugs this way.
--
Peter Geoghegan
On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the heap
tuple seems more straightforward. And we could put the speculative insertion
token in t_ctid, instead of stashing it in the PGPROC array. That would
again seem more straightforward.
I see the appeal of that approach. What concerns me about that
approach is that it makes it the problem of the inserter to confirm
its insertion, even though overwhelmingly, speculative insertions
succeed (the race window is small due to the pre-check). The current
speculative token is legitimately a very short lived thing, a thing
that I hesitate to write to disk at all. So our optimistic approach to
inserting speculatively loses its optimism, which I don't like, if you
know what I mean.
OTOH, apart from avoiding stashing things in PGPROC, I do see another
advantage to what you outline. Doing things this way (and making the
insertion and confirmation of a speculative insertion a two-phased
thing) unambiguously fixes all problems with logical decoding, without
adding unexpected cross-change-coordination tricks during transaction
reassembly, and really without much further thought. All we do is get
a new case to decode, that ultimately produces a
REORDER_BUFFER_CHANGE_INSERT change, which Andres more or less wanted
to do anyway.
Under this scheme with using t_ctid, heap_insert() would do minimal
WAL logging, necessary for the same reasons that some WAL logging is
required within heap_lock_tuple() (so the new record type is very
similar to the existing, simple xl_heap_lock record type). We'd use
one of the two remaining bits within t_infomask2 for this, so that
waiters will know to interpret t_ctid as a speculative insertion token
(and then wait on it). Then, after speculative insertion succeeded,
we'd WAL log something closer to an UPDATE to confirm that insertion
was in fact successful, which is where the contents of the new tuple
is actually finally WAL-logged (like UPDATEs used to be, but without a
new tuple being WAL-logged at all, since there of course is none).
Is that more or less what you have in mind?
A couple of quick random comments:
/*
* plan_speculative_use_index
* Use the planner to decide speculative insertion arbiter
index
*
* Among indexes on target of INSERT ... ON CONFLICT, decide which index
to
* use to arbitrate taking alternative path. This should be called
* infrequently in practice, because it's unusual for more than one index
to
* be available that can satisfy a user-specified unique index inference
* specification.
*
* Note: caller had better already hold some type of lock on the table.
*/
Oid
plan_speculative_use_index(PlannerInfo *root, List *indexList)
{
...
/* Locate cheapest IndexOptInfo for the target index */

If I'm reading this correctly, if there are multiple indexes that match the
unique index inference specification, we pick the cheapest one. Isn't that
unstable? Two backends running the same INSERT ON CONFLICT statement might
pick different indexes, and the decision might change over time as the table
is analyzed. I think we should have a more robust rule. Could we easily just
use all matching indexes?
Robert feels pretty strongly that there should be a costing aspect to
this. Initially I was skeptical of this, but just did what he said all
the same. But I've since come around to his view entirely because we
now support partial unique indexes.
You can add a predicate to a unique index inference specification, and
you're good, even if there is no partial index on those
attributes/expressions exactly matching the DML/inference predicate -
we use predicate_implied_by() for this, which works with an equivalent
non-partial unique index. This is because a non-partial unique index
covers those attributes sufficiently. There may be value in preferring
the cheap partial index, and then verifying that it actually did cover
the final inserted tuple version (which is how things work now). If we
cannot discriminate against one particular unique index, making sure
it covers the inserted heap tuple (by throwing a user-visible
ERRCODE_TRIGGERED_ACTION_EXCEPTION error if it doesn't, as I currently
do within ExecInsertIndexTuples()) is on shaky ground as a
user-visible behavior. I like that behavior, though - seems less
surprising than letting a failure to actually cover a selective
partial unique index predicate (that of the one and only arbiter
index) slide. You can only get this ERRCODE_TRIGGERED_ACTION_EXCEPTION
error when you actually had a predicate in your unique index inference
specification in the first place, so seems likely that you want the
error. But, I also like supporting unique indexes that are non-partial
even in the presence of a predicate in the unique index inference
specification clause, because, as I said, that's still correct - I can
see not doing this breaking queries when a partial unique index is
dropped due to a more general uniqueness requirement.
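To illustrate with a sketch (made-up table and index names, and assuming I have the patch's inference predicate syntax right):

```sql
CREATE TABLE orders (id int, active bool);
CREATE UNIQUE INDEX orders_active_id ON orders (id) WHERE active;

-- Inference specification with a predicate. The partial index above
-- satisfies it directly; a plain UNIQUE INDEX on orders (id) would
-- also satisfy it, via predicate_implied_by():
INSERT INTO orders VALUES (1, true)
ON CONFLICT (id) WHERE active IGNORE;
```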
Besides, having a list of arbiter unique indexes that must all be
equivalent for the user's purpose anyway is very awkward indeed. It
seems very confusing. If they all have equivalent definitions anyway,
as they must, then it must not matter one bit which one you take
(except perhaps on cost grounds). It might be unstable, but it
couldn't possibly matter, so why bother with anything else? If it
*does* matter on semantics grounds, then we have bigger problems.
There is a tension between indexes as an implementation detail, and
indexes as the thing that define the semantics for the purposes of
this feature here (in a way that isn't baked into the DML statement,
but is only known from a cataloged definition of an index or its
physical structure). I have not fully resolved that tension here, but
I think I have a good practical balance between the two extremes. Does
that make sense?
... Deferred unique constraints are not supported as
+ arbiters of whether an alternative <literal>ON CONFLICT</> path
+ should be taken.

We really need to find a shorter term for "arbiter of whether an alternative
path should be taken". Different variations of that term are used a lot, and
it's tedious to read.
Okay. I'll think about that.
I've been thinking that it would be nice to be able to specify a constraint
name. Naming an index directly feels wrong, as in relational and SQL
philosophy, indexes are just an implementation detail, but naming a
constraint is a fair game. It would also be nice to be able to specify "use
the primary key".
Now that we have partial unique indexes covered, which never have a
constraint (I think - see my remarks above), this is probably okay.
But that doesn't really solve the highly theoretical (non-)issue with
"equals" operator ambiguity, which I called out as an open issue
recently [1]. That's still a thing we need to decide on, even if that
means that you agree with me that it doesn't matter (and since this
naming of constraints in DML is an escape hatch from that problem too,
that seems more likely than before). What do you think?
Attached patch contains a few more things I saw at a quick read.
This is mostly documentation stuff. I can certainly address this
feedback quickly and easily, and will do so soon.
[1]: /messages/by-id/CAM3SWZS9DTFt1=sQT=WFdp5UFkOac2Qc1i5WE-Od4BXJZ=JsCw@mail.gmail.com
--
Peter Geoghegan
On Mon, Mar 16, 2015 at 8:10 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
(Note there is some bitrot in gram.y that prevents the first patch
from applying cleanly to HEAD)
That's trivially fixable. I'll have those fixes in the next revision,
once I firm some things up with Heikki.
I tested using the attached script, and one test didn't behave as I
expected. I believe the following should have been a valid upsert
(following the update path) but actually it failed:

INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2
= 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong
because the "b > 0" check (policy p3) should only be applied to the
post-update tuple.

Possibly I'm missing something though.
I think that you may have. Did you read the commit message/docs of the
RLS commit 0004-*? You must consider the second point here, I believe:
""""
The 3 places that RLS policies are enforced are:
* Against row actually inserted, after insertion proceeds successfully
(INSERT-applicable policies only).
* Against row in target table that caused conflict. The implementation
is careful not to leak the contents of that row in diagnostic
messages (INSERT-applicable *and* UPDATE-applicable policies).
* Against the version of the row added to the relation after
ExecUpdate() is called (INSERT-applicable *and* UPDATE-applicable
policies).
""""
You're seeing a failure that applies to the target tuple of the UPDATE
(the tuple that we can't leak the contents of). I felt it was best to
check all policies against the target/existing tuple, including both
WITH CHECK OPTIONS and USING quals (which are both enforced).
I can see why you might not like that behavior, but it is the intended
behavior. I thought that this whole intersection of RLS + UPSERT is
complex enough that it would be best to be almost as conservative as
possible in what fails and what succeeds. The one exception is when
the insert path is actually taken, since the statement is an INSERT
statement.
--
Peter Geoghegan
On 17 March 2015 at 23:25, Peter Geoghegan <pg@heroku.com> wrote:
Possibly I'm missing something though.
I think that you may have. Did you read the commit message/docs of the
RLS commit 0004-*? You must consider the second point here, I believe:

""""
The 3 places that RLS policies are enforced are:

* Against row actually inserted, after insertion proceeds successfully
(INSERT-applicable policies only).

* Against row in target table that caused conflict. The implementation
is careful not to leak the contents of that row in diagnostic
messages (INSERT-applicable *and* UPDATE-applicable policies).

* Against the version of the row added to the relation after
ExecUpdate() is called (INSERT-applicable *and* UPDATE-applicable
policies).
""""
Yes, I read that, and I agree with the intention to not leak data
according to both the INSERT and UPDATE policies, however...
You're seeing a failure that applies to the target tuple of the UPDATE
(the tuple that we can't leak the contents of). I felt it was best to
check all policies against the target/existing tuple, including both
WITH CHECK OPTIONS and USING quals (which are both enforced).
I think that's an incorrect implementation of the RLS UPDATE policy.
The WITH CHECK quals of a RLS policy are intended to be applied to the
NEW data, not the existing data. This patch is applying the WITH CHECK
quals to both the existing and NEW tuples, which runs counter to the
way RLS policies are normally enforced, and I think that will just lead
to confusion.
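To illustrate the asymmetry I'm talking about, compare with a plain UPDATE under an UPDATE policy like p3 (a sketch):

```sql
-- With a policy like:
--   CREATE POLICY p3 ON t1 FOR UPDATE WITH CHECK (b > 0);
-- a plain UPDATE applies the WITH CHECK qual only to the NEW tuple:
UPDATE t1 SET b = 1 WHERE a = 4;
-- succeeds even though the existing row has b = 0, because "b > 0" is
-- only evaluated against the post-update tuple (4, 1)
```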
I can see why you might not like that behavior, but it is the intended
behavior. I thought that this whole intersection of RLS + UPSERT is
complex enough that it would be best to be almost as conservative as
possible in what fails and what succeeds. The one exception is when
the insert path is actually taken, since the statement is an INSERT
statement.
The problem with that is that the user will see errors saying that the
data violates the RLS WITH CHECK policy, when they might quite
reasonably argue that it doesn't. That's not really being
conservative. I'd argue it's a bug.
Regards,
Dean
On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I've been thinking that it would be nice to be able to specify a constraint
name. Naming an index directly feels wrong, as in relational and SQL
philosophy, indexes are just an implementation detail, but naming a
constraint is a fair game. It would also be nice to be able to specify "use
the primary key".
Intuitively, I think you should specify an operator name, not a
constraint name. That's what we do for, e.g., exclusion constraints,
and it feels right. People sometimes create and drop indexes (and
thus, perhaps, the constraints that depend on them) for maintenance
reasons where a change in semantics will be unwelcome. But I don't
accept Peter's argument that it's OK to be indifferent to which
particular equality semantics are being used.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the heap
tuple seems more straightforward. And we could put the speculative insertion
token in t_ctid, instead of stashing it in the PGPROC array. That would
again seem more straightforward.

I see the appeal of that approach. What concerns me about that
approach is that it makes it the problem of the inserter to confirm
its insertion, even though overwhelmingly, speculative insertions
succeeds (the race window is small due to the pre-check). The current
speculative token is legitimately a very short lived thing, a thing
that I hesitate to write to disk at all. So our optimistic approach to
inserting speculatively loses its optimism, which I don't like, if you
know what I mean.
Modifying a shared buffer is not the same as writing to disk.
If I'm reading this correctly, if there are multiple indexes that match the
unique index inference specification, we pick the cheapest one. Isn't that
unstable? Two backends running the same INSERT ON CONFLICT statement might
pick different indexes, and the decision might change over time as the table
is analyzed. I think we should have a more robust rule. Could we easily just
use all matching indexes?

Robert feels pretty strongly that there should be a costing aspect to
this. Initially I was skeptical of this, but just did what he said all
the same. But I've since come around to his view entirely because we
now support partial unique indexes.
I think Heikki's concern is something different, although I am not
altogether up to speed on this and may be confused. The issue is:
suppose that process A and process B are both furiously upserting into
the same table with the same key column but, because they have
different costing parameters, process A consistently chooses index X
and process B consistently chooses index Y. In that situation, will
we deadlock, livelock, error out, bloat, or get any other undesirable
behavior, or is that A-OK?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/18/2015 06:23 PM, Robert Haas wrote:
On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the heap
tuple seems more straightforward. And we could put the speculative insertion
token in t_ctid, instead of stashing it in the PGPROC array. That would
again seem more straightforward.

I see the appeal of that approach. What concerns me about that
approach is that it makes it the problem of the inserter to confirm
its insertion, even though overwhelmingly, speculative insertions
succeed (the race window is small due to the pre-check). The current
speculative token is legitimately a very short lived thing, a thing
that I hesitate to write to disk at all. So our optimistic approach to
inserting speculatively loses its optimism, which I don't like, if you
know what I mean.

Modifying a shared buffer is not the same as writing to disk.
AFAICS, there is no need to go and clear the tag after the insert has
completed.
Here's what I had in mind: the inserter tags the tuple with the
speculative insertion token, by storing the token in the t_ctid field.
If the inserter needs to super-delete the tuple, it sets xmax like in a
regular deletion, but also sets another flag to indicate that it was a
super-deletion.
When another backend inserts, and notices that it has a potential
conflict with the first tuple, it tries to acquire a hw-lock on the
token. In most cases, the inserter has long since completed the
insertion, and the acquisition succeeds immediately but you have to
check because the token is not cleared on a completed insertion. It
would be slightly faster for the second backend if the inserter actually
removed the token after the insertion has completed: it wouldn't need to
check the lock if the tuple was not tagged in the first place. But
that'd be just a tiny optimization, for the rare case that there is a
conflict, and surely not worth it.
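Sketching the protocol above in toy C may make the hand-off clearer. The heavyweight lock manager is reduced to a single flag here, and every name is illustrative rather than an actual PostgreSQL symbol:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the speculative insertion protocol. */
typedef struct
{
    uint32_t token;         /* left in place even after completion */
    bool     super_deleted; /* xmax set together with the new flag */
} SpecTuple;

typedef struct
{
    bool holds_token_lock;  /* inserter's hw-lock on its own token */
} Inserter;

static void
speculative_insert(Inserter *ins, SpecTuple *tup, uint32_t token)
{
    ins->holds_token_lock = true;   /* lock taken before the heap insert */
    tup->token = token;
    tup->super_deleted = false;
}

static void
finish_insert(Inserter *ins)
{
    /* Confirming just releases the lock; the token is NOT cleared. */
    ins->holds_token_lock = false;
}

static void
super_delete(Inserter *ins, SpecTuple *tup)
{
    tup->super_deleted = true;      /* set xmax plus the super-deletion flag */
    ins->holds_token_lock = false;
}

/* A second backend must wait only while the inserter still holds the
 * token lock; a stale token on a completed insertion is harmless. */
static bool
must_wait(const SpecTuple *tup, const Inserter *ins)
{
    return tup->token != 0 && ins->holds_token_lock;
}

/* Once any wait is over: a super-deleted tuple is no conflict at all. */
static bool
still_conflicts(const SpecTuple *tup)
{
    return !tup->super_deleted;
}
```

The point of the sketch is the asymmetry: confirming an insertion is just a lock release, so the common (non-conflicting) path never revisits the tuple.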
Hmm, I just realized a little fly in that ointment: if the inserter
inserts enough tuples to wrap around the token counter (4 billion?) in a
single transaction, another backend that inserts could confuse a new
speculative insertion with one that completed a long time ago. The risk
seems small enough that we could just accept it. I don't think anything
particularly bad would happen on a false positive here. Or we could also
use all 48 bits available in the t_ctid to push it truly in the realm of
ain't-gonna-happen. Or we could use (relfilenode, block, token) as the
lock tag for the SpeculativeInsertionLock.
Regarding the physical layout: We can use a magic OffsetNumber value
above MaxOffsetNumber to indicate that the t_ctid field stores a token
rather than a regular ctid value. And another magic t_ctid value to
indicate that a tuple has been super-deleted. The token and the
super-deletion flag are quite ephemeral, they are not needed after the
inserting transaction has completed, so it's nice to not consume the
valuable infomask bits for these things. Those states are conveniently
not possible on an updated tuple, when we would need the t_ctid field
for its current purpose.
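The physical layout Heikki describes could be sketched like so, with a simplified stand-in for the 6-byte ItemPointerData; the magic values below are made up for illustration, not the committed ones:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for ItemPointerData: block number plus offset. */
typedef struct
{
    uint32_t ip_blkid;  /* block number, reused to carry the token */
    uint16_t ip_posid;  /* offset number, or a magic marker */
} ItemPointerData;

#define MAX_OFFSET_NUMBER    ((uint16_t) 2048)   /* roughly MaxOffsetNumber */
#define SPEC_TOKEN_OFFSET    ((uint16_t) 0xfffe) /* t_ctid holds a token */
#define SUPER_DELETED_OFFSET ((uint16_t) 0xffff) /* tuple was super-deleted */

static void
stamp_speculative(ItemPointerData *ctid, uint32_t token)
{
    ctid->ip_blkid = token;
    ctid->ip_posid = SPEC_TOKEN_OFFSET;
}

static int
ctid_is_speculative(const ItemPointerData *ctid)
{
    return ctid->ip_posid == SPEC_TOKEN_OFFSET;
}

static uint32_t
speculative_token(const ItemPointerData *ctid)
{
    return ctid->ip_blkid;
}

static void
mark_super_deleted(ItemPointerData *ctid)
{
    ctid->ip_posid = SUPER_DELETED_OFFSET;
}

static int
ctid_is_super_deleted(const ItemPointerData *ctid)
{
    return ctid->ip_posid == SUPER_DELETED_OFFSET;
}
```

Because any real offset number is well below the magic values, both states are unambiguous without spending infomask bits.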
If I'm reading this correctly, if there are multiple indexes that match the
unique index inference specification, we pick the cheapest one. Isn't that
unstable? Two backends running the same INSERT ON CONFLICT statement might
pick different indexes, and the decision might change over time as the table
is analyzed. I think we should have a more robust rule. Could we easily just
use all matching indexes?

Robert feels pretty strongly that there should be a costing aspect to
this. Initially I was skeptical of this, but just did what he said all
the same. But I've since come around to his view entirely because we
now support partial unique indexes.

I think Heikki's concern is something different, although I am not
altogether up to speed on this and may be confused. The issue is:
suppose that process A and process B are both furiously upserting into
the same table with the same key column but, because they have
different costing parameters, process A consistently chooses index X
and process B consistently chooses index Y. In that situation, will
we deadlock, livelock, error out, bloat, or get any other undesirable
behavior, or is that A-OK?
Right, that's what I had in mind.
- Heikki
On Wed, Mar 18, 2015 at 11:41 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
AFAICS, there is no need to go and clear the tag after the insert has
completed.

Here's what I had in mind: the inserter tags the tuple with the speculative
insertion token, by storing the token in the t_ctid field. If the inserter
needs to super-delete the tuple, it sets xmax like in a regular deletion,
but also sets another flag to indicate that it was a super-deletion.

When another backend inserts, and notices that it has a potential conflict
with the first tuple, it tries to acquire a hw-lock on the token. In most
cases, the inserter has long since completed the insertion, and the
acquisition succeeds immediately but you have to check because the token is
not cleared on a completed insertion. It would be slightly faster for the
second backend if the inserter actually removed the token after the
insertion has completed: it wouldn't need to check the lock if the tuple was
not tagged in the first place. But that'd be just a tiny optimization, for
the rare case that there is a conflict, and surely not worth it.

Hmm, I just realized a little fly in that ointment: if the inserter inserts
enough tuples to wrap around the token counter (4 billion?) in a single
transaction, another backend that inserts could confuse a new speculative
insertion with one that completed a long time ago. The risk seems small
enough that we could just accept it. I don't think anything particularly bad
would happen on a false positive here. Or we could also use all 48 bits
available in the t_ctid to push it truly in the realm of ain't-gonna-happen.
Or we could use (relfilenode, block, token) as the lock tag for the
SpeculativeInsertionLock.
Maybe we'd use all 48-bits anyway, but it's not a major concern either
way. Speculative token locks (by design) are only held for an instant.
I think a spurious wait would be entirely benign, and very unlikely.
Regarding the physical layout: We can use a magic OffsetNumber value above
MaxOffsetNumber to indicate that the t_ctid field stores a token rather than
a regular ctid value. And another magic t_ctid value to indicate that a
tuple has been super-deleted. The token and the super-deletion flag are
quite ephemeral, they are not needed after the inserting transaction has
completed, so it's nice to not consume the valuable infomask bits for these
things. Those states are conveniently not possible on an updated tuple, when
we would need the t_ctid field for its current purpose.
You seem pretty convinced that this is the way to go. I'll try and
produce a revision that works this way shortly. I don't think that it
will be all that hard to come up with something to prove the idea out.
I think Heikki's concern is something different, although I am not
altogether up to speed on this and may be confused. The issue is:
suppose that process A and process B are both furiously upserting into
the same table with the same key column but, because they have
different costing parameters, process A consistently chooses index X
and process B consistently chooses index Y. In that situation, will
we deadlock, livelock, error out, bloat, or get any other undesirable
behavior, or is that A-OK?

Right, that's what I had in mind.
Oh, I see. I totally failed to understand that that was the concern.
I think it'll be fine. The pre-check is going to look for a heap tuple
using one or the other of (say) a pair of equivalent indexes. We might
miss the heap tuple because we picked an index that had yet to have a
physical index tuple inserted, and then hit a conflict on optimistic
insertion (the second phase). But there is no reason to think that
that won't happen anyway. The ordering of operations isn't critical.
The one issue you might still have is a duplicate violation, because
you happened to infer the unique index that does not get to tolerate
unique violations as conflicts that can be recovered from (and there
was a race, which is unlikely). I don't really care about this,
though. You really are inferring one particular unique index, and for
reasons like this I think it would be a mistake to try to pretend that
the unique index is 100% an implementation detail. That's why I called
the new clause a unique index inference specification.
This hypothetical set of unique indexes will always have n-1 redundant
unique indexes - they must closely match. That's something that calls
into question why the user wants things this way to begin with. Also,
note that one unique index will consistently "win", since the
insertion order is stable (the relcache puts them in OID order). So it
will not be all over the map.
--
Peter Geoghegan
On Wed, Mar 18, 2015 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I've been thinking that it would be nice to be able to specify a constraint
name. Naming an index directly feels wrong, as in relational and SQL
philosophy, indexes are just an implementation detail, but naming a
constraint is fair game. It would also be nice to be able to specify "use
the primary key".

Intuitively, I think you should specify an operator name, not a
constraint name. That's what we do for, e.g., exclusion constraints,
and it feels right. People sometimes create and drop indexes (and
thus, perhaps, the constraints that depend on them) for maintenance
reasons where a change in semantics will be unwelcome.
I think we should use a constraint name. That is the plain reality of
what we're doing, and is less ambiguous than an operator. 99% of the
time you'll be happy with an unspecified, across-the-board IGNORE, or
won't be using exclusion constraints anyway (so we can infer a unique
index).
A constraint name covers all reasonable cases, since partial unique
indexes are otherwise covered (partial unique indexes do not have a
pg_constraint entry). Oracle has a hint for ignoring particular, named
unique indexes (not constraints). I realize that Oracle hints are not
supposed to affect semantics, but this is actually true (Google it).
This is a bit ugly, but less ugly than the hint, since as Heikki points
out we're only naming a constraint. Naming a constraint reflects the
reality of how the feature needs to work, and has a sort of precedent
from other systems.
But I don't
accept Peter's argument that it's OK to be indifferent to which
particular equality semantics are being used.
I am not suggesting actual indifference makes sense. I am leaving it
up to the definition of available indexes. And there are no known
cases where it could matter anyway, unless you consider the ===
operator for tuples to be a counter example. And you need multiple
conflicting unique indexes on the exact same attributes/expressions on
attributes to begin with. Isn't that a highly esoteric thing to have
to worry about? Perhaps to the extent that literally no one will ever
have to care anyway? It's an oddball use-case, if ever I saw one.
Note: the issue of caring about equality semantics across B-Tree
opclasses of the same type, and the issue of naming unique indexes are
separate issues, AFAICT. No one should confuse them. The only
crossover is that the oddball use-case mentioned could use the named
constraint thing as an escape hatch.
As I've said, I think it's misguided to try to make unique indexes
100% an implementation detail. It's going to fall apart in edge cases,
like the one with multiple unique indexes that I mentioned in my last
e-mail. No one thinks of them that way, including users.
--
Peter Geoghegan
On Wed, Mar 18, 2015 at 3:40 PM, Peter Geoghegan <pg@heroku.com> wrote:
I think Heikki's concern is something different, although I am not
altogether up to speed on this and may be confused. The issue is:
suppose that process A and process B are both furiously upserting into
the same table with the same key column but, because they have
different costing parameters, process A consistently chooses index X
and process B consistently chooses index Y. In that situation, will
we deadlock, livelock, error out, bloat, or get any other undesirable
behavior, or is that A-OK?

Right, that's what I had in mind.
Oh, I see. I totally failed to understand that that was the concern.
I think it'll be fine. The pre-check is going to look for a heap tuple
using one or the other of (say) a pair of equivalent indexes. We might
miss the heap tuple because we picked an index that had yet to have a
physical index tuple inserted, and then hit a conflict on optimistic
insertion (the second phase). But there is no reason to think that
that won't happen anyway. The ordering of operations isn't critical.

The one issue you might still have is a duplicate violation, because
you happened to infer the unique index that does not get to tolerate
unique violations as conflicts that can be recovered from (and there
was a race, which is unlikely). I don't really care about this,
though. You really are inferring one particular unique index, and for
reasons like this I think it would be a mistake to try to pretend that
the unique index is 100% an implementation detail. That's why I called
the new clause a unique index inference specification.

This hypothetical set of unique indexes will always have n-1 redundant
unique indexes - they must closely match. That's something that calls
into question why the user wants things this way to begin with. Also,
note that one unique index will consistently "win", since the
insertion order is stable (the relcache puts them in OID order). So it
will not be all over the map.
I think this is pretty lousy. The reason the user wants things
that way is that they created a UNIQUE index and it got bloated
somehow with lots of dead tuples. So they made a new UNIQUE index on
the same column and then they're planning to do a DROP INDEX
CONCURRENTLY on the old one, which is maybe even now in progress. And
now they start getting duplicate key failures, the avoidance of which
was their whole reason for using UPSERT in the first place. If I were
that user, I'd report that as a bug, and if someone told me that it
was intended behavior, I'd say "oh, so you deliberately designed this
feature to not work some of the time?".
ISTM that we need to (1) decide which operator we're using to compare
and then (2) tolerate conflicts in every index that uses that
operator. In most cases there will only be one, but if there are
more, so be it.
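Robert's two-step rule could be sketched like this, with toy index descriptors; nothing here is actual planner code, and the names are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Toy index descriptor: the key column a unique index covers and the
 * equality operator (by OID, say) behind its constraint. */
typedef struct
{
    int key_column;
    int equality_op;
} UniqueIndex;

/* Step (1): the statement fixes the comparison operator. Step (2):
 * every index using that operator on the key column becomes an
 * arbiter, rather than only the cheapest candidate. */
static size_t
collect_arbiters(const UniqueIndex *indexes, size_t n,
                 int key_column, int equality_op,
                 size_t *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_cap; i++)
    {
        if (indexes[i].key_column == key_column &&
            indexes[i].equality_op == equality_op)
            out[found++] = i;
    }
    return found;
}
```

Under this rule, the old and new UNIQUE indexes in the DROP INDEX CONCURRENTLY scenario would both be collected, so neither can raise a "spurious" duplicate key failure.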
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Mar 19, 2015 at 1:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think this is pretty lousy. The reason the user wants things
that way is that they created a UNIQUE index and it got bloated
somehow with lots of dead tuples. So they made a new UNIQUE index on
the same column and then they're planning to do a DROP INDEX
CONCURRENTLY on the old one, which is maybe even now in progress. And
now they start getting duplicate key failures, the avoidance of which
was their whole reason for using UPSERT in the first place. If I were
that user, I'd report that as a bug, and if someone told me that it
was intended behavior, I'd say "oh, so you deliberately designed this
feature to not work some of the time?".

ISTM that we need to (1) decide which operator we're using to compare
and then (2) tolerate conflicts in every index that uses that
operator. In most cases there will only be one, but if there are
more, so be it.
On reflection, I see your point. I'll try and do something about this too.
--
Peter Geoghegan
On Wed, Mar 18, 2015 at 2:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's what I had in mind: the inserter tags the tuple with the speculative
insertion token, by storing the token in the t_ctid field. If the inserter
needs to super-delete the tuple, it sets xmax like in a regular deletion,
but also sets another flag to indicate that it was a super-deletion.
I was able to quickly hack up a prototype of this in my hotel room at
pgConf.US. It works fine at first blush, passing the jjanes_upsert
stress tests and my own regression tests without a problem. Obviously
it needs more testing and clean-up before posting, but I was pleased
with how easy this was.
When another backend inserts, and notices that it has a potential conflict
with the first tuple, it tries to acquire a hw-lock on the token. In most
cases, the inserter has long since completed the insertion, and the
acquisition succeeds immediately but you have to check because the token is
not cleared on a completed insertion.
You don't even have to check/take a ShareLock on the token when the
other xact committed/aborted, because you know that if it is there,
then based on that (and based on the fact that it wasn't super
deleted) the tuple is visible/committed, or (in the event of
other-xact-abort) not visible/aborted. In other words, we continue to
only check for a speculative token when the inserting xact is in
flight - we just take the token from the heap now instead. Not much
needs to change, AFAICT.
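The shortcut described here amounts to a single gate before the lock attempt. A minimal sketch, with the transaction-status lookup mocked out (the server would consult the proc array or clog, and these names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked transaction status. */
typedef enum
{
    XACT_IN_PROGRESS,
    XACT_COMMITTED,
    XACT_ABORTED
} XactStatus;

/* The waiter needs the token hw-lock only while the inserting
 * transaction is in flight; a token left behind by a finished
 * transaction is stale, and ordinary visibility rules take over
 * (committed => visible, aborted => not). */
static bool
need_token_lock(bool ctid_carries_token, XactStatus inserter_status)
{
    return ctid_carries_token && inserter_status == XACT_IN_PROGRESS;
}
```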
Regarding the physical layout: We can use a magic OffsetNumber value above
MaxOffsetNumber to indicate that the t_ctid field stores a token rather than
a regular ctid value. And another magic t_ctid value to indicate that a
tuple has been super-deleted. The token and the super-deletion flag are
quite ephemeral, they are not needed after the inserting transaction has
completed, so it's nice to not consume the valuable infomask bits for these
things. Those states are conveniently not possible on an updated tuple, when
we would need the t_ctid field for its current purpose.
Haven't done anything about this yet. I'm just using an infomask2 bit
for now. Although that was only because I forgot that you suggested
this before having a go at implementing this new t_ctid scheme!
My next revision will have a more polished version of this scheme. I'm
not going to immediately act on Robert's feedback elsewhere (although
I'd like to), owing to time constraints - no reason to deny you the
opportunity to review the entirely unrelated low-level speculative
locking mechanism due to that.
--
Peter Geoghegan
On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
My next revision will have a more polished version of this scheme. I'm
not going to immediately act on Robert's feedback elsewhere (although
I'd like to), owing to time constraints - no reason to deny you the
opportunity to review the entirely unrelated low-level speculative
locking mechanism due to that.
Attached revision, V3.1, implements this slightly different way of
figuring out if an insertion is speculative, as discussed. We reuse
t_ctid for speculatively inserted tuples. That's where the counter
goes. I think that this is a significant improvement, since there is
no longer any need to touch the proc array for any reason, without
there being any significant disadvantage that I'm aware of. I also
fixed some bitrot, and a bug with index costing (the details aren't
terribly interesting - tuple width wasn't being calculated correctly).
I have worked through Heikki's feedback on documentation and code
tweaks, too (they were mostly just documentation changes).
Notably, I have not:
* Added ON CONFLICT PRIMARY KEY (might get to this later)
* Otherwise altered the inference specification. I have not allowed it
to name a constraint, which Heikki and I favor (Robert wanted to name
an operator directly). Again, just haven't got around to it.
* Done anything new with logical decoding. My handling of conflicts
during transaction reassembly remains questionable. I hope this can be
worked out soon, possibly with input from Andres. He is sitting next
to me at pgConf.US, so maybe we can work something out in person.
* Addressed the concerns held by Heikki and Robert about multiple
equivalent unique indexes. I confirmed through testing that this can
cause unique violations that are arguably spurious. It isn't too bad,
though - with a couple of unique indexes, jjanes_upsert requires high
concurrency (i.e. 128 clients) in order to see one or two or three
such unique violations after a minute or so. But even still, let's fix
it.
As I mentioned, I am at a conference, and obviously there are other
time pressures, so I haven't put as much testing into this revision as
I'd like. I thought that under the circumstances it was preferable to
post what I have sooner rather than later, though.
--
Peter Geoghegan
Attachments:
* 0004-RLS-support-for-ON-CONFLICT-UPDATE.patch (text/x-patch; +357, -33)
* 0003-Support-INSERT-.-ON-CONFLICT-UPDATE.patch (text/x-patch; +2382, -159)
* 0002-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch (text/x-patch; +177, -116)
* 0001-Support-INSERT-.-ON-CONFLICT-IGNORE.patch (text/x-patch; +2215, -134)
On 03/26/2015 08:00 PM, Peter Geoghegan wrote:
On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
My next revision will have a more polished version of this scheme. I'm
not going to immediately act on Robert's feedback elsewhere (although
I'd like to), owing to time constraints - no reason to deny you the
opportunity to review the entirely unrelated low-level speculative
locking mechanism due to that.

Attached revision, V3.1, implements this slightly different way of
figuring out if an insertion is speculative, as discussed. We reuse
t_ctid for speculatively inserted tuples. That's where the counter
goes. I think that this is a significant improvement, since there is
no longer any need to touch the proc array for any reason, without
there being any significant disadvantage that I'm aware of. I also
fixed some bitrot, and a bug with index costing (the details aren't
terribly interesting - tuple width wasn't being calculated correctly).
Cool. Quickly looking at the patch though - does it actually work as it
is? RelationPutHeapTuple will overwrite the ctid field when the tuple is
put on the page, so I don't think the correct token will make it to disk
as the patch stands. Also, there are a few places where we currently
check if t_ctid equals the tuple's location, and try to follow t_ctid if
it doesn't. I think those need to be taught that t_ctid can also be a token.
- Heikki
On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Attached revision, V3.1, implements this slightly different way of
figuring out if an insertion is speculative, as discussed. We reuse
t_ctid for speculatively inserted tuples. That's where the counter
goes. I think that this is a significant improvement, since there is
no longer any need to touch the proc array for any reason, without
there being any significant disadvantage that I'm aware of. I also
fixed some bitrot, and a bug with index costing (the details aren't
terribly interesting - tuple width wasn't being calculated correctly).

Cool. Quickly looking at the patch though - does it actually work as it is?
The test cases pass, including jjanes_upsert, and stress tests that
test for unprincipled deadlocks. But yes, I am entirely willing to
believe that something that was written in such haste could be broken.
My manual testing was pretty minimal.
Sorry for posting a shoddy patch, but I thought it was more important
to show you that this is perfectly workable ASAP.
RelationPutHeapTuple will overwrite the ctid field when the tuple is put on
the page, so I don't think the correct token will make it to disk as the
patch stands.
Oops - You're right. I find it interesting that this didn't result in
breaking my test cases. I guess that not having proc array locking
might have made the difference with unprincipled deadlocks, which I
could not recreate (and row locking saves us from breaking UPSERT, I
think - although if so the token lock would still certainly be needed
for the IGNORE variant). It is interesting that this wasn't obviously
broken for UPSERT, though. I think it at least suggests that when
testing, we need to be more careful with taking a working UPSERT as a
proxy for a working ON CONFLICT IGNORE.
Also, there are a few places where we currently check if
t_ctid equals the tuple's location, and try to follow t_ctid if it doesn't.
I think those need to be taught that t_ctid can also be a token.
I did fix at least some of those. I thought that the choke point for
doing that was fairly small, entirely confined to one or two routines
within heapam.c. But it would surely be better to follow your suggestion
of using an invalid/magic tuple offset value to be sure that it cannot
possibly occur elsewhere. And I'm still using that infomask2 bit,
which is probably not really necessary. So that needs to change too.
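The guard Heikki is asking for could be as small as this; the stand-in types and magic value below are only meant to show the shape of the fix, not the real heapam code:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration only. */
typedef struct
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} ItemPointerData;

#define SPEC_TOKEN_OFFSET ((uint16_t) 0xfffe)

static int
is_speculative(const ItemPointerData *ctid)
{
    return ctid->ip_posid == SPEC_TOKEN_OFFSET;
}

/* Mimics the tail of RelationPutHeapTuple: normally the tuple's
 * self-pointer is set to its just-assigned location, but a ctid that
 * carries a speculative token must be left alone, or the token never
 * reaches the page. */
static void
put_heap_tuple(ItemPointerData *t_ctid, uint32_t block, uint16_t offnum)
{
    if (!is_speculative(t_ctid))
    {
        t_ctid->ip_blkid = block;
        t_ctid->ip_posid = offnum;
    }
}
```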
--
Peter Geoghegan
On 27/03/15 09:14, Peter Geoghegan wrote:
On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
[...]
Oops - You're right. I find it interesting that this didn't result in
breaking my test cases.
[...]
Reminds me of the situation where I got an A++ for a COBOL programming
assignment that successfully handled the test data provided - then I
found a major bug when 'idly' reviewing my code! The lecturer (also a
highly experienced developer) was amused when I pointed it out to her,
and she said I still deserved the A++!
Cheers,
Gavin