INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Attached is a cumulative patch set - version 2.0 of INSERT ... ON
CONFLICT {UPDATE | IGNORE}.
This revision does not offer a variant implementing approach #1 to
value locking [1]https://wiki.postgresql.org/wiki/Value_locking#.231._Heavyweight_page_locking_.28Peter_Geoghegan.29 (only approach #2), since maintaining both
approaches in parallel has about outlived its usefulness.
I'm calling this version 2.0 because it has RLS support. This is
significant because AFAICT it's the last feature that needs to have
interactions with UPSERT considered. I've worked through a rather long
list of existing interrelated features, implementing support in each
case. I've had feedback from others on what behavior is appropriate
when that wasn't obvious, and have made sure those areas had
appropriate support. This now includes RLS, but past revisions added
support for inheritance, updatable views, statement-level triggers,
postgres_fdw, column-level privileges, partial indexes, exclusion
constraints, and more. Basically, I think we're done with discussing
those aspects, and the semantics/syntax in general, or are pretty
close to done. Certainly, support for these other interrelated
features is quite comprehensive at this point. Now the core mechanism
of the patch should be discussed in detail. The general structure and
design is also interesting. After months and months of discussion, it
now seems very likely that the semantics offered are the right ones.
Since even before V1.0 was posted back in August, that's all that
we've discussed, really (apart from the recent back and forth with
Heikki on value locking bugs, of course).
I've approached RLS along the lines Stephen seemed to think would work
best following extensive discussion [2]/messages/by-id/20150109214041.GK3062@tamriel.snowman.net, or at least I believe that
I've produced RLS support that is what we informally agreed on. All
security barrier quals are treated as WITH CHECK OPTIONs in the
context of ON CONFLICT UPDATE. INSERTs don't have to deal with
UPDATE-related policies/WITH CHECK OPTIONs, but when the update path
is taken, both the INSERT and UPDATE related policies must both pass.
They must pass for the tuple that necessitated taking the UPDATE path
(the locked tuple to be updated), and also the finished tuple added
back to the relation by ExecUpdate(). There are 3 possible calls to
ExecWithCheckOptions() in the context of INSERT ... ON CONFLICT
UPDATE. Those 2 that I just mentioned, that involve UPDATE *and*
INSERT WITH CHECK options, and also the ExecInsert()
ExecWithCheckOptions() call.
RLS support is provided in a separate cumulative commit in the hope
that this makes it easier to review by a subject matter expert.
Documentation [3]http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createpolicy.html and tests covering RLS are provided, of course.
I also include various bugfixes to approach #2 to value locking (these
were all previously separately posted, but are now integrated into the
main ON CONFLICT commit). Specifically, these are fixes for the bugs
that emerged thanks to Jeff Janes' great work on stress testing [4]https://github.com/petergeoghegan/jjanes_upsert -- Peter Geoghegan.
With these fixes, I have been unable to reproduce any problem with
this patch with the test suite, even after many days of running the
script on a quad-core server, with constant concurrent VACUUM runs,
etc. I think that we still need to think about the issues that
transpired with exclusion constraints, but since I couldn't find
another problem with an adapted version of Jeff's tool that tested
exclusion constraints, I'm inclined to think that it should be
possible to support exclusion constraints for the IGNORE variant.
It would be great to have more input on stress testing from Jeff.
Thoughts?
[1]: https://wiki.postgresql.org/wiki/Value_locking#.231._Heavyweight_page_locking_.28Peter_Geoghegan.29
[2]: /messages/by-id/20150109214041.GK3062@tamriel.snowman.net
[3]: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createpolicy.html
[4]: https://github.com/petergeoghegan/jjanes_upsert -- Peter Geoghegan
--
Peter Geoghegan
Attachments:
0008-User-visible-documentation-for-INSERT-.-ON-CONFLICT-.patchtext/x-patch; charset=US-ASCII; name=0008-User-visible-documentation-for-INSERT-.-ON-CONFLICT-.patchDownload+573-59
0007-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patchtext/x-patch; charset=US-ASCII; name=0007-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patchDownload+49-1
0006-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchtext/x-patch; charset=US-ASCII; name=0006-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchDownload+1330-8
0005-RLS-support-for-ON-CONFLICT-UPDATE.patchtext/x-patch; charset=US-ASCII; name=0005-RLS-support-for-ON-CONFLICT-UPDATE.patchDownload+153-21
0004-Project-updates-from-ON-CONFLICT-UPDATE-RETURNING.patchtext/x-patch; charset=US-ASCII; name=0004-Project-updates-from-ON-CONFLICT-UPDATE-RETURNING.patchDownload+39-13
0003-EXCLUDED-expressions-within-ON-CONFLICT-UPDATE.patchtext/x-patch; charset=US-ASCII; name=0003-EXCLUDED-expressions-within-ON-CONFLICT-UPDATE.patchDownload+391-2
0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchtext/x-patch; charset=US-ASCII; name=0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchDownload+2125-132
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchtext/x-patch; charset=US-ASCII; name=0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchDownload+152-84
On Sat, Jan 10, 2015 at 8:32 PM, Peter Geoghegan <pg@heroku.com> wrote:
I also include various bugfixes to approach #2 to value locking (these
were all previously separately posted, but are now integrated into the
main ON CONFLICT commit). Specifically, these are fixes for the bugs
that emerged thanks to Jeff Janes' great work on stress testing [4].
With these fixes, I have been unable to reproduce any problem with
this patch with the test suite, even after many days of running the
script on a quad-core server, with constant concurrent VACUUM runs,
etc.
I continued with this since posting V2.0. I've run this bash script,
that invokes Jeff's script at various client counts, with runs of
various duration (since each client does a fixed amount of work):
https://github.com/petergeoghegan/jjanes_upsert/blob/master/run_test.sh
As previously discussed, Jeff's script comprehensively verifies the
correctness of the final values of a few thousand rows within a table
after many concurrent upserts, within and across upserting sessions,
and with concurrent deletions, too.
When building Postgres for this stress test, I included Jeff's
modifications that increase the XID burn rate artificially (I chose a
burn rate of X50). This makes anti-wraparound VACUUMs much more
frequent. I'm also looking out for outlier query execution durations,
because in theory they could indicate an unknown lock starvation
problem. I haven't seen any notable outliers after over a week of
testing.
I think that we still need to think about the issues that
transpired with exclusion constraints, but since I couldn't find
another problem with an adapted version of Jeff's tool that tested
exclusion constraints, I'm inclined to think that it should be
possible to support exclusion constraints for the IGNORE variant.
Exclusion constraints were my focus with stress testing today. I
performed equivalent verification of upserts using exclusion
constraints (this is a hack; exclusion constraints are only intended
to be used with the IGNORE variant, but I get better test coverage
than I might otherwise this way). Unfortunately, even with the recent
bugfixes, there are still problems. On this server (rather than my
laptop), with 8 clients I can see errors like this before too long
(note that this output includes custom instrumentation from Jeff):
"""""""
6670 2015-01-17 18:02:54 PST LOG: JJ scan_all 1, relfrozenid -813636509
6670 2015-01-17 18:02:54 PST LOG: JJ freezeLimit -661025537
6670 2015-01-17 18:02:54 PST LOG: JJ freeze_min_age 50000000
vacuum_freeze_table_age 150000000 freeze_table_age 150000000 ReadNew
-611025384
6670 2015-01-17 18:02:54 PST LOG: JJ scan_all 1, relfrozenid -813636101
6670 2015-01-17 18:02:54 PST LOG: JJ transaction ID wrap limit is
1352632427, limited by database with OID 12746
6670 2015-01-17 18:02:54 PST LOG: autovacuum: done processing
database "postgres" at recent Xid of 3683945176 recent mxid of 1
6668 2015-01-17 18:02:54 PST ERROR: conflicting key value violates
exclusion constraint "upsert_race_test_index_excl"
6668 2015-01-17 18:02:54 PST DETAIL: Key (index)=(7142) conflicts
with existing key (index)=(600).
6668 2015-01-17 18:02:54 PST STATEMENT: insert into upsert_race_test
(index, count) values ('7142','1') on conflict
update set count=TARGET.count + EXCLUDED.count
where TARGET.index = EXCLUDED.index
returning upsert_race_test.count
"""""""
It's always an exclusion violation problem that I see here.
As you can see, the query involved has no "unique index inference"
specification, per the hack to make this work with exclusion
constraints (the artificially much greater XID burn rate might have
also increased the likelihood of this error dramatically). You'll also
note that the DETAIL message seems to indicate that this
btree_gist-based exclusion constraint doesn't behave like a unique
constraint at all, because the conflicting new value (7142) is not at
all the same as the existing value (600). But that's wrong -- it's
supposed to be B-Tree-like. In short, there are further race
conditions with exclusion constraints.
I think that the fundamental, unfixable race condition here is the
disconnect between index tuple insertion and checking for would-be
exclusion violations that exclusion constraints naturally have here,
that unique indexes naturally don't have [1]/messages/by-id/54A7C76D.3070101@vmware.com -- Peter Geoghegan (note that I'm talking
only about approach #2 to value locking here; approach #1 isn't in
V2.0). I suspect that the feature is not technically feasible to make
work correctly with exclusion constraints, end of story. VACUUM
interlocking is probably also involved here, but the unfixable race
condition seems like our fundamental problem.
We could possibly spend a lot of time discussing whether or not I'm
right about it being inherently impossible to make INSERT ... ON
CONFLICT IGNORE work with exclusion constraints. However, I strongly
suggest that we cut scope and at least leave them out of any version
that can be committed for 9.5, and instead work on other areas,
because it is at least now clear that they are much harder to get
right than unique constraints. Besides, making exclusion constraints
work with INSERT ... ON CONFLICT IGNORE is nice, but ultimately not
all that important. For that matter I think that INSERT ... ON
CONFLICT IGNORE is more generally not all that important compared to
ON CONFLICT UPDATE. I'd cut scope by cutting ON CONFLICT IGNORE if
that was the consensus....we could add back ON CONFLICT IGNORE in 9.6
when we had a better sense of exclusion constraints here. Exclusion
constraints can never be useful with ON CONFLICT UPDATE anyway.
Please work with me towards a committable patch. I think we have every
chance of committing this for 9.5, with value locking approach #2,
provided we now cut scope a bit. As I mention above, V2.0 has stood up
to more than a week of aggressive, comprehensive stress testing/custom
correctness verification on an 8 core box (plus numerous other stress
tests in months past). UPSERT (which never involved exclusion
constraints) is a very comprehensive and mature effort, and I think it
now needs one big push from a senior community member. I feel that I
cannot do anything more without that input.
[1]: /messages/by-id/54A7C76D.3070101@vmware.com -- Peter Geoghegan
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jan 17, 2015 at 6:48 PM, Peter Geoghegan <pg@heroku.com> wrote:
I continued with this since posting V2.0.
Attached version (V2.1) fixes bit-rot caused by the recent changes by
Stephen ("Fix column-privilege leak in error-message paths"). More
precisely, it is rebased on top of today's 17792b commit.
I have not addressed the recently described problems with exclusion
constraints. I hope we can do so shortly. Simply removing IGNORE
support until such time as we straighten that all out (9.6?) seems
like the simplest solution. No need to block the progress of "UPSERT",
since exclusion constraint support was only ever going to be useful
for the less compelling IGNORE variant. What do other people think? Do
you agree with my view that we should shelve IGNORE support for now,
Heikki?
There is one minor bugfix here: I have tightened up the conditions
under which user-defined rule application will be rejected.
Previously, I neglected to specifically check for UPDATE rules when an
INSERT ... ON CONFLICT UPDATE statement was considered. That's been
fixed.
On the stress-testing front, I'm still running Jeff Janes' tool [1]https://github.com/petergeoghegan/jjanes_upsert -- Peter Geoghegan,
while also continuing to use his Postgres modifications to
artificially increase the XID burn rate. However, my personal server
is no longer used for this task. I'm using an AWS EC2 instance - a
r3.8xlarge. This server provides 32 logical cores, and uses an "Intel
Xeon E5-2670 v2 @ 2.50GHz" CPU. It seems reasonable to suppose that
any latent concurrency bugs are more likely to reveal themselves when
using the new server.
Anyone who would like access to the server should contact me
privately. It's a throw-away EC2 instance, so this isn't particularly
difficult to do.
Thanks
[1]: https://github.com/petergeoghegan/jjanes_upsert -- Peter Geoghegan
--
Peter Geoghegan
Attachments:
0008-User-visible-documentation-for-INSERT-.-ON-CONFLICT-.patchtext/x-patch; charset=US-ASCII; name=0008-User-visible-documentation-for-INSERT-.-ON-CONFLICT-.patchDownload+568-52
0007-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patchtext/x-patch; charset=US-ASCII; name=0007-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patchDownload+49-1
0006-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchtext/x-patch; charset=US-ASCII; name=0006-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchDownload+1323-8
0005-RLS-support-for-ON-CONFLICT-UPDATE.patchtext/x-patch; charset=US-ASCII; name=0005-RLS-support-for-ON-CONFLICT-UPDATE.patchDownload+158-25
0004-Project-updates-from-ON-CONFLICT-UPDATE-RETURNING.patchtext/x-patch; charset=US-ASCII; name=0004-Project-updates-from-ON-CONFLICT-UPDATE-RETURNING.patchDownload+39-13
0003-EXCLUDED-expressions-within-ON-CONFLICT-UPDATE.patchtext/x-patch; charset=US-ASCII; name=0003-EXCLUDED-expressions-within-ON-CONFLICT-UPDATE.patchDownload+391-2
0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchtext/x-patch; charset=US-ASCII; name=0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patchDownload+2134-135
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchtext/x-patch; charset=US-ASCII; name=0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchDownload+162-89
On Thu, Jan 29, 2015 at 11:38 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
Simply removing IGNORE support until such time as we straighten
that all out (9.6?) seems like the simplest solution. No need to block
the progress of "UPSERT", since exclusion constraint support was
only ever going to be useful for the less compelling IGNORE variant.
What do other people think? Do you agree with my view that we should
shelve IGNORE support for now, Heikki?
I appreciate the work you're doing and (as a user rather than a
pg-hacker) don't want to butt in but if it would be possible to allow
support for IGNORE for unique but not exclusion constraints that would
be really helpful for my own use cases, where being able to insert
from a dataset into a table containing unique constraints without
having to first check the dataset for uniqueness (within both itself
and the target table) is very useful.
It's possible that I've misunderstood anyway and that you mean purely
that exclusion constraint IGNORE should be shelved until 9.6, in which
case I apologise.
Of course if there's no way to allow unique constraint IGNORE without
exclusion constraints then just ignore me; I (along I'm sure with all
the others who are following this conversation from afar) will be
incredibly grateful for the work you've done either way.
I suppose there's no reason why we couldn't use a no-op ON CONFLICT
UPDATE anyway, but that does seem rather messy and (I imagine) would
involve rather more work (unless the optimizer were able to optimize
away the "update"? I don't know enough to be able to say if it would).
Thanks
Geoff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Resolved by subject fallback
On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
I appreciate the work you're doing and (as a user rather than a
pg-hacker) don't want to butt in but if it would be possible to allow
support for IGNORE for unique but not exclusion constraints that would
be really helpful for my own use cases, where being able to insert
from a dataset into a table containing unique constraints without
having to first check the dataset for uniqueness (within both itself
and the target table) is very useful.It's possible that I've misunderstood anyway and that you mean purely
that exclusion constraint IGNORE should be shelved until 9.6, in which
case I apologise.
Well, the issue is that we can't really add exclusion constraints to
the IGNORE case later. So the fact that we cannot do exclusion
constraints kind of implies that we can either shelve IGNORE and maybe
look at it later, or accept that we'll never support exclusion
constraints with IGNORE. We'd then include IGNORE without exclusion
constraint support now and forever. I tend to think that we'll end up
doing the latter anyway, but I really don't want to add additional
risk of this not getting into 9.5 by arguing about that now. It
doesn't matter that much.
I suppose there's no reason why we couldn't use a no-op ON CONFLICT
UPDATE anyway
Right. IGNORE isn't really all that compelling for that reason. Note
that this will still lock the unmodified row, though.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
A first (not actually that quick :() look through the patches to see
what actually happened in the last months. I didn't keep up with the
thread.
Generally the split into the individual commits doesn't seem to make
much sense to me. The commits individually (except the first) aren't
indivdiually commitable and aren't even meaningful. Splitting off the
internal docs, tests and such actually just seems to make reviewing
harder because you miss context. Splitting it so that individual piece
are committable and reviewable makes sense, but... I have no problem
doing the user docs later. If you split of RLS support, you need to
throw an error before it's implemented.
0001:
* References INSERT with ON CONFLICT UPDATE, can thus not be committed
independently. I don't think those references really are needed.
* I'm not a fan of the increased code duplication in
ExecCheckRTEPerms(). Can you look into cleaning that up?
* Also the comments inside the ACL_INSERT case still reference UPDATE.
Other than that I think we can just go ahead and commit this ahead of
time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
message only.
0007:
* "AMs" alone isn't particularly unique.
* Without the context of the discussion "unprincipled deadlocks" aren't
well defined.
* Too many "" words.
* Waiting "too long" isn't defined. Neither is why that'd imply
unprincipled deadlocks. Somewhat understandable with the context of
the discussion, but surely not a couple years down the road.
* As is I don't find the README entry super helpful. It should state
what the reason for doing this is cleary, start at the higher level,
and then move to the details.
* Misses details about the speculative heavyweight locking of tuples.
0002:
* Tentatively I'd say that killspeculative should be done via a separate
function instead of heap_delete()
* I think we should, as you ponder in a comment, do the OCU specific
stuff lazily and/or in a separate function from BuildIndexInfo(). That
function is already quite visible in profiles, and the additional work
isn't entirely trivial.
* I doubt logical decoding works with the patch as it stands.
* The added ereport (i.e. user facing) error message in
ExecInsertIndexTuples won't help a user at all.
* Personally I don't care one iota for comments like "Get information
from the result relation info structure.". Yes, one of these already
exists, but ...
* If a arbiter index is passed to ExecCheckIndexConstraints(), can't we
abort the loop after checking it? Also, do we really have to iterate
over indexes for that case? How about moving the loop contents to a
separate function and using that separately for the arbiter cases?
* Don't like the comment above check_exclusion_or_unique_constraint()'s
much. Makes too much of a special case of OCU
* ItemPointerIsValid
* ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed "
sounds like you're talking with a child or so ;)
* ExecCheckHeapTupleVisible()'s errhint() sounds like an
argument/excuse (actually like a code comment). That's not going to
help a user at all.
* I find the modified control flow in ExecInsert() pretty darn ugly. I
think this needs to be cleaned up. The speculative case should imo be
a separate function or something.
* /*
* This may occur when an instantaneously invisible tuple is blamed
* as a conflict because multiple rows are inserted with the same
* constrained values.
How can this happen? We don't insert multiple rows with the same
command id?
* ExecLockUpdatedTuple() has (too?) many comments, but little overview
of what/why it is doing what it does on a higher level.
* plan_speculative_use_index: "Use the planner to decide speculative
insertion arbiter index" - Huh? " rel is the target to undergo ON
CONFLICT UPDATE/IGNORE." - Which rel?
* formulations as "fundamental nexus" are hard to understand imo.
* Perhaps it has previously been discussed but I'm not convinced by the
reasoning for not looking at opclasses in infer_unique_index(). This
seems like it'd prohibit ever having e.g. case insensitive opclasses -
something surely worthwile.
* Doesn't infer_unique_index() have to look for indisvalid? This isn't
going to work well with a invalid (not to speak for a !ready) index.
* Is ->relation in the UpdateStmt generated in transformInsertStmt ever
used for anything? If so, it'd possibly generate some possible
nastyness due to repeated name lookups. Looks like it'll be used in
transformUpdateStmt
* What's the reason for the !pstate->p_parent? Also why the parens?
pstate->p_is_speculative = (pstate->parentParseState &&
(!pstate->p_parent_cte &&
pstate->parentParseState->p_is_insert &&
pstate->parentParseState->p_is_speculative));
* Why did you need to make %nonassoc DISTINCT and ON nonassoc in the grammar?
* The whole speculative insert logic isn't really well documented. Why,
for example, do we actually need the token? And why are there no
issues with overflow? And where is it documented that a 0 means
there's no token? ...
* Isn't "SpecType" a awfully generic (and nondescriptive) name?
* /* XXX: Make sure that re-use of bits is safe here */ - no, not
unless you change existing checks.
* /*
* Immediately VACUUM "super-deleted" tuples
*/
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId))
return HEAPTUPLE_DEAD;
Is that branch really needed? Shouldn't it just be happening as a
consequence of the already existing code? Same in SatisfiesMVCC. If
you actually needed that block, it'd need to be done in SatisfiesSelf
as well, no? You have a comment about a possible loop - but that seems
wrong to me, implying that HEAP_XMIN_COMMITTED was set invalidly.
Ok, I can't focus at all any further at this point. But there's enough
comments here that some even might make sense ;)
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 30 January 2015 at 21:58, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
I suppose there's no reason why we couldn't use a no-op ON CONFLICT
UPDATE anywayRight. IGNORE isn't really all that compelling for that reason. Note
that this will still lock the unmodified row, though.
Mmmf. So I would have to make sure that my source tuples were unique
before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a
tuple would block any other)? That's potentially very slow :(
When you say that you can't add exclusion constraints later, do you
mean from a coding point of view or just because people would get
confused whether exclusion constraints could be IGNOREd or not?
Geoff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2 February 2015 at 14:32, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
Mmmf. So I would have to make sure that my source tuples were unique
before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a
tuple would block any other)? That's potentially very slow :(
Replying to my own message, because it occurs to me I might be being
stupid (surely not :) )
When you say "this will still lock the unmodified row" did you mean
just that it's locked to _other_ processes until commit? That would be
much less impactful.
Geoff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/18/2015 04:48 AM, Peter Geoghegan wrote:
I think that the fundamental, unfixable race condition here is the
disconnect between index tuple insertion and checking for would-be
exclusion violations that exclusion constraints naturally have here,
that unique indexes naturally don't have [1] (note that I'm talking
only about approach #2 to value locking here; approach #1 isn't in
V2.0). I suspect that the feature is not technically feasible to make
work correctly with exclusion constraints, end of story. VACUUM
interlocking is probably also involved here, but the unfixable race
condition seems like our fundamental problem.
It's not a fundamental, unfixable race condition. In [1], I gave you
three ideas straight off the top of my head on how that could be fixed.
Please work with me towards a committable patch.
I'm trying...
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/30/2015 01:38 AM, Peter Geoghegan wrote:
I have not addressed the recently described problems with exclusion
constraints. I hope we can do so shortly. Simply removing IGNORE
support until such time as we straighten that all out (9.6?) seems
like the simplest solution. No need to block the progress of "UPSERT",
since exclusion constraint support was only ever going to be useful
for the less compelling IGNORE variant. What do other people think? Do
you agree with my view that we should shelve IGNORE support for now,
Heikki?
No, I don't agree. Let's fix it.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 02, 2015 at 4:48 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
I think that the fundamental, unfixable race condition here is the
disconnect between index tuple insertion and checking for would-be
exclusion violations that exclusion constraints naturally have here,
that unique indexes naturally don't have [1] (note that I'm talking
only about approach #2 to value locking here; approach #1 isn't in
V2.0). I suspect that the feature is not technically feasible to make
work correctly with exclusion constraints, end of story. VACUUM
interlocking is probably also involved here, but the unfixable race
condition seems like our fundamental problem.It's not a fundamental, unfixable race condition. In [1], I gave you
three ideas straight off the top of my head on how that could be fixed.
That was different - I tried to make it work by fixing some bugs
there. However, I'm now finding myself up against these new bugs. I
think that the underlying cause is the lack of any real locking
(unlike with the B-Tree AM) in *both* cases, but I don't even know
that for sure. The error messages you see are quite odd - why should a
btree_gist-based exclusion constraint cause a violation when
non-conflicting values are inserted? There is some other race
condition here. This wasn't a livelock (or a deadlock), which is what
your comments in early January apply to. I think that this has
something to do with VACUUM interlocking. But with the B-Tree AM
(which we're handling differently, by re-using infrastructure used for
deferred unique constraints), things work quite well. The patch stands
up to Jeff's vigorous stress-tests.
I'm not fundamentally in disagreement with you about any of this. All
I'm saying is that we should cut scope today. We're not precluding
picking up an IGNORE feature that does support exclusion constraints
in the future. Why should we insist upon having that in the first cut?
It's both significantly harder, and significantly less useful to
users, and so cutting that makes perfect sense AFAICT. As I've said
many times, exclusion constraint support was only ever going to be
useful to the IGNORE variant (I've tested exclusion constraints by
contriving a case to make them do UPSERTs, but this is only for the
benefit of the stress-test).
Thanks
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/30/2015 01:38 AM, Peter Geoghegan wrote:
On the stress-testing front, I'm still running Jeff Janes' tool [1],
while also continuing to use his Postgres modifications to
artificially increase the XID burn rate.
I followed the instructions in README.md to reproduce this. I downloaded
the tool, applied the upsert patchset, applied the hack to
parse_clause.c as instructed in the README.md file, installed
btree_gist, and ran count_upsert_exclusion.pl.
It failed immediately with an assertion failure:
TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex
!= ((Oid) 0))", File: "nodeModifyTable.c", Line: 1619)
Is that just because of the hack in parse_clause.c?
With assertions disabled, count_upsert_exclusion.pl ran successfully to
the end. I also tried running "VACUUM FREEZE upsert_race_test" in a loop
in another session at the same time, but it didn't make a difference.
How quickly do you see the errors?
I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and
set jj_xid=10000 to increase XID burn rate, but I'm still not seeing any
errors.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex !=
((Oid) 0))", File: "nodeModifyTable.c", Line: 1619)Is that just because of the hack in parse_clause.c?
Yes. I never built with assertions and so didn't see this, but it
doesn't matter.
With assertions disabled, count_upsert_exclusion.pl ran successfully to the
end. I also tried running "VACUUM FREEZE upsert_race_test" in a loop in
another session at the same time, but it didn't make a difference. How
quickly do you see the errors?I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set
jj_xid=10000 to increase XID burn rate, but I'm still not seeing any errors.
Did you build fully optimized, assertion-free code? I've been doing
that. I found it necessary to recreate some of the bugs Jeff's tool
caught. I also think that I might have needed an 8 core box to see it,
but less sure about that.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/03/2015 08:17 PM, Peter Geoghegan wrote:
On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:TRAP: FailedAssertion("!(node->spec != SPEC_INSERT || node->arbiterIndex !=
((Oid) 0))", File: "nodeModifyTable.c", Line: 1619)Is that just because of the hack in parse_clause.c?
Yes. I never built with assertions and so didn't see this, but it
doesn't matter.With assertions disabled, count_upsert_exclusion.pl ran successfully to the
end. I also tried running "VACUUM FREEZE upsert_race_test" in a loop in
another session at the same time, but it didn't make a difference. How
quickly do you see the errors?I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set
jj_xid=10000 to increase XID burn rate, but I'm still not seeing any errors.Did you build fully optimized, assertion-free code? I've been doing
that. I found it necessary to recreate some of the bugs Jeff's tool
caught. I also think that I might have needed an 8 core box to see it,
but less sure about that.
I had compiled with -O0, but without assertions. I tried now again with
-O3. It's been running for about 10 minutes now, and I haven't seen any
errors.
Since you can reproduce this, it would be good if you could debug this.
The error message where the alleged duplicate key actually had a
different value is a bit scary. Makes me wonder if it might be a bug
with exclusion constraints in general, or just with the patch.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
I had compiled with -O0, but without assertions. I tried now again with -O3.
It's been running for about 10 minutes now, and I haven't seen any errors.
Did you run with an artificially high XID burn rate (i.e. did you also
apply Jeff's modifications to Postgres, and specify a high burn rate
using his custom GUC)? Maybe that was important.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
A first (not actually that quick :() look through the patches to see
what actually happened in the last months. I didn't keep up with the
thread.
So, let me get this out of the way: This is the first in-depth
technical review that this work has had in a long time. Thank you for
your help here.
Generally the split into the individual commits doesn't seem to make
much sense to me. The commits individually (except the first) aren't
indivdiually commitable and aren't even meaningful. Splitting off the
internal docs, tests and such actually just seems to make reviewing
harder because you miss context. Splitting it so that individual piece
are committable and reviewable makes sense, but... I have no problem
doing the user docs later. If you split of RLS support, you need to
throw an error before it's implemented.
I mostly agree. Basically, I did not intend for all of the patches to
be individually committed. The mechanism by which EXCLUDED.*
expressions are added is somewhat novel, and deserves to be
independently *considered*. I'm trying to show how the parts fit
together more so than breaking things down in to smaller commits (as
you picked up on, 0001 is the exception - that is genuinely intended
to be committed early). Also, those commit messages give me the
opportunity to put those parts in their appropriate context vis-a-vis
our discussions. They refer to the Wiki, for example, or reasons why
pg_stat_statements shouldn't care about ExcludedExpr. Obviously the
final commit messages won't look that way.
0001:
* References INSERT with ON CONFLICT UPDATE, can thus not be committed
independently. I don't think those references really are needed.
* I'm not a fan of the increased code duplication in
ExecCheckRTEPerms(). Can you look into cleaning that up?
* Also the comments inside the ACL_INSERT case still reference UPDATE.Other than that I think we can just go ahead and commit this ahead of
time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
message only.
Cool. Attached revision makes those changes.
0007:
* "AMs" alone isn't particularly unique.
* Without the context of the discussion "unprincipled deadlocks" aren't
well defined.
* Too many "" words.
* Waiting "too long" isn't defined. Neither is why that'd imply
unprincipled deadlocks. Somewhat understandable with the context of
the discussion, but surely not a couple years down the road.
* As is I don't find the README entry super helpful. It should state
what the reason for doing this is cleary, start at the higher level,
and then move to the details.
* Misses details about the speculative heavyweight locking of tuples.
Fair points. I'll work through that feedback.
Actually, I think we should memorialize that "unprincipled deadlocks"
should be avoided in some more general way, since they are after all a
general problem that we've seen elsewhere. I'm not sure about how to
go about doing that, though.
0002:
* Tentatively I'd say that killspeculative should be done via a separate
function instead of heap_delete()
Really? I guess if that were to happen, it would entail refactoring
heap_delete() to call a static function, which was also called by a
new kill_speculative() function that does this. Otherwise, you'd have
far too much duplication.
* I think we should, as you ponder in a comment, do the OCU specific
stuff lazily and/or in a separate function from BuildIndexInfo(). That
function is already quite visible in profiles, and the additional work
isn't entirely trivial.
Okay.
* I doubt logical decoding works with the patch as it stands.
I thought so. Perhaps you could suggest a better use of the available
XLOG_HEAP_* bits. I knew I needed to consider that more carefully
(hence the XXX comment), but didn't get around to it.
* The added ereport (i.e. user facing) error message in
ExecInsertIndexTuples won't help a user at all.
So, this:
/* Skip this index-update if the predicate isn't satisfied */ if (!ExecQual(predicate, econtext, false)) + { + if (arbiterIdx == indexRelation->rd_index->indexrelid) + ereport(ERROR, + (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION), + errmsg("partial arbiter unique index has predicate that does not cover tuple proposed for insertion"), + errdetail("ON CONFLICT inference clause implies that the tuple proposed for insertion actually be covered by partial predicate for index \"%s\".", + RelationGetRelationName(indexRelation)), + errhint("ON CONFLICT inference clause must infer a unique index that covers the final tuple, after BEFORE ROW INSERT triggers fire."), + errtableconstraint(heapRelation, + RelationGetRelationName(indexRelation)))); continue; + }
Yeah, that isn't a great error message. This happens here because you
are using a partial unique index (and so you must have had an
inference specification with a "WHERE" to get here). However, what you
actually went to insert would not be covered by this partial unique
index, and so couldn't ever take the alternative path, and so is
likely not thought out. Maybe it would be better to silently always
let the INSERT succeed as an INSERT. *That* actually wasn't really
discussed - this is all my idea.
* Personally I don't care one iota for comments like "Get information
from the result relation info structure.". Yes, one of these already
exists, but ...
Okay.
* If a arbiter index is passed to ExecCheckIndexConstraints(), can't we
abort the loop after checking it? Also, do we really have to iterate
over indexes for that case? How about moving the loop contents to a
separate function and using that separately for the arbiter cases?
Well, the failure to do that implies very few extra cycles, but sure.
I'll add a new reason to break at the end, when
check_exclusion_or_unique_constraint() is called in respect of a
particular (inferred) arbiter unique index.
* Don't like the comment above check_exclusion_or_unique_constraint()'s
much. Makes too much of a special case of OCU
I guess I should just refer to speculative insertion.
* ItemPointerIsValid
What about it?
* ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed "
sounds like you're talking with a child or so ;)
Fair point. I should say "It would not be consistent with the
guarantees of the higher isolation levels..."
* ExecCheckHeapTupleVisible()'s errhint() sounds like an
argument/excuse (actually like a code comment). That's not going to
help a user at all.
Really? I thought it might be less than intuitive that higher
isolation levels cannot decide to do nothing on the basis of something
not in their MVCC snapshot. But come to think of it, yeah, that
errhint() isn't adding much over the main error message.
* I find the modified control flow in ExecInsert() pretty darn ugly. I
think this needs to be cleaned up. The speculative case should imo be
a separate function or something.
Hmm. I'm not quite sold on that. Basically, if we did that, we'd still
have a function that was more or less a strict superset of
ExecInsert(). What have we saved?
What I do agree with is that ExecInsert() should be refactored to make
the common case (a vanilla insert) look like the common case, whereas
the uncommon case (an upsert) should have that dealt with specially.
There is room for improvement. Is that a fair compromise?
* /*
* This may occur when an instantaneously invisible tuple is blamed
* as a conflict because multiple rows are inserted with the same
* constrained values.
How can this happen? We don't insert multiple rows with the same
command id?
This is a cardinality violation [1]https://wiki.postgresql.org/wiki/UPSERT#.22Cardinality_violation.22_errors_in_detail -- Peter Geoghegan. It can definitely happen - just
try the examples you see on the Wiki. This is possible because I
modified heap_lock_tuple() to return HeapTupleInvisible (and not just
complain directly when HeapTupleSatisfiesUpdate() returns
"HeapTupleInvisible"). It's also possible because we're using a
DirtySnapshot at various points. This is sort of like how ExecUpdate()
handles a return value of "HeapTupleSelfUpdated" from heap_update().
Not quite though, because 1. ) I prefer to throw an error (rather than
silently not UPDATE that slot), and 2. ) we're not dealing with MVCC
semantics, so the return values are different in both cases. The
*nature* of the situation handled is similar between UPSERTs (in
ExecLockUpdatedTuple()) and vanilla UPDATEs (in ExecUpdate()), though.
Does that make sense?
* ExecLockUpdatedTuple() has (too?) many comments, but little overview
of what/why it is doing what it does on a higher level.
Fair point. Seems like material for a better worked out executor README.
* plan_speculative_use_index: "Use the planner to decide speculative
insertion arbiter index" - Huh? " rel is the target to undergo ON
CONFLICT UPDATE/IGNORE." - Which rel?
Sorry, that's an obsolete comment (the function signature changed). It
should refer to the target of the Query being planned.
* formulations as "fundamental nexus" are hard to understand imo.
I'm trying to suggest that INSERT ... ON CONFLICT UPDATE is not quite
two separate top-level commands, and yet is also not a new, distinct
type of top-level command. This is mostly a high level design decision
that maximizes code reuse.
* Perhaps it has previously been discussed but I'm not convinced by the
reasoning for not looking at opclasses in infer_unique_index(). This
seems like it'd prohibit ever having e.g. case insensitive opclasses -
something surely worthwile.
I don't think anyone gave that idea the thumbs-up. However, I really
don't see the problem. Sure, we could have case insensitive opclasses
in the future, and you may want to make a unique index using one. But
then it becomes a matter of whatever unique indexes are available. The
limitation is only that you cannot explicitly indicate that you want a
certain opclass. It comes down to whatever unique indexes happen to be
available, since of course taking the alternative path is arbitrated
by a would-be unique violation. It's a bit odd that we're leaving it
up to the available indexes to decide on semantics like that, but the
problem is so narrow and the solution so involved that I'd argue it's
acceptable.
* Doesn't infer_unique_index() have to look for indisvalid? This isn't
going to work well with a invalid (not to speak for a !ready) index.
It does (check IndexIsValid()). I think the mistake I made here was
not checking IndexIsReady(), since that is an additional concern above
what the similar get_relation_info() function must consider.
* Is ->relation in the UpdateStmt generated in transformInsertStmt ever
used for anything? If so, it'd possibly generate some possible
nastyness due to repeated name lookups. Looks like it'll be used in
transformUpdateStmt
What, you mean security issues, for example? I have a hard time seeing
how that could work in practice, given that the one and only target
RTE is marked with the appropriate updatedCols originating from
transformUpdateStmt(). Still, it is a concern generally - better safe
than sorry. I was thinking of plugging it by ensuring that the
Relations matched, but that might not be good enough. Maybe it would
be better to bite the bullet and have transformUpdateStmt() use the
same Relation directly, which is something I hoped to avoid (better to
have transformUpdateStmt() know as little as possible about this, I'd
say).
* What's the reason for the !pstate->p_parent? Also why the parens?
pstate->p_is_speculative = (pstate->parentParseState &&
(!pstate->p_parent_cte &&
pstate->parentParseState->p_is_insert &&
pstate->parentParseState->p_is_speculative));
You mean the "!pstate->p_parent_cte"? That's there because you can get
queries to segfault if this logic doesn't consider that a
data-modifying CTE can have an UPDATE that appears within a CTE
referenced from an INSERT. :-)
* Why did you need to make %nonassoc DISTINCT and ON nonassoc in the grammar?
To prevent a shift/reduce conflict, I changed the associativity.
Without this, here are the details of State 700, which has the
conflict (from gram.output):
"""""
State 700
1465 opt_distinct: DISTINCT .
1466 | DISTINCT . ON '(' expr_list ')'
ON shift, and go to state 1094
ON [reduce using rule 1465 (opt_distinct)]
$default reduce using rule 1465 (opt_distinct)
"""""
* The whole speculative insert logic isn't really well documented. Why,
for example, do we actually need the token? And why are there no
issues with overflow? And where is it documented that a 0 means
there's no token? ...
Fair enough. Presumably it's okay that overflow theoretically could
occur, because a race is all but impossible. The token represents a
particular attempt by some backend at inserting a tuple, that needs to
be waited on specifically only if it is their active attempt (and the
xact is still running). Otherwise, you get unprincipled deadlocks.
Even if by some incredibly set of circumstances it wraps around, worst
case scenario you get an unprinciped deadlock, which is hardly the end
of the world given the immense number of insertions required, and the
immense unlikelihood that things would work out that way - it'd be
basically impossible.
I'll document the "0" thing.
* Isn't "SpecType" a awfully generic (and nondescriptive) name?
OK. That'll be changed.
* /* XXX: Make sure that re-use of bits is safe here */ - no, not
unless you change existing checks.
I think that this is a restatement of your remarks on logical decoding. No?
* /*
* Immediately VACUUM "super-deleted" tuples
*/
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId))
return HEAPTUPLE_DEAD;
Is that branch really needed? Shouldn't it just be happening as a
consequence of the already existing code? Same in SatisfiesMVCC. If
you actually needed that block, it'd need to be done in SatisfiesSelf
as well, no? You have a comment about a possible loop - but that seems
wrong to me, implying that HEAP_XMIN_COMMITTED was set invalidly.
Indeed, this code is kind of odd. While I think the omission within
SatisfiesSelf() may be problematic too, if you really want to know why
this code is needed, uncomment it and run Jeff's stress-test. It will
reliably break.
This code:
"""""
if (HeapTupleHeaderXminInvalid(tuple))
return HEAPTUPLE_DEAD;
"""""
and this code:
"""""
/*
* Immediately VACUUM "super-deleted" tuples
*/
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId))
return HEAPTUPLE_DEAD;
"""""
are not equivalent (nor is the latter a superset of the former). Maybe
they should be, but they're not. What's more, heap tuple header raw
xmin has never been able to change, and I don't think there is any
reason for it to be InvalidTransactionId. See my new comments within
EvalPlanQualFetch() remarking on how it's now possible for that to
change (before, the comment claimed that it wasn't possible).
Ok, I can't focus at all any further at this point. But there's enough
comments here that some even might make sense ;)
Most do. :-)
Thanks.
[1]: https://wiki.postgresql.org/wiki/UPSERT#.22Cardinality_violation.22_errors_in_detail -- Peter Geoghegan
--
Peter Geoghegan
Attachments:
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchtext/x-patch; charset=US-ASCII; name=0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patchDownload+176-115
On Wed, Feb 4, 2015 at 10:03 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:I had compiled with -O0, but without assertions. I tried now again with -O3.
It's been running for about 10 minutes now, and I haven't seen any errors.Did you run with an artificially high XID burn rate (i.e. did you also
apply Jeff's modifications to Postgres, and specify a high burn rate
using his custom GUC)? Maybe that was important.
Excuse me: I now see that you specifically indicated that you did. But
looks like your XID burn rate was quite a lot higher than mine
(assuming that you were consistent in using " jj_xid=10000", although
I'm not asserting that that was significant).
I attach a log of output from an example session where exclusion
constraints are shown to break (plus the corresponding server log,
plus /proc/cpuinfo on the off chance that that's significant). As you
can from the fact that the span of time recorded in the log is only a
couple of minutes, this is really easy for me to
recreate....sometimes. I could not recreate the problem with only 4
clients (on this 8 core server) after a few dozen attempts, and then I
couldn't recreate the issue at all, so clearly those details matter.
It might have something to do with CPU scaling, which I've found can
significantly affect outcomes for things like this (looks like my
hosting provider changed settings in the system BIOS recently, such
that I cannot set the CPU governor to "performance").
Perhaps you could take a crack at recreating this, Jeff?
Thanks
--
Peter Geoghegan
Attachments:
upsert_exclusion_bugs.txttext/plain; charset=US-ASCII; name=upsert_exclusion_bugs.txtDownload
On 29 January 2015 at 23:38, Peter Geoghegan <pg@heroku.com> wrote:
On Sat, Jan 17, 2015 at 6:48 PM, Peter Geoghegan <pg@heroku.com> wrote:
I continued with this since posting V2.0.
Attached version (V2.1) fixes bit-rot caused by the recent changes by
Stephen ("Fix column-privilege leak in error-message paths"). More
precisely, it is rebased on top of today's 17792b commit.
Patch 0002 no longer applies due to a conflict in
src/backend/executor/execUtils.c.
Thom
On Wed, Feb 4, 2015 at 04:49:46PM -0800, Peter Geoghegan wrote:
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
A first (not actually that quick :() look through the patches to see
what actually happened in the last months. I didn't keep up with the
thread.So, let me get this out of the way: This is the first in-depth
technical review that this work has had in a long time. Thank you for
your help here.
I looked at all the patches too. The patch is only 9k lines, not huge.
Other than the locking part, the biggest part of this patch is adjusting
things so that an INSERT can change into an UPDATE. The code that
handles SELECT/INSERT/UPDATE/DELETE is already complex, and this makes
it even more so. I have no idea how we can be sure we have hit every
single case, but I am also unclear how we will _ever_ know we have hit
them all.
We know people want this feature, and this patch seems to be our best
bet to getting it. If we push this off for 9.6, I am not sure what that
buys us.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Feb 6, 2015 at 1:51 PM, Bruce Momjian <bruce@momjian.us> wrote:
Other than the locking part, the biggest part of this patch is adjusting
things so that an INSERT can change into an UPDATE.
Thanks for taking a look at it. That's somewhat cleaned up in the
attached patchseries - V2.2. This has been rebased to repair the minor
bit-rot pointed out by Thom.
Highlights:
* Better parser representation.
The auxiliary UPDATE never uses its own RangeVar (in fact,
UpdateStmt.relation is never set). This eliminates any possibility of
repeated name lookup problems, addressing Andres' concern. But it also
results in better code. The auxiliary UPDATE is not modified by the
parent INSERT at all - rather, the UPDATE knows to fetch its target
Relation from the parsestate parent INSERT. There is no need to
artificially cut off the parent within the auxiliary UPDATE to make
sure the independent RTE is not visible (during parse analysis, prior
to merging the two, as happened in earlier revisions) - that was a
kludge that I'm glad to be rid of. There is no merging of distinct
INSERT and UPDATE Relations/RTEs because there is only ever one
Relation/RTE to begin with. Previously, the merging merged RTE
selectedCols and updatedCols into the parent INSERT (for column-level
privileges, for example). I'm also a lot less cute about determining
whether an UPDATE is an auxiliary UPDATE from within the parser, which
was also a concern raised by Andres.
About 90% of the special case code previously in transformInsertStmt()
is now in setTargetTable(). This is a significant improvement all
around, since the code is now more consistent with existing parse
analysis code - setTargetTable() is naturally where the auxilary
UPDATE figures out details on its target, and builds an EXCLUDED RTE
and adds it to the namespace as a special case (just like for regular
UPDATE targets, which similarly get added to the namespace +
joinlist).
All of this implies a slight behavioral change (which is documented):
The TARGET.* alias is now visible everywhere. So you see it within
every node of EXPLAIN output, and if you want to qualify a RETURNING
column, the TARGET.* alias must be used (not the original table name).
I think that this is an improvement too, although it is arguably a
slight behavioral change to INSERTs in general (can't think why anyone
would particularly want to qualify with an alias in INSERT's
RETURNING, though). Note that the EXCLUDED.* pseudo-alias is still
only visible within the UPDATE's targetlist and WHERE clause. I think
it would be a bad idea to make the EXCLUDED.* tuples visible from
RETURNING [1]/messages/by-id/CAM3SWZTcpy9rroLM3TkfuU4HDLrEtuGzxLptGn2vLhVAFwQCVA@mail.gmail.com -- Peter Geoghegan.
* Cleaner ExecInsert() control flow. Andres rightly complained that
the existing control flow was convoluted. I believe he will find this
revision a lot clearer, although I have not gone so far as creating
something like an ExecUpsert().
* Better documentation. The executor README has been overhauled to
describe the flow of things from a higher level. The procarray changes
are better documented by comments, too.
* Special work previously within BuildIndexInfo() that is needed for
unique indexes for the UPSERT case only is now done only in the UPSERT
case. There is now no added overhead in BuildIndexInfo() for existing
cases.
* Worked on feedback on various points of style raised by Andres (e.g.
an errhint() was removed).
* Better explanation of the re-use of XLOG_HEAP* flag bits. I believe
that it's fine to reuse the "(1<<7)" bit, given that each distinct use
of the bit can only appear in distinct record types (that is, the bit
is used by xl_heap_multi_insert, and now xl_heap_delete). Those two
uses should be mutually exclusive. It's not as if we've had to be
economical with the use of heap flag XLog record bits before now, so
the best approach here isn't obvious. For now, I see no problem with
this reuse.
* SnapshotSelf (that is, HeapTupleSatisfiesSelf()) has additions
analogous to previous additions to the HeapTupleSatisfiesVacuum() and
HeapTupleSatisfiesMVCC() visibility routines. I still don't think that
the changes to tqual.c are completely satisfactory, but as long as
they're directly necessary (which they evidently are - Jeff's
stress-testing tool shows that) then I should at least make the
appropriate changes everywhere. We should definitely focus on why
they're necessary, and consider if we can do better.
* There was some squashing of commits, since Andres felt that they
weren't all useful as separate commits. I've still split out the RTE
permissions commit, as well as the RLS commit (plus the documentation
and test commits, FWIW). I hope that this will make it easier to
review parts of the patch, without being generally excessive.
When documentation and tests are left out, the entire patch series is left at:
68 files changed, 2958 insertions(+), 297 deletions(-)
which is not too big.
Thanks
[1]: /messages/by-id/CAM3SWZTcpy9rroLM3TkfuU4HDLrEtuGzxLptGn2vLhVAFwQCVA@mail.gmail.com -- Peter Geoghegan
--
Peter Geoghegan