INSERT ... ON CONFLICT {UPDATE | IGNORE}
Attached WIP patch extends the INSERT statement, adding a new ON
CONFLICT {UPDATE | IGNORE} clause. This allows INSERT statements to
perform UPSERT operations (if you want a more formal definition of
UPSERT, I refer you to my pgCon talk's slides [1] ("Goals for UPSERT
in Postgres"), or the thread in which I delineated the differences
between SQL MERGE and UPSERT [2]).
The patch builds on previous work in this area, and incorporates
feedback from Kevin and Andres.
Overview
=======
Example usage:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT UPDATE
SET val = 'update';
Essentially, the implementation has all stages of query processing
track some auxiliary UPDATE state. So, for example, during parse
analysis, UPDATE transformation occurs in an ad-hoc fashion tightly
driven by the parent INSERT, but using the existing infrastructure
(i.e. transformStmt()/transformUpdateStmt() is called, and is
insulated from having to care about the feature as a special case).
There are some restrictions on what this auxiliary update may do, but
FWIW there are considerably fewer than those the equivalent MySQL and
SQLite features impose on their users. All of the following SQL
queries are valid with the patch applied:
-- Nesting within wCTE:
WITH t AS (
  INSERT INTO z SELECT i, 'insert'
  FROM generate_series(0, 16) i
  ON CONFLICT UPDATE SET v = v || 'update' -- operators/functions in targetlist
  RETURNING * -- only projects inserted tuples, never updated tuples
)
SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;
-- IGNORE variant:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT IGNORE;
-- predicate within UPDATE auxiliary statement (row is still locked
-- when the UPDATE predicate isn't satisfied):
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT UPDATE
WHERE val != 'delete';
As with SQL MERGE (at least as implemented in other systems),
subqueries may not appear within the UPDATE's targetlist, nor may they
appear within the special WHERE clause. But the "INSERT part" of the
query has no additional limitations, so you may for example put
subqueries within a VALUES() clause, or INSERT...SELECT...ON CONFLICT
UPDATE... just as you'd expect. INSERT has been augmented with a new
clause, but that clause plays nice with every other aspect of
insertion. (Actually, that isn't quite true: at least for now, table
inheritance, updatable views and foreign tables are unsupported. This
can be revisited.)
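To spell out a couple of those allowed forms (a sketch only; the
"staging" table and its columns are hypothetical, and of course a
patched server is required):

```sql
-- Subquery within a VALUES() clause:
INSERT INTO upsert(key, val)
VALUES(1, (SELECT s.b FROM staging s WHERE s.a = 1))
ON CONFLICT UPDATE SET val = 'update';

-- INSERT...SELECT...ON CONFLICT UPDATE:
INSERT INTO upsert(key, val)
SELECT a, b FROM staging
ON CONFLICT UPDATE SET val = 'update';
```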
I think that without initially realizing it, I copied the SQLite
syntax [3]. However, unlike with that SQLite feature, CONFLICT only
refers to a would-be duplicate violation, and not a violation of any
other kind of constraint.
How this new approach works (Executor and Optimizer stuff)
============================================
During the execution of the parent ModifyTable, a special auxiliary
subquery (the UPDATE ModifyTable) is considered as a special case.
This is not a subplan of the ModifyTable node in the conventional
sense, and so does not appear within EXPLAIN output. However, it is
more or less independently planned, and entirely driven by the INSERT
ModifyTable. ExecModifyTable() is never called with this special
auxiliary plan state passed directly. Rather, its parent manages the
process as the need arises. ExecLockUpdateTuple() locks and
potentially updates tuples, using the EvalPlanQual() mechanism (even
at higher isolation levels, with appropriate precautions).
The per-tuple expression context of the auxiliary query/plan is used
with EvalPlanQual() from within ExecLockUpdateTuple() (the new routine
tasked with locking and updating on conflict). There is a new
ExecUpdate() call site within ExecLockUpdateTuple(). Given the
restrictions necessarily imposed on this pseudo-rescanning
(principally the outright rejection of anything that necessitates
PARAM_EXEC parameters during planning), this is safe, as far as I'm
aware. It is convenient to be able to re-use infrastructure in such a
way as to more or less handle the UPDATE independently, driven by the
INSERT, except for execution which is more directly handled by the
INSERT (i.e. there is no ExecModifyTable() call in respect of this new
auxiliary ModifyTable plan). Granted, it is kind of bizarre that the
auxiliary query may have a more complex plan than is necessary for our
purposes, but it doesn't actually appear to be a problem when
"rescanning" (like a SELECT FOR UPDATE/FOR SHARE node, we call
EvalPlanQualSetTuple() directly). It is likely worthwhile to teach the
optimizer that we really don't care about how the one and only base
rel within the UPDATE auxiliary subquery (the target table) is
scanned, if only to save a few cycles. I have (temporarily) hacked the
optimizer to prevent index-only scans, which are problematic here, by
adding disable_cost when a query parse tree that uses the feature is
seen. Although what I've done is a temporary kludge, the basic idea of
forcing a particular type of relation scan has a precedent: UPDATE
WHERE CURRENT OF artificially forces a TID scan, because only a TID
scan will work correctly there. I couldn't come up with a convenient
way to artificially inject disable_cost into alternative scan types,
in the less invasive style of isCurrentOf, because there is no
convenient qual to target within cost_qual_eval().
As in previous incarnations, we lock each tuple (although, of course,
only with the UPDATE variant). We may or may not also actually proceed
with the update, depending on whether or not the user-specified
special update predicate (if any) is satisfied. But if we do,
EvalPlanQual() is (once the tuple is locked) only ever evaluated on a
conclusively committed and locked-by-us conflict tuple as part of the
process of updating, even though it's possible for the UPDATE
predicate to be satisfied where conceivably it would not be satisfied
by the tuple version actually visible to the command's MVCC snapshot.
I think this is the correct behavior. We all seem to be in agreement
that we should update at READ COMMITTED if *no* version of the tuple
is visible. It seems utterly arbitrary to me to suggest that on the
one hand it's okay to introduce one particular "MVCC violation", but
not another equivalent one. The first scenario is one in which we
update despite our update's (or rather insert's) "predicate" not being
satisfied (according to our MVCC snapshot). The second scenario is one
in which the same "predicate" is also not satisfied according to our
MVCC snapshot, but in a slightly different way. Why bother introducing
a complicated distinction, if it's a distinction without a difference?
I'd rather have a behavior that is consistent, easy to reason about,
and easy to explain. And so, the predicate is considered once, after
conclusively locking a conflict tuple.
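To make the scenario concrete, here is a sketch of the sort of READ
COMMITTED interleaving I have in mind (session labels, and the exact
timing, are purely illustrative):

```sql
-- Setup (one time):
CREATE TABLE upsert(key int PRIMARY KEY, val text);
INSERT INTO upsert VALUES(1, 'original');

-- Session 1:
BEGIN;
UPDATE upsert SET val = 'delete' WHERE key = 1;

-- Session 2 (READ COMMITTED): takes its snapshot now, sees
-- 'original', and blocks on session 1's row lock:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT UPDATE
WHERE val != 'delete';

-- Session 1:
COMMIT;

-- Session 2 now evaluates the special WHERE clause against the
-- conclusively committed, locked tuple ('delete'), not against the
-- version visible to its MVCC snapshot ('original'); no update
-- occurs, but the row is locked all the same.
```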
It feels natural and appropriate to me that if the special UPDATE qual
isn't satisfied, we still lock the tuple. After all, in order to make
a conclusive determination about the qual not being satisfied, we need
to lock the tuple. This happens to insulate ExecUpdate() from having
to care about "invisible tuples", which are now possible (although we
still throw an error, just with a useful error message that phrases
the problem in reference to this new feature).
Of course, at higher isolation levels serialization errors are thrown
when something inconsistent with the higher level's guarantees would
otherwise need to occur (even for the IGNORE variant). Still,
interactions with SSI, and preserving the guarantees of SSI should
probably be closely considered by a subject matter expert.
Omission
=======
The patch currently lacks a way of referencing datums rejected for
insertion when updating. The way MySQL handles the issue seems
questionable. They allow you to do something like this:
INSERT INTO upsert (key, val) VALUES (1, 'val') ON DUPLICATE KEY UPDATE
val = VALUES(val);
The implication is that the updated value comes from the INSERT's
VALUES() list, but emulating that seems like a bad idea. In general,
at least with Postgres it's entirely possible that values rejected
differ from the values appearing in the VALUES() list, due to the
effects of before triggers. I'm not sure whether or not we should
assume equivalent transformations during any UPDATE before triggers.
This is an open item. I think it makes sense to deal with it a bit later.
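To illustrate the point about before triggers (a contrived,
hypothetical example):

```sql
CREATE FUNCTION upsert_before_row() RETURNS trigger AS $$
BEGIN
  -- The tuple actually rejected for insertion has been altered by
  -- this trigger, so it need not match the INSERT's VALUES() list
  NEW.val := upper(NEW.val);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER upsert_before_row BEFORE INSERT ON upsert
  FOR EACH ROW EXECUTE PROCEDURE upsert_before_row();

-- With MySQL-style semantics, would VALUES(val) here mean the literal
-- 'val', or the 'VAL' that was actually rejected for insertion?
```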
"Value locking"
===========
To date, on-list discussion around UPSERT has almost exclusively
concerned what I've called "value locking"; the idea of locking values
in unique indexes in the abstract (to establish the right to insert
ahead of time). There was some useful discussion on this question
between myself and Heikki back around December/January. Ultimately, we
were unable to reach agreement on an approach and discussion tapered
off. However, Heikki did understand the concerns that informed my
design. He recognized the need to be able to easily *release* value
locks, so as to avoid "unprincipled deadlocks", where under high
concurrency there are deadlocks between sessions that only UPSERT a
single row at a time. I'm not sure how widely appreciated this point
is, but I believe that Heikki appreciates it. It is a very important
point in my opinion. I don't want an implementation that is in any way
inferior to the "UPSERT looping subxact" pattern (i.e. the plpgsql
thing that the docs suggest).
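For reference, the pattern in question is approximately the example
from the documentation: retry in a loop, relying on the exception
block's implicit subtransaction abort to release locks on conflict:

```sql
CREATE FUNCTION merge_db(k int, data text) RETURNS void AS $$
BEGIN
  LOOP
    -- First try to update the key
    UPDATE db SET b = data WHERE a = k;
    IF found THEN
      RETURN;
    END IF;
    -- Not there, so try to insert the key; if someone else inserts
    -- the same key concurrently, we get a unique-key failure, and the
    -- handler's subxact abort releases whatever we had locked
    BEGIN
      INSERT INTO db(a, b) VALUES (k, data);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      NULL; -- do nothing; loop and try the UPDATE again
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```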
When we left off, Heikki continued to favor an approach that involved
speculatively inserting heap tuples, and then deleting them in the
event of a conflict. This design was made more complicated when the
need to *release* value locks became apparent (Heikki ended up making
some changes to HeapTupleSatisfiesDirty(), as well as sketching a
design for what you might call a "super delete", where xmin can be set
to InvalidTransactionId for speculatively-inserted heap tuples). After
all, it wasn't as if we could abort a subxact to release locks, which
is what the "UPSERT looping subxact" pattern does. I think it's fair
to say that that design became more complicated than initially
anticipated [4][5].
Anyway, the greater point here is that fundamentally, AFAICT Heikki
and I were in agreement. Once you buy into the idea that we must avoid
holding on to "value locks" of whatever form - as Heikki evidently did
- then exactly what form they take is ultimately only a detail.
Granted, it's a very important detail, but a detail nonetheless. It
can be discussed entirely independently of all of this new stuff, and
thank goodness for that.
If anyone finds my (virtually unchanged) page heavyweight lock based
value locking approach objectionable, I ask that the criticism be
framed in a way that makes a sharp distinction between each of the
following:
1. You don't accept that value locks must be easily released in the
event of a conflict. Is anyone in this camp? It's far from obvious to
me what side of this question Andres is on at this stage, for example.
Robert might have something to say here too.
2. Having taken into account the experience of myself and Heikki, and
all that is implied by taking that approach ***while avoiding
unprincipled deadlocks***, you continue to believe that an approach
based on speculative heap insertion, or some alternative scheme is
better than what I have done to the nbtree code here, or you otherwise
dislike something about the proposed value locking scheme. You accept
that value locks must be released and released easily in the event of
a conflict, but like Heikki you just don't like what I've done to get
there.
Since we can (I believe) talk about the value locking aspect and the
rest of the patch independently, we should do so...unless you're in
camp 1, in which case I guess that we'll have to thrash it out.
Syntax, footguns
=============
As I mentioned, I have incorporated feedback from Kevin Grittner. You
may specify a unique index to merge on from within the INSERT
statement, thus avoiding the risk of inadvertently having the update
affect the wrong tuple due to the user failing to consider that there
was a would-be unique violation within some other unique index
constraining some other attribute. You may write the DML statement
like this:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
upsert_pkey UPDATE SET val = 'update';
I think that there is a good chance that at least some people will
want to make this mandatory. I guess that's fair enough, but I
*really* don't want to *mandate* that users specify the name of their
unique index in DML for obvious reasons. Perhaps we can come up with a
more tasteful syntax that covers all interesting cases (consider the
issues with partial unique indexes and before triggers for example,
where a conclusion reached about which index to use during parse
analysis may subsequently be invalidated by user-defined code, or
ambiguous specifications in the face of overlapping attributes between
two unique composite indexes, etc). The Right Thing is far from
obvious, and there is very little to garner from other systems, since
SQL MERGE promises essentially nothing about concurrency, both as
specified by the standard and in practice. You don't need a unique
index at all, and as I showed in my pgCon talk, there are race
conditions even for trivial UPSERT operations in all major SQL MERGE
implementations.
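As a contrived illustration of the overlapping-attributes ambiguity
(table and index names here are hypothetical):

```sql
CREATE TABLE overlap(a int, b int, c int);
CREATE UNIQUE INDEX overlap_a_b ON overlap(a, b);
CREATE UNIQUE INDEX overlap_b_c ON overlap(b, c);

-- A hypothetical column-list specification naming only "b" could not
-- disambiguate between overlap_a_b and overlap_b_c; the WITHIN syntax
-- sidesteps this by naming the index directly:
INSERT INTO overlap VALUES(1, 2, 3) ON CONFLICT WITHIN overlap_a_b
UPDATE SET c = 4;
```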
Note that making it mandatory (via syntax) to merge on one particular
unique index buys the implementation no useful leeway. Just for
example, the unprincipled deadlocks test case that illustrated the
problem with early "promise tuple" style approaches to value locking
[6] is equally applicable either way. Whether or not this should be
mandatory is just a detail of the feature's high level design, as
opposed to something expected to significantly influence the
implementation.
Testing, performance
===============
As you'd expect, I've included both isolation tests and regression
tests covering a reasonable variety of cases. In addition, stress
testing is an important part of my testing strategy. Reviewers are
encouraged to try out these test bash scripts:
https://github.com/petergeoghegan/upsert
(Interested hackers should request collaborator status on that Github
project from me privately. I welcome new, interesting test cases.)
The performance of the patch seems quite good, and is something that
these stress-testing bash scripts also test. Upserts are compared
against "equivalent" inserts when we know we'll never update, and
against "equivalent" updates when we know we'll never insert.
On an 8 core test server, I can sustain ~90,000 ordinary insert
transactions per second on an unlogged table defined as follows:
create unlogged table foo
(
  merge serial primary key,
  b     int4,
  c     text
);
In all cases pgbench uses 8 clients (1 per CPU core).
With "equivalent" upserts, it's about ~66,000 TPS. But this is a
particularly unsympathetic case, because I've deliberately exaggerated
the effects of heavyweight lock contention on leaf pages by using a
serial primary key. Plus, there's the additional planning and parsing
overhead.
When comparing plain updates with "equivalent" upserts, it's a similar
story.
100,000 tuples are pre-inserted in each case. I can sustain ~98,000
TPS with plain updates, or ~70,000 TPS with "equivalent" upserts.
B-Tree index page heavyweight lock contention probably explains some
of the difference between "UPSERT inserts" and "UPSERT updates".
Interlocking with VACUUM, race conditions
===============================
In previous revisions, when we went to lock + update a tuple, no
"value locks" were held, and neither were any B-Tree page buffer pins,
because they were both released at the same time (recall that I call
my heavyweight lock on B-Tree leaf pages a value lock). We still do
that (unprincipled deadlocks are our only alternative), but now hold
on to the pin for longer, until after tuple locking. Old versions of
this patch used to sit on the B-Tree buffer pin to prevent concurrent
deletion only as long as value locks were held, but maybe it isn't
good enough to sit on the pin until before we lock/update, as value
locks are released: dropping the pin implies that the heap tuple can
physically go away, and in general the same TID may then contain
anything. We may have to interlock against vacuum by sitting on the
B-Tree buffer pin (but not the value lock) throughout locking +
update. That makes it impossible for the heap tuple slot to fail to
relate to the tuple from the B-Tree, that is under consideration for
locking/updating. Recall that we aren't quite dealing with MVCC
semantics here, since in READ COMMITTED mode we can lock a
conclusively committed + visible tuple with *no* version visible to
our command's MVCC snapshot. Therefore, it seems worth considering the
possibility that the nbtree README's observations on the necessity of
holding a pin to interlock against VACUUM (for non-MVCC snapshots)
apply.
In this revision we have two callbacks (or two calls to the same
callback, with different effects): One to release value locks early,
to avoid unprincipled deadlocks, and a second to finally release the
last unneeded buffer pin.
Recall that when we find a conflict (within _bt_checkunique()), it
must be conclusively committed and visible to new MVCC snapshots; we
know at that juncture that it's live. The concern is that it might be
deleted *and* garbage collected in the interim between finding the
conflict tuple, and locking it (in practice this interim period is
only an instant).
This is probably too paranoid, though: the fact that the upserter's
transaction is running ought to imply that GetOldestXmin() returns an
XID sufficient to prevent this. OTOH, I'm not sure that there exists
anything that looks like a precedent for relying on blocking vacuum in
this manner, and it might turn out to be limiting to rely on this.
And, I hasten to add, my fix (sitting on a B-Tree pin throughout row
locking) is in another way perhaps not paranoid enough: Who is to say
that our conflicting value is on the same B-Tree leaf page as our
value lock? It might not be, since _bt_checkunique() looks at later
B-Tree pages (the value locked page is merely "the first leaf page the
value could be on"). Pinning the heavyweight lock page's buffer is
certainly justified by the need for non-speculative inserters to see a
flag that obligates them to acquire the heavyweight page lock
themselves (see comments in patch for more), but this other reason is
kind of dubious.
In other words: I'm relying on the way VACUUM actually works to
prevent premature garbage collection. It's possible to imagine a world
in which HeapTupleSatisfiesVacuum() is smart enough to realize that
the tuple UPSERT wants to lock is not visible to anyone (assuming MVCC
semantics, etc), and never can be. I've tentatively added code to keep
a buffer pin for longer, but that's probably not good enough if we
assume that it's necessary at all. Basically, I want to be comfortable
about my rationale for it being okay that a "non-MVCC" "index scan"
doesn't hold a pin, but right now I'm not. I was conflicted on whether
or not I should include the "unpin later" logic at all; for now I've
left it in, if only as a placeholder. Needless to say, if there is a
race condition you can take it that it's very difficult to isolate.
FWIW, somewhat extensive stress-testing has revealed no bugs that you
might associate with these problems, with and without extended buffer
pinning, and with artificial random sleeps added at key points in an
effort to make any race condition bugs manifest themselves. I have
made a concerted effort to break the patch in that way, and I'm now
running out of ideas. Running the stress tests (with random delays in
key points in the code) for several days reveals no bugs. This is on
the same dedicated 8 core server, with plenty of concurrency.
It's probably a good idea to begin using my B-Tree verification tool
[7] here, too. Note that the tool knows nothing about MVCC, and will
only detect the violation of invariants that are localized to the
B-Tree code, at least at the moment.
Open items
=========
I already mentioned the inability to reference rejected rows in an
UPDATE, as well as my unease about VACUUM interlocking, both of which
are open items. Also, some of the restrictions that I already
mentioned - on updatable views, inheritance, and foreign tables - are
probably unnecessary. We should be able to come up with reasonable
behavior for at least some of those.
Patch
====
I thought that I went too long without posting something about all of
this to the list to get feedback, and so I decided to post this WIP
patch set. I've tried to break it up into pieces, but it isn't all
that suitable for representing as cumulative commits. I've also tried
to break up the discussion usefully (the question of how everything
fits together at a high level can hopefully be discussed separately
from the question of how "value locks" are actually implemented).
Thoughts?
[1]: http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf ("Goals for UPSERT in Postgres")
[2]: /messages/by-id/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com
[3]: https://sqlite.org/lang_conflict.html
[4]: /messages/by-id/CAM3SWZQoArVQGMi=v-jk3sBjsPg+wdjeUkM_6L5TZG_i9pyGzQ@mail.gmail.com
[5]: /messages/by-id/52B4AAF0.5090806@vmware.com
[6]: /messages/by-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com
[7]: /messages/by-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
--
Peter Geoghegan
Attachments:
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch (text/x-patch)
From ef50bc4c71b08d08aa4a3c3e9fe0ea218f68d326 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@heroku.com>
Date: Tue, 26 Aug 2014 21:28:40 -0700
Subject: [PATCH 1/4] Make UPDATE privileges distinct from INSERT privileges in
RTEs
Previously, relation range table entries used a single Bitmapset field
representing which columns required either UPDATE or INSERT privileges,
despite the fact that INSERT and UPDATE privileges are separately
cataloged, and may be independently held. This worked because
ExecCheckRTEPerms() was called with a ACL_INSERT or ACL_UPDATE
requiredPerms, and based on that it was evident which type of
optimizable statement was under consideration. Since historically no
type of optimizable statement could directly INSERT and UPDATE at the
same time, there was no ambiguity as to which privileges were required.
This largely mechanical commit is required infrastructure for the
INSERT...ON CONFLICT UPDATE feature, which introduces an optimizable
statement that may be subject to both INSERT and UPDATE permissions
enforcement. Tests follow in a later commit.
Note that this commit necessitates an initdb, since stored ACLs are
broken.
---
contrib/postgres_fdw/postgres_fdw.c | 2 +-
src/backend/commands/copy.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/trigger.c | 12 ++++----
src/backend/executor/execMain.c | 51 ++++++++++++++++++++++++++-----
src/backend/nodes/copyfuncs.c | 3 +-
src/backend/nodes/equalfuncs.c | 3 +-
src/backend/nodes/outfuncs.c | 3 +-
src/backend/nodes/readfuncs.c | 3 +-
src/backend/optimizer/prep/prepsecurity.c | 6 ++--
src/backend/optimizer/prep/prepunion.c | 4 ++-
src/backend/parser/analyze.c | 4 +--
src/backend/parser/parse_relation.c | 21 ++++++++-----
src/backend/rewrite/rewriteHandler.c | 18 ++++++++---
src/include/nodes/parsenodes.h | 3 +-
15 files changed, 98 insertions(+), 39 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4c49776..c99a3ee 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1197,7 +1197,7 @@ postgresPlanForeignModify(PlannerInfo *root,
}
else if (operation == CMD_UPDATE)
{
- Bitmapset *tmpset = bms_copy(rte->modifiedCols);
+ Bitmapset *tmpset = bms_copy(rte->updatedCols);
AttrNumber col;
while ((col = bms_first_member(tmpset)) >= 0)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbd7492..eb2844a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -832,7 +832,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString, uint64 *processed)
FirstLowInvalidHeapAttributeNumber;
if (is_from)
- rte->modifiedCols = bms_add_member(rte->modifiedCols, attno);
+ rte->insertedCols = bms_add_member(rte->insertedCols, attno);
else
rte->selectedCols = bms_add_member(rte->selectedCols, attno);
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 5245171..9688916 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -414,7 +414,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
rte->requiredPerms = ACL_INSERT;
for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++)
- rte->modifiedCols = bms_add_member(rte->modifiedCols,
+ rte->insertedCols = bms_add_member(rte->insertedCols,
attnum - FirstLowInvalidHeapAttributeNumber);
ExecCheckRTPerms(list_make1(rte), true);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9bf0098..b9cd78b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -65,8 +65,8 @@ int SessionReplicationRole = SESSION_REPLICATION_ROLE_ORIGIN;
/* How many levels deep into trigger execution are we? */
static int MyTriggerDepth = 0;
-#define GetModifiedColumns(relinfo, estate) \
- (rt_fetch((relinfo)->ri_RangeTableIndex, (estate)->es_range_table)->modifiedCols)
+#define GetUpdatedColumns(relinfo, estate) \
+ (rt_fetch((relinfo)->ri_RangeTableIndex, (estate)->es_range_table)->updatedCols)
/* Local function prototypes */
static void ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid);
@@ -2345,7 +2345,7 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
if (!trigdesc->trig_update_before_statement)
return;
- modifiedCols = GetModifiedColumns(relinfo, estate);
+ modifiedCols = GetUpdatedColumns(relinfo, estate);
LocTriggerData.type = T_TriggerData;
LocTriggerData.tg_event = TRIGGER_EVENT_UPDATE |
@@ -2391,7 +2391,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
if (trigdesc && trigdesc->trig_update_after_statement)
AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
false, NULL, NULL, NIL,
- GetModifiedColumns(relinfo, estate));
+ GetUpdatedColumns(relinfo, estate));
}
TupleTableSlot *
@@ -2418,7 +2418,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
* been modified, then we can use a weaker lock, allowing for better
* concurrency.
*/
- modifiedCols = GetModifiedColumns(relinfo, estate);
+ modifiedCols = GetUpdatedColumns(relinfo, estate);
keyCols = RelationGetIndexAttrBitmap(relinfo->ri_RelationDesc,
INDEX_ATTR_BITMAP_KEY);
if (bms_overlap(keyCols, modifiedCols))
@@ -2545,7 +2545,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
true, trigtuple, newtuple, recheckIndexes,
- GetModifiedColumns(relinfo, estate));
+ GetUpdatedColumns(relinfo, estate));
if (trigtuple != fdw_trigtuple)
heap_freetuple(trigtuple);
}
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 072c7df..d71e9b8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -631,11 +631,11 @@ ExecCheckRTEPerms(RangeTblEntry *rte)
}
/*
- * Basically the same for the mod columns, with either INSERT or
- * UPDATE privilege as specified by remainingPerms.
+ * Basically the same for the mod columns, for both INSERT and UPDATE
+ * privilege as specified by remainingPerms (INSERT...ON CONFLICT
+ * UPDATE may set both).
*/
- remainingPerms &= ~ACL_SELECT;
- if (remainingPerms != 0)
+ if (remainingPerms & ACL_INSERT)
{
/*
* When the query doesn't explicitly change any columns, allow the
@@ -643,14 +643,14 @@ ExecCheckRTEPerms(RangeTblEntry *rte)
* to handle SELECT FOR UPDATE as well as possible corner cases in
* INSERT and UPDATE.
*/
- if (bms_is_empty(rte->modifiedCols))
+ if (bms_is_empty(rte->insertedCols))
{
- if (pg_attribute_aclcheck_all(relOid, userid, remainingPerms,
+ if (pg_attribute_aclcheck_all(relOid, userid, ACL_INSERT,
ACLMASK_ANY) != ACLCHECK_OK)
return false;
}
- tmpset = bms_copy(rte->modifiedCols);
+ tmpset = bms_copy(rte->insertedCols);
while ((col = bms_first_member(tmpset)) >= 0)
{
/* remove the column number offset */
@@ -663,7 +663,42 @@ ExecCheckRTEPerms(RangeTblEntry *rte)
else
{
if (pg_attribute_aclcheck(relOid, col, userid,
- remainingPerms) != ACLCHECK_OK)
+ ACL_INSERT) != ACLCHECK_OK)
+ return false;
+ }
+ }
+ bms_free(tmpset);
+ }
+
+ if (remainingPerms & ACL_UPDATE)
+ {
+ /*
+ * When the query doesn't explicitly change any columns, allow the
+ * query if we have permission on any column of the rel. This is
+ * to handle SELECT FOR UPDATE as well as possible corner cases in
+ * INSERT and UPDATE.
+ */
+ if (bms_is_empty(rte->updatedCols))
+ {
+ if (pg_attribute_aclcheck_all(relOid, userid, ACL_UPDATE,
+ ACLMASK_ANY) != ACLCHECK_OK)
+ return false;
+ }
+
+ tmpset = bms_copy(rte->updatedCols);
+ while ((col = bms_first_member(tmpset)) >= 0)
+ {
+ /* remove the column number offset */
+ col += FirstLowInvalidHeapAttributeNumber;
+ if (col == InvalidAttrNumber)
+ {
+ /* whole-row reference can't happen here */
+ elog(ERROR, "whole-row update is not implemented");
+ }
+ else
+ {
+ if (pg_attribute_aclcheck(relOid, col, userid,
+ ACL_UPDATE) != ACLCHECK_OK)
return false;
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index aa053a0..9f5bcd7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1998,7 +1998,8 @@ _copyRangeTblEntry(const RangeTblEntry *from)
COPY_SCALAR_FIELD(requiredPerms);
COPY_SCALAR_FIELD(checkAsUser);
COPY_BITMAPSET_FIELD(selectedCols);
- COPY_BITMAPSET_FIELD(modifiedCols);
+ COPY_BITMAPSET_FIELD(insertedCols);
+ COPY_BITMAPSET_FIELD(updatedCols);
COPY_NODE_FIELD(securityQuals);
return newnode;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 719923e..8d13c69 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2319,7 +2319,8 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
COMPARE_SCALAR_FIELD(requiredPerms);
COMPARE_SCALAR_FIELD(checkAsUser);
COMPARE_BITMAPSET_FIELD(selectedCols);
- COMPARE_BITMAPSET_FIELD(modifiedCols);
+ COMPARE_BITMAPSET_FIELD(insertedCols);
+ COMPARE_BITMAPSET_FIELD(updatedCols);
COMPARE_NODE_FIELD(securityQuals);
return true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e686a6c..da75e29 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2423,7 +2423,8 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
WRITE_UINT_FIELD(requiredPerms);
WRITE_OID_FIELD(checkAsUser);
WRITE_BITMAPSET_FIELD(selectedCols);
- WRITE_BITMAPSET_FIELD(modifiedCols);
+ WRITE_BITMAPSET_FIELD(insertedCols);
+ WRITE_BITMAPSET_FIELD(updatedCols);
WRITE_NODE_FIELD(securityQuals);
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 69d9989..65584d1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1252,7 +1252,8 @@ _readRangeTblEntry(void)
READ_UINT_FIELD(requiredPerms);
READ_OID_FIELD(checkAsUser);
READ_BITMAPSET_FIELD(selectedCols);
- READ_BITMAPSET_FIELD(modifiedCols);
+ READ_BITMAPSET_FIELD(insertedCols);
+ READ_BITMAPSET_FIELD(updatedCols);
READ_NODE_FIELD(securityQuals);
READ_DONE();
diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
index 2420f97..1de9d99 100644
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -115,7 +115,8 @@ expand_security_quals(PlannerInfo *root, List *tlist)
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* For the most part, Vars referencing the original relation
@@ -213,7 +214,8 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Now deal with any PlanRowMark on this RTE by requesting a lock
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 0410fdd..d607523 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1374,7 +1374,9 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
{
childrte->selectedCols = translate_col_privs(rte->selectedCols,
appinfo->translated_vars);
- childrte->modifiedCols = translate_col_privs(rte->modifiedCols,
+ childrte->insertedCols = translate_col_privs(rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(rte->updatedCols,
appinfo->translated_vars);
}
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index fb6c44c..c0b1fe3 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -737,7 +737,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
false);
qry->targetList = lappend(qry->targetList, tle);
- rte->modifiedCols = bms_add_member(rte->modifiedCols,
+ rte->insertedCols = bms_add_member(rte->insertedCols,
attr_num - FirstLowInvalidHeapAttributeNumber);
icols = lnext(icols);
@@ -2006,7 +2006,7 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
origTarget->location);
/* Mark the target column as requiring update permissions */
- target_rte->modifiedCols = bms_add_member(target_rte->modifiedCols,
+ target_rte->updatedCols = bms_add_member(target_rte->updatedCols,
attrno - FirstLowInvalidHeapAttributeNumber);
origTargetList = lnext(origTargetList);
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 478584d..364ced0 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1052,7 +1052,8 @@ addRangeTableEntry(ParseState *pstate,
rte->requiredPerms = ACL_SELECT;
rte->checkAsUser = InvalidOid; /* not set-uid by default, either */
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1105,7 +1106,8 @@ addRangeTableEntryForRelation(ParseState *pstate,
rte->requiredPerms = ACL_SELECT;
rte->checkAsUser = InvalidOid; /* not set-uid by default, either */
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1183,7 +1185,8 @@ addRangeTableEntryForSubquery(ParseState *pstate,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1437,7 +1440,8 @@ addRangeTableEntryForFunction(ParseState *pstate,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1509,7 +1513,8 @@ addRangeTableEntryForValues(ParseState *pstate,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1577,7 +1582,8 @@ addRangeTableEntryForJoin(ParseState *pstate,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
@@ -1677,7 +1683,8 @@ addRangeTableEntryForCTE(ParseState *pstate,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* Add completed RTE to pstate's range table list, but not to join list
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index e6c5530..cc967f0 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -1401,7 +1401,8 @@ ApplyRetrieveRule(Query *parsetree,
rte->requiredPerms = 0;
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* For the most part, Vars referencing the view should remain as
@@ -1464,12 +1465,14 @@ ApplyRetrieveRule(Query *parsetree,
subrte->requiredPerms = rte->requiredPerms;
subrte->checkAsUser = rte->checkAsUser;
subrte->selectedCols = rte->selectedCols;
- subrte->modifiedCols = rte->modifiedCols;
+ subrte->insertedCols = rte->insertedCols;
+ subrte->updatedCols = rte->updatedCols;
rte->requiredPerms = 0; /* no permission check on subquery itself */
rte->checkAsUser = InvalidOid;
rte->selectedCols = NULL;
- rte->modifiedCols = NULL;
+ rte->insertedCols = NULL;
+ rte->updatedCols = NULL;
/*
* If FOR [KEY] UPDATE/SHARE of view, mark all the contained tables as
@@ -2697,8 +2700,13 @@ rewriteTargetView(Query *parsetree, Relation view)
* This step needs the modified view targetlist, so we have to do things
* in this order.
*/
- Assert(bms_is_empty(new_rte->modifiedCols));
- new_rte->modifiedCols = adjust_view_column_set(view_rte->modifiedCols,
+ Assert(bms_is_empty(new_rte->insertedCols) &&
+ bms_is_empty(new_rte->updatedCols));
+
+ new_rte->insertedCols = adjust_view_column_set(view_rte->insertedCols,
+ view_targetlist);
+
+ new_rte->updatedCols = adjust_view_column_set(view_rte->updatedCols,
view_targetlist);
/*
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d2c0b29..2c5d842 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -814,7 +814,8 @@ typedef struct RangeTblEntry
AclMode requiredPerms; /* bitmask of required access permissions */
Oid checkAsUser; /* if valid, check access as this role */
Bitmapset *selectedCols; /* columns needing SELECT permission */
- Bitmapset *modifiedCols; /* columns needing INSERT/UPDATE permission */
+ Bitmapset *insertedCols; /* columns needing INSERT permission */
+ Bitmapset *updatedCols; /* columns needing UPDATE permission */
List *securityQuals; /* any security barrier quals to apply */
} RangeTblEntry;
--
1.9.1
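As a reader's aid, here is a sketch of how the insertedCols/updatedCols privilege split could surface to users: an upsert needs INSERT privilege on the columns it inserts, but UPDATE privilege only on the columns its auxiliary UPDATE actually sets. The table and role names below are hypothetical (they are not from the patch's regression tests, which use atest5), and the exact error behavior is an assumption based on the privileges.out changes:

```sql
-- Hypothetical illustration of the column privilege split.
CREATE TABLE upsert_acl (key int PRIMARY KEY, val text, note text);
CREATE ROLE upserter;
GRANT SELECT, INSERT ON upsert_acl TO upserter;
GRANT UPDATE (val) ON upsert_acl TO upserter;

SET ROLE upserter;
-- ok: only "val" needs UPDATE permission
INSERT INTO upsert_acl(key, val) VALUES (1, 'insert')
  ON CONFLICT UPDATE SET val = 'update';
-- should fail: "note" lacks UPDATE permission, even though
-- inserting into it is allowed
INSERT INTO upsert_acl(key, note) VALUES (1, 'insert')
  ON CONFLICT UPDATE SET note = 'update';
```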
Attachment: 0004-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patch (text/x-patch)
From 2c7dd20a47f539f2de514706a32b02207a04e5a6 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@heroku.com>
Date: Wed, 27 Aug 2014 15:16:11 -0700
Subject: [PATCH 4/4] Internal documentation for INSERT ... ON CONFLICT {UPDATE
| IGNORE}
Includes documentation for both the nbtree and executor READMEs, as well
as the user-facing SGML documentation around index AMs. The latter two
additions should make clear how the AM interface has been revised for
amcanunique AMs in the abstract, without reference to the only current
implementation, nbtree.
---
doc/src/sgml/indexam.sgml | 145 ++++++++++++++++++++++++++++++++++++---
src/backend/access/nbtree/README | 75 ++++++++++++++++++++
src/backend/executor/README | 34 +++++++++
3 files changed, 244 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 157047a..5cd1f14 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -176,13 +176,73 @@ ambuildempty (Relation indexRelation);
<para>
<programlisting>
+SpeculativeState *
+amlock (Relation uniqueIndexRelation,
+ Datum *values,
+ bool *isnull,
+ Relation heapRel,
+ List *otherSpecStates,
+ bool priorConflict
+);
+</programlisting>
+ Lock index tuple values as part of the first phase of speculative
+ insertion. If all unique index values are concurrently locked, the
+ core system will attempt a proper insertion. Otherwise, it may
+ lock a <quote>rejector row</> TID returned through the state return
+ value.
+ </para>
+ <para>
+ The core system will not hold on to value locks acquired by AMs for
+ more than an instant. It will only hold index value locks across
+ insertion of the corresponding heap row, perhaps including
+ toasting, and across multiple <structfield>amcanunique</> lock
+ acquisitions, plus the latter phase of speculative insertion. AM
+ authors should carefully analyze the deadlock hazards, including
+ interaction with system catalogs that may be innocuously queried
+ with little provocation, and including calls to arbitrary
+ user-defined functions with value locks held during
+ <function>FormIndexDatum</> with expression indexes.
+ Conventionally, value locks are implemented using heavyweight locks
+ on pages, managed by the lock manager. The core code calls the
+ function in reverse order to the usual insertion order. The
+ <literal>priorConflict</> field hints to the implementation that
+ there was recently a conflict on the constant proposed for
+ insertion, which can optionally be used to implement opportunistic
+ lock strength reduction optimizations.
+ </para>
+
+ <para>
+ The function's state result value is used in a subsequent call to
+ <function>aminsert</> if and only if the core system resolves to
+ proceed with speculative insertion of a slot proposed for
+ insertion, which it will do if all value locks are successfully
+ acquired (though even that condition does not ensure that the
+ system will not need to restart from the very beginning of value
+ locking for all unique indexes -- <structfield>amcanunique</>
+ methods can only assume that insertion will proceed when there is a
+ corresponding <function>aminsert</> call, and can probably never
+ reasonably ascertain that insertion will not proceed, since that's
+ the core code's concern. The core code will only release index
+ value locks through callback, however). The state result also provides callbacks
+ for both the core code and all <structfield>amcanunique</> access
+ methods to release all value locks. The routine may need to do so
+ for all value locks held while it blocks pending the outcome of a
+ conflicting transaction. The AM may then report an unsuccessful
+ attempt at value locking, obliging the core code to restart all
+ value locking for all unique indexes. See <xref
+ linkend="index-unique-checks"> for more details.
+ </para>
+
+ <para>
+<programlisting>
bool
aminsert (Relation indexRelation,
Datum *values,
bool *isnull,
ItemPointer heap_tid,
Relation heapRelation,
- IndexUniqueCheck checkUnique);
+ IndexUniqueCheck checkUnique,
+ SpeculativeState *state);
</programlisting>
Insert a new tuple into an existing index. The <literal>values</> and
<literal>isnull</> arrays give the key values to be indexed, and
@@ -190,11 +250,15 @@ aminsert (Relation indexRelation,
If the access method supports unique indexes (its
<structname>pg_am</>.<structfield>amcanunique</> flag is true) then
<literal>checkUnique</> indicates the type of uniqueness check to
- perform. This varies depending on whether the unique constraint is
- deferrable; see <xref linkend="index-unique-checks"> for details.
- Normally the access method only needs the <literal>heapRelation</>
- parameter when performing uniqueness checking (since then it will have to
- look into the heap to verify tuple liveness).
+ perform, and <literal>state</> indicates state that is passed
+ backwards and forwards between the AM and core code to manage
+ speculative insertion state, and to provide a callback to release
+ all value locks held. <literal>checkUnique</> varies depending on
+ whether the unique constraint is deferrable; see <xref
+ linkend="index-unique-checks"> for details. Normally the access
+ method only needs the <literal>heapRelation</> parameter when
+ performing uniqueness checking (since then it will have to look
+ into the heap to verify tuple liveness).
</para>
<para>
@@ -766,7 +830,15 @@ amrestrpos (IndexScanDesc scan);
using <firstterm>unique indexes</>, which are indexes that disallow
multiple entries with identical keys. An access method that supports this
feature sets <structname>pg_am</>.<structfield>amcanunique</> true.
- (At present, only b-tree supports it.)
+ (At present, only b-tree supports it.) Note that
+ <structfield>amcanunique</> access methods are also obliged to
+ support <quote>speculative insertion</> via phased locking in
+ advance of heap tuple insertion, where <quote>value locks</>
+ on values proposed for insertion are held across multiple
+ <structfield>amcanunique</> indexes concurrently. Speculative
+ insertion is used by <command>INSERT</command> statements when
+ <literal>ON CONFLICT UPDATE</literal> or <literal>ON CONFLICT
+ IGNORE</literal> is specified.
</para>
<para>
@@ -793,14 +865,38 @@ amrestrpos (IndexScanDesc scan);
commits. If it rolls back then there is no conflict. If it commits
without deleting the conflicting row again, there is a uniqueness
violation. (In practice we just wait for the other transaction to
- end and then redo the visibility check in toto.)
+ end and then redo the visibility check in toto). However,
+ during speculative insertion a unique constraint violation
+ cannot be thrown, and it is unlikely to be practical to hold
+ on to a value lock indefinitely pending the outcome of
+ another transaction, so a callback is provided to instruct the
+ core code to release each index value lock. When a wait on the
+ conflicting transaction is complete, the AM specifies that the
+ core code should retry its slot insertion from scratch,
+ possibly reacquiring value locks on previously locked values on
+ separate, earlier-locked unique indexes.
+ </para>
+ <para>
+ Note that the unlocking callback is called twice in the event
+ of locking/updating being indicated -- the first call (before
+ locking/updating the slot) releases functional value locks, and
+ the second call (after locking/updating) releases an interlock
+ against VACUUM (this should only be necessary for the merge-on
+ unique index). This is necessary because it would be bad news
+ to attempt to lock/update a row that happens to occupy the same
+ heap slot as a row version whose non-uniqueness the AM has
+ already determined. XXX: There are open questions
+ around VACUUM interlocking.
</para>
</listitem>
<listitem>
<para>
Similarly, if a conflicting valid row has been deleted by an
as-yet-uncommitted transaction, the would-be inserter must wait
- for that transaction to commit or abort, and then repeat the test.
+ for that transaction to commit or abort, and then repeat the test,
+ or in the case of speculative insertion, start value locking
+ for all values proposed for insertion by the current executor
+ slot from the very beginning.
</para>
</listitem>
</itemizedlist>
@@ -814,6 +910,7 @@ amrestrpos (IndexScanDesc scan);
ordinary scenario of inserting a row that's just been created by
the current transaction. It can happen during
<command>CREATE UNIQUE INDEX CONCURRENTLY</>, however.)
+ Speculative inserters need not bother with this.
</para>
<para>
@@ -865,7 +962,8 @@ amrestrpos (IndexScanDesc scan);
method must allow duplicate entries into the index, and report any
potential duplicates by returning FALSE from <function>aminsert</>.
For each row for which FALSE is returned, a deferred recheck will
- be scheduled.
+ be scheduled. Speculative insertion into deferred unique indexes is not
+ supported.
</para>
<para>
@@ -900,6 +998,33 @@ amrestrpos (IndexScanDesc scan);
for the same tuple values as were used in the original insertion.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>UNIQUE_CHECK_SPEC</> indicates that the call is the
+ second phase of speculative insertion, corresponding to an
+ earlier <function>amlock</> call. Value locks should already
+ be held from the first phase for unique index insertion to pick
+ up from, with a heap tuple pointer now passed to finish the
+ insertion with. The core code will have previously called
+ <function>amlock</> in respect of each index value as part of
+ this slot insertion. Conceptually, the implementation performs
+ the index tuple insertion in a staggered manner, with the work
+ broken out into discrete locking and insertion phases, with the
+ possibility of not continuing with insertion (to lock a row for
+ update, for example) if total consensus to proceed does not
+ emerge among the unique indexes.
+ </para>
+
+ <para>
+ Precautions must be taken around value locking. These are
+ separately noted in the <function>amlock</> documentation. If
+ insertion proceeds in respect of a given slot, and
+ <function>aminsert</> is called with a
+ <literal>UNIQUE_CHECK_SPEC</> argument, the
+ <function>aminsert</> implementation is obligated to release
+ its value lock on the unique index as part of insertion.
+ </para>
+ </listitem>
</itemizedlist>
</para>
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 4820f76..2e6627a 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -568,6 +568,81 @@ item is irrelevant, and need not be stored at all. This arrangement
corresponds to the fact that an L&Y non-leaf page has one more pointer
than key.
+Notes about value locking for speculative insertion
+---------------------------------------------------
+
+As an amcanunique AM, the btree implementation is required to support
+"speculative insertion". This means that the value locking method
+through which unique index enforcement conventionally occurs is
+extended and generalized, such that insertion is staggered: the core
+code attempts to get full consensus on whether values proposed for
+insertion will not cause duplicate key violations. Speculative
+insertion is only possible for unique index insertion without deferred
+uniqueness checking (since speculative insertion into a deferred
+unique constraint's index is a contradiction in terms).
+
+For conventional unique index insertion, the Btree implementation
+exclusive locks a buffer holding the first page that the value to be
+inserted could possibly be on, though only for an instant, during and
+shortly after uniqueness verification. It would not be acceptable to
+hold this lock across complex operations for the duration of the
+remainder of the first phase of speculative insertion. Therefore, we
+convert this exclusive buffer lock to an exclusive page lock managed
+by the lock manager, thereby greatly ameliorating the consequences of
+undiscovered deadlocking implementation bugs (though deadlocks are not
+expected), and minimizing the impact on system interruptibility, while
+not affecting index scans.
+
+It may be useful to informally think of the page lock type acquired by
+speculative insertion as similar to an intention exclusive lock, a
+type of lock found in some legacy 2PL database systems that use
+multi-granularity locking. A session establishes the exclusive right
+to subsequently establish a full write lock, without actually blocking
+reads of the page unless and until a lock conversion actually occurs,
+at which point both reads and writes are blocked. Under this mental
+model, buffer shared locks can be thought of as intention shared
+locks.
+
+As implemented, these heavyweight locks are only relevant to the
+insertion case; at no other point are they actually considered, since
+insertion is the only way through which new values are introduced.
+The first page a value proposed for insertion into an index could be
+on represents a natural choke point for our extended, though still
+rather limited system of value locking. Naturally, when we perform a
+"lock escalation" and acquire an exclusive buffer lock, all other
+buffer locks on the same buffer are blocked, which is how the
+implementation localizes knowledge about the heavyweight lock to
+insertion-related routines. Apart from deletion, all exclusive
+locking of Btree buffers happens as a direct or indirect result of
+insertion, so this approach is sufficient. (Actually, an exclusive
+lock may still be acquired without insertion to initialize a root
+page, but that hardly matters). Note that holding a buffer pin for
+the duration of value locking is necessary to ensure that
+non-speculative inserters notice that it is necessary to acquire a
+heavyweight lock.
+
+All value locks are dropped immediately as speculative insertion is
+aborted, as the implementation waits on the outcome of another xact,
+or as "insertion proper" occurs. These page-level locks are not
+intended to last more than an instant. In general, the protocol for
+heavyweight locking Btree pages is that heavyweight locks are acquired
+before any buffer locks are held, while the locks are only released
+after all buffer locks are released. While not a hard and fast rule,
+presently we avoid heavyweight page locking more than one page per
+unique index concurrently.
+
+The core code inserts into indexes in a well-defined order: Sorted by
+OID. This is a general precaution against deadlock hazards in AMs
+that acquire exclusive locks. However, during the first phase of
+speculative insertion, corresponding locking operations first occur in
+reverse order (for unique indexes only). This has the benefit of
+reducing lock contention, by guaranteeing that only heap tuple
+insertion occurs between value locking and index tuple insertion once
+consensus to proceed emerges. The protocol for releasing value locks
+is that the core system always does so in reverse order to the
+original locking order (so "right-way-around"), either during
+insertion or when a would-be duplicate conflict is found.
+
Notes to Operator Class Implementors
------------------------------------
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 8afa1e3..1aa21fd 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -200,3 +200,37 @@ is no explicit prohibition on SRFs in UPDATE, but the net effect will be
that only the first result row of an SRF counts, because all subsequent
rows will result in attempts to re-update an already updated target row.
This is historical behavior and seems not worth changing.)
+
+Speculative insertion
+---------------------
+
+Speculative insertion is a process that the executor manages for the benefit of
+INSERT...ON CONFLICT UPDATE... . The basic idea is that values within AMs
+(that do not currently exist) are speculatively locked. If a consensus to
+insert emerges among all unique indexes, we proceed with physical index tuple
+insertion for each unique index in turn, releasing value locks as each physical
+insertion is performed. Otherwise, we must UPDATE the existing value (or IGNORE).
+
+When we UPDATE, value locks are released before an opportunistic attempt at
+locking a conclusively visible conflicting tuple occurs. If this process fails,
+we retry. We may retry indefinitely. Failing to release value locks serves no
+practical purpose, since they don't prevent many types of conflicts that the
+UPDATE case must care about, and is actively harmful, since it will result in
+unprincipled deadlocking under high concurrency.
+
+The representation of the UPDATE query tree is as a separate query tree,
+auxiliary to the main INSERT query tree, and its plan is not formally a subplan
+of the parent INSERT's. Rather, the plan's state is used selectively by its
+parent.
+
+Having successfully locked a definitively visible tuple, we update it, applying
+the EvalPlanQual() query execution mechanism to the latest (at just determined
+by an amcanunique AM) conclusively visible, now locked tuple. Earlier versions
+are not evaluated against our qual, and we never directly walk the update chain
+in the event of the tuple being deleted/updated (which is conceptually a
+conflict). The process simply restarts without making useful progress in the
+present iteration. It is sometimes necessary to UPDATE a row where no row
+version is visible, so it seems inconsistent to require that earlier versions
+(including a version that may exist that is visible to our command's MVCC
+snapshot) must satisfy the qual just because there happened to be a version
+visible, where otherwise no evaluation would occur.
--
1.9.1
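The READ COMMITTED behavior the executor README describes can be sketched as a two-session timeline, in the spirit of the isolation specs in patch 0003 (the side-by-side layout and the blocking annotations below are illustrative, not isolationtester output):

```sql
-- Session 1                         -- Session 2
BEGIN;
INSERT INTO upsert(key, val)
  VALUES (1, 'insert1')
  ON CONFLICT UPDATE
  SET val = 'update1';
                                     BEGIN;
                                     -- blocks on session 1's transaction:
                                     INSERT INTO upsert(key, val)
                                       VALUES (1, 'insert2')
                                       ON CONFLICT UPDATE
                                       SET val = 'update2';
COMMIT;
                                     -- resumes: updates the row session 1
                                     -- committed, even though no version of
                                     -- it is visible to session 2's snapshot
                                     COMMIT;
```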
Attachment: 0003-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch (text/x-patch)
From 881564456e3c0b868324f61568f835f330cb2ad3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@heroku.com>
Date: Wed, 27 Aug 2014 15:11:15 -0700
Subject: [PATCH 3/4] Tests for INSERT ... ON CONFLICT {UPDATE | IGNORE}
Add dedicated isolation tests for both UPDATE and IGNORE variants,
illustrating the "MVCC violation" that allows a READ COMMITTED
transaction's UPDATE to succeed in updating a tuple with no version
visible to its command's MVCC snapshot. Regression tests are for the
most part intended to exercise interactions with inter-related features.
Add a few general purpose smoke tests too.
---
.../isolation/expected/insert-conflict-ignore.out | 23 ++++++
.../isolation/expected/insert-conflict-update.out | 23 ++++++
src/test/isolation/isolation_schedule | 2 +
.../isolation/specs/insert-conflict-ignore.spec | 41 ++++++++++
.../isolation/specs/insert-conflict-update.spec | 40 ++++++++++
src/test/regress/expected/inherit.out | 3 +
src/test/regress/expected/privileges.out | 5 ++
src/test/regress/expected/rules.out | 22 +++++-
src/test/regress/expected/subselect.out | 18 +++++
src/test/regress/expected/triggers.out | 90 ++++++++++++++++++++++
src/test/regress/expected/updatable_views.out | 2 +
src/test/regress/expected/update.out | 5 ++
src/test/regress/expected/with.out | 64 +++++++++++++++
src/test/regress/input/constraints.source | 5 ++
src/test/regress/output/constraints.source | 15 +++-
src/test/regress/sql/inherit.sql | 3 +
src/test/regress/sql/privileges.sql | 3 +
src/test/regress/sql/rules.sql | 9 +++
src/test/regress/sql/subselect.sql | 14 ++++
src/test/regress/sql/triggers.sql | 58 ++++++++++++++
src/test/regress/sql/updatable_views.sql | 1 +
src/test/regress/sql/update.sql | 8 ++
src/test/regress/sql/with.sql | 37 +++++++++
23 files changed, 485 insertions(+), 6 deletions(-)
create mode 100644 src/test/isolation/expected/insert-conflict-ignore.out
create mode 100644 src/test/isolation/expected/insert-conflict-update.out
create mode 100644 src/test/isolation/specs/insert-conflict-ignore.spec
create mode 100644 src/test/isolation/specs/insert-conflict-update.spec
diff --git a/src/test/isolation/expected/insert-conflict-ignore.out b/src/test/isolation/expected/insert-conflict-ignore.out
new file mode 100644
index 0000000..e6cc2a1
--- /dev/null
+++ b/src/test/isolation/expected/insert-conflict-ignore.out
@@ -0,0 +1,23 @@
+Parsed test spec with 2 sessions
+
+starting permutation: ignore1 ignore2 c1 select2 c2
+step ignore1: INSERT INTO ints(key, val) VALUES(1, 'ignore1') ON CONFLICT IGNORE;
+step ignore2: INSERT INTO ints(key, val) VALUES(1, 'ignore2') ON CONFLICT IGNORE; <waiting ...>
+step c1: COMMIT;
+step ignore2: <... completed>
+step select2: SELECT * FROM ints;
+key val
+
+1 ignore1
+step c2: COMMIT;
+
+starting permutation: ignore1 ignore2 a1 select2 c2
+step ignore1: INSERT INTO ints(key, val) VALUES(1, 'ignore1') ON CONFLICT IGNORE;
+step ignore2: INSERT INTO ints(key, val) VALUES(1, 'ignore2') ON CONFLICT IGNORE; <waiting ...>
+step a1: ABORT;
+step ignore2: <... completed>
+step select2: SELECT * FROM ints;
+key val
+
+1 ignore2
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/insert-conflict-update.out b/src/test/isolation/expected/insert-conflict-update.out
new file mode 100644
index 0000000..8b2ad84
--- /dev/null
+++ b/src/test/isolation/expected/insert-conflict-update.out
@@ -0,0 +1,23 @@
+Parsed test spec with 2 sessions
+
+starting permutation: insert1 insert2 c1 select2 c2
+step insert1: INSERT INTO upsert(key, val) VALUES(1, 'insert1') ON CONFLICT UPDATE set val = val || ' updated by insert1';
+step insert2: INSERT INTO upsert(key, val) VALUES(1, 'insert2') ON CONFLICT UPDATE set val = val || ' updated by insert2'; <waiting ...>
+step c1: COMMIT;
+step insert2: <... completed>
+step select2: SELECT * FROM upsert;
+key val
+
+1 insert1 updated by insert2
+step c2: COMMIT;
+
+starting permutation: insert1 insert2 a1 select2 c2
+step insert1: INSERT INTO upsert(key, val) VALUES(1, 'insert1') ON CONFLICT UPDATE set val = val || ' updated by insert1';
+step insert2: INSERT INTO upsert(key, val) VALUES(1, 'insert2') ON CONFLICT UPDATE set val = val || ' updated by insert2'; <waiting ...>
+step a1: ABORT;
+step insert2: <... completed>
+step select2: SELECT * FROM upsert;
+key val
+
+1 insert2
+step c2: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 10c89ff..fa92329 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -16,6 +16,8 @@ test: fk-deadlock2
test: eval-plan-qual
test: lock-update-delete
test: lock-update-traversal
+test: insert-conflict-ignore
+test: insert-conflict-update
test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
diff --git a/src/test/isolation/specs/insert-conflict-ignore.spec b/src/test/isolation/specs/insert-conflict-ignore.spec
new file mode 100644
index 0000000..fde43b3
--- /dev/null
+++ b/src/test/isolation/specs/insert-conflict-ignore.spec
@@ -0,0 +1,41 @@
+# INSERT...ON CONFLICT IGNORE test
+#
+# This test tries to expose problems with the interaction between concurrent
+# sessions during INSERT...ON CONFLICT IGNORE.
+#
+# The convention here is that session 1 always ends up inserting, and session 2
+# always ends up ignoring.
+
+setup
+{
+ CREATE TABLE ints (key int primary key, val text);
+}
+
+teardown
+{
+ DROP TABLE ints;
+}
+
+session "s1"
+setup
+{
+ BEGIN ISOLATION LEVEL READ COMMITTED;
+}
+step "ignore1" { INSERT INTO ints(key, val) VALUES(1, 'ignore1') ON CONFLICT IGNORE; }
+step "c1" { COMMIT; }
+step "a1" { ABORT; }
+
+session "s2"
+setup
+{
+ BEGIN ISOLATION LEVEL READ COMMITTED;
+}
+step "ignore2" { INSERT INTO ints(key, val) VALUES(1, 'ignore2') ON CONFLICT IGNORE; }
+step "select2" { SELECT * FROM ints; }
+step "c2" { COMMIT; }
+step "a2" { ABORT; }
+
+# Regular case where one session block-waits on another to determine if it
+# should proceed with an insert or ignore.
+permutation "ignore1" "ignore2" "c1" "select2" "c2"
+permutation "ignore1" "ignore2" "a1" "select2" "c2"
diff --git a/src/test/isolation/specs/insert-conflict-update.spec b/src/test/isolation/specs/insert-conflict-update.spec
new file mode 100644
index 0000000..e5f62b2
--- /dev/null
+++ b/src/test/isolation/specs/insert-conflict-update.spec
@@ -0,0 +1,40 @@
+# INSERT...ON CONFLICT UPDATE test
+#
+# This test tries to expose problems with the interaction between concurrent
+# sessions.
+
+setup
+{
+ CREATE TABLE upsert (key int primary key, val text);
+}
+
+teardown
+{
+ DROP TABLE upsert;
+}
+
+session "s1"
+setup
+{
+ BEGIN ISOLATION LEVEL READ COMMITTED;
+}
+step "insert1" { INSERT INTO upsert(key, val) VALUES(1, 'insert1') ON CONFLICT UPDATE set val = val || ' updated by insert1'; }
+step "c1" { COMMIT; }
+step "a1" { ABORT; }
+
+session "s2"
+setup
+{
+ BEGIN ISOLATION LEVEL READ COMMITTED;
+}
+step "insert2" { INSERT INTO upsert(key, val) VALUES(1, 'insert2') ON CONFLICT UPDATE set val = val || ' updated by insert2'; }
+step "select2" { SELECT * FROM upsert; }
+step "c2" { COMMIT; }
+step "a2" { ABORT; }
+
+# One session (session 2) block-waits on another (session 1) to determine if it
+# should proceed with an insert or update. Notably, this entails updating a
+# tuple while there is no version of that tuple visible to the updating
+# session's snapshot. This is permitted only in READ COMMITTED mode.
+permutation "insert1" "insert2" "c1" "select2" "c2"
+permutation "insert1" "insert2" "a1" "select2" "c2"
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 56e2c99..27aae0e 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -13,6 +13,9 @@ INSERT INTO a(aa) VALUES('aaaaa');
INSERT INTO a(aa) VALUES('aaaaaa');
INSERT INTO a(aa) VALUES('aaaaaaa');
INSERT INTO a(aa) VALUES('aaaaaaaa');
+-- INSERT ON CONFLICT UPDATE does not support inheritance
+INSERT INTO a(aa) VALUES('aaaaaaaa') ON CONFLICT UPDATE set aa = 'bbbbbbbb';
+ERROR: INSERT...ON CONFLICT does not support table inheritance
INSERT INTO b(aa) VALUES('bbb');
INSERT INTO b(aa) VALUES('bbbb');
INSERT INTO b(aa) VALUES('bbbbb');
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 1675075..243eb6c 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -367,6 +367,11 @@ UPDATE atest5 SET one = 8; -- fail
ERROR: permission denied for relation atest5
UPDATE atest5 SET three = 5, one = 2; -- fail
ERROR: permission denied for relation atest5
+INSERT INTO atest5(two) VALUES (6) ON CONFLICT UPDATE set three = 10; -- ok
+INSERT INTO atest5(two) VALUES (6) ON CONFLICT UPDATE set one = 8; -- fails (due to UPDATE)
+ERROR: permission denied for relation atest5
+INSERT INTO atest5(three) VALUES (4) ON CONFLICT UPDATE set three = 10; -- fail (due to INSERT)
+ERROR: permission denied for relation atest5
SET SESSION AUTHORIZATION regressuser1;
REVOKE ALL (one) ON atest5 FROM regressuser4;
GRANT SELECT (one,two,blue) ON atest6 TO regressuser4;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ca56b47..4490dab 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1123,12 +1123,19 @@ SELECT * FROM shoelace_log ORDER BY sl_name;
SELECT * FROM shoelace_obsolete WHERE sl_avail = 0;
insert into shoelace values ('sl9', 0, 'pink', 35.0, 'inch', 0.0);
insert into shoelace values ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0);
+-- insert rules still apply - ON CONFLICT UPDATE is irrelevant
+insert into shoelace values ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0)
+ on conflict ignore;
+insert into shoelace values ('sl10', 1000, 'magenta', 70.0, 'inch', 0.0)
+ on conflict update set sl_color = 'orange';
SELECT * FROM shoelace_obsolete ORDER BY sl_len_cm;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
------------+----------+------------+--------+----------+-----------
sl9 | 0 | pink | 35 | inch | 88.9
sl10 | 1000 | magenta | 40 | inch | 101.6
-(2 rows)
+ sl10 | 1000 | magenta | 40 | inch | 101.6
+ sl10 | 1000 | magenta | 70 | inch | 177.8
+(4 rows)
SELECT * FROM shoelace_candelete;
sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm
@@ -1144,6 +1151,8 @@ SELECT * FROM shoelace ORDER BY sl_name;
------------+----------+------------+--------+----------+-----------
sl1 | 5 | black | 80 | cm | 80
sl10 | 1000 | magenta | 40 | inch | 101.6
+ sl10 | 1000 | magenta | 40 | inch | 101.6
+ sl10 | 1000 | magenta | 70 | inch | 177.8
sl2 | 6 | black | 100 | cm | 100
sl3 | 10 | black | 35 | inch | 88.9
sl4 | 8 | black | 40 | inch | 101.6
@@ -1151,7 +1160,7 @@ SELECT * FROM shoelace ORDER BY sl_name;
sl6 | 20 | brown | 0.9 | m | 90
sl7 | 6 | brown | 60 | cm | 60
sl8 | 21 | brown | 40 | inch | 101.6
-(9 rows)
+(11 rows)
SELECT * FROM shoe ORDER BY shoename;
shoename | sh_avail | slcolor | slminlen | slminlen_cm | slmaxlen | slmaxlen_cm | slunit
@@ -2324,6 +2333,15 @@ DETAIL: Key (id3a, id3c)=(1, 13) is not present in table "rule_and_refint_t2".
insert into rule_and_refint_t3 values (1, 13, 11, 'row6');
ERROR: insert or update on table "rule_and_refint_t3" violates foreign key constraint "rule_and_refint_t3_id3a_fkey"
DETAIL: Key (id3a, id3b)=(1, 13) is not present in table "rule_and_refint_t1".
+insert into rule_and_refint_t3 values (1, 13, 11, 'row6')
+ on conflict ignore;
+ERROR: insert or update on table "rule_and_refint_t3" violates foreign key constraint "rule_and_refint_t3_id3a_fkey"
+DETAIL: Key (id3a, id3b)=(1, 13) is not present in table "rule_and_refint_t1".
+insert into rule_and_refint_t3 values (1, 13, 11, 'row6')
+ on conflict update set id3a = 1, id2b = 11, id3c = 11;
+ERROR: column "id2b" of relation "rule_and_refint_t3" does not exist
+LINE 2: on conflict update set id3a = 1, id2b = 11, id3c = 11;
+ ^
create rule rule_and_refint_t3_ins as on insert to rule_and_refint_t3
where (exists (select 1 from rule_and_refint_t3
where (((rule_and_refint_t3.id3a = new.id3a)
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 01c9130..9769599 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -599,6 +599,24 @@ from
(0 rows)
--
+-- Test case for subselect within UPDATE of INSERT...ON CONFLICT UPDATE
+--
+create temp table upsert(key int4 primary key, val text);
+insert into upsert values(1, 'val') on conflict update set val = 'not seen';
+insert into upsert values(1, 'val') on conflict update set val = 'should see ' || (select f1 from int4_tbl where f1 != 0 limit 1)::text;
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+select * from upsert;
+ key | val
+-----+-----
+ 1 | val
+(1 row)
+
+with aa as (select 'int4_tbl' u from int4_tbl limit 1)
+insert into upsert values (1, 'x'), (999, 'y')
+on conflict update set val = (select u from aa)
+returning *;
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+--
-- Test case for cross-type partial matching in hashed subplan (bug #7597)
--
create temp table outer_7597 (f1 int4, f2 int4);
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index f1a5fde..803532c 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -1731,3 +1731,93 @@ select * from self_ref_trigger;
drop table self_ref_trigger;
drop function self_ref_trigger_ins_func();
drop function self_ref_trigger_del_func();
+--
+-- Verify behavior of before and after triggers with INSERT...ON CONFLICT
+-- UPDATE
+--
+create table upsert (key int4 primary key, color text);
+create function upsert_before_func()
+ returns trigger language plpgsql as
+$$
+begin
+ if (TG_OP = 'UPDATE') then
+ raise warning 'before update (old): %', old.*::text;
+ raise warning 'before update (new): %', new.*::text;
+ elsif (TG_OP = 'INSERT') then
+ raise warning 'before insert (new): %', new.*::text;
+ if new.key % 2 = 0 then
+ new.key := new.key + 1;
+ new.color := new.color || ' trig modified';
+ raise warning 'before insert (new, modified): %', new.*::text;
+ end if;
+ end if;
+ return new;
+end;
+$$;
+create trigger upsert_before_trig before insert or update on upsert
+ for each row execute procedure upsert_before_func();
+create function upsert_after_func()
+ returns trigger language plpgsql as
+$$
+begin
+ if (TG_OP = 'UPDATE') then
+ raise warning 'after update (old): %', new.*::text;
+ raise warning 'after update (new): %', new.*::text;
+ elsif (TG_OP = 'INSERT') then
+ raise warning 'after insert (new): %', new.*::text;
+ end if;
+ return null;
+end;
+$$;
+create trigger upsert_after_trig after insert or update on upsert
+ for each row execute procedure upsert_after_func();
+insert into upsert values(1, 'black') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (1,black)
+WARNING: after insert (new): (1,black)
+insert into upsert values(2, 'red') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (2,red)
+WARNING: before insert (new, modified): (3,"red trig modified")
+WARNING: after insert (new): (3,"red trig modified")
+insert into upsert values(3, 'orange') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (3,orange)
+WARNING: before update (old): (3,"red trig modified")
+WARNING: before update (new): (3,"updated red trig modified")
+WARNING: after update (old): (3,"updated red trig modified")
+WARNING: after update (new): (3,"updated red trig modified")
+insert into upsert values(4, 'green') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (4,green)
+WARNING: before insert (new, modified): (5,"green trig modified")
+WARNING: after insert (new): (5,"green trig modified")
+insert into upsert values(5, 'purple') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (5,purple)
+WARNING: before update (old): (5,"green trig modified")
+WARNING: before update (new): (5,"updated green trig modified")
+WARNING: after update (old): (5,"updated green trig modified")
+WARNING: after update (new): (5,"updated green trig modified")
+insert into upsert values(6, 'white') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (6,white)
+WARNING: before insert (new, modified): (7,"white trig modified")
+WARNING: after insert (new): (7,"white trig modified")
+insert into upsert values(7, 'pink') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (7,pink)
+WARNING: before update (old): (7,"white trig modified")
+WARNING: before update (new): (7,"updated white trig modified")
+WARNING: after update (old): (7,"updated white trig modified")
+WARNING: after update (new): (7,"updated white trig modified")
+insert into upsert values(8, 'yellow') on conflict update set color = 'updated ' || color;
+WARNING: before insert (new): (8,yellow)
+WARNING: before insert (new, modified): (9,"yellow trig modified")
+WARNING: after insert (new): (9,"yellow trig modified")
+select * from upsert;
+ key | color
+-----+-----------------------------
+ 1 | black
+ 3 | updated red trig modified
+ 5 | updated green trig modified
+ 7 | updated white trig modified
+ 9 | yellow trig modified
+(5 rows)
+
+drop table upsert;
+drop function upsert_before_func();
+drop function upsert_after_func();
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index ea9197a..3403879 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -215,6 +215,8 @@ INSERT INTO rw_view15 VALUES (3, 'ROW 3'); -- should fail
ERROR: cannot insert into column "upper" of view "rw_view15"
DETAIL: View columns that are not columns of their base relation are not updatable.
INSERT INTO rw_view15 (a) VALUES (3); -- should be OK
+INSERT INTO rw_view15 (a) VALUES (3) ON CONFLICT UPDATE SET upper = upper; -- fails, unsupported
+ERROR: INSERT ON CONFLICT is not supported on updatable views
ALTER VIEW rw_view15 ALTER COLUMN upper SET DEFAULT 'NOT SET';
INSERT INTO rw_view15 (a) VALUES (4); -- should fail
ERROR: cannot insert into column "upper" of view "rw_view15"
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index 1de2a86..41da5e9 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -147,4 +147,9 @@ SELECT a, b, char_length(c) FROM update_test;
42 | 12 | 10000
(4 rows)
+ALTER TABLE update_test ADD constraint uuu UNIQUE(a);
+INSERT INTO update_test
+VALUES (21, 1, 'b'), (41, 1, 'b'), (42, 1, 'b')
+ON CONFLICT UPDATE SET (b, c) = (7, 'f');
+-- SELECT a, b FROM update_test;
DROP TABLE update_test;
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 06b372b..44a2f48 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -1806,6 +1806,70 @@ SELECT * FROM y;
-400
(22 rows)
+-- data-modifying WITH containing INSERT...ON CONFLICT UPDATE
+CREATE TABLE z AS SELECT i AS k, (i || ' v')::text v FROM generate_series(1, 16, 3) i;
+ALTER TABLE z ADD UNIQUE (k);
+WITH t AS (
+ INSERT INTO z SELECT i, 'insert'
+ FROM generate_series(0, 16) i
+ ON CONFLICT UPDATE SET v = v || ', now update'
+ RETURNING *
+)
+SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;
+ k | v | a
+---+--------+---
+ 0 | insert | 0
+ 0 | insert | 0
+(2 rows)
+
+-- New query/snapshot demonstrates side-effects of previous query.
+SELECT * FROM z ORDER BY k;
+ k | v
+----+------------------
+ 0 | insert
+ 1 | 1 v, now update
+ 2 | insert
+ 3 | insert
+ 4 | 4 v, now update
+ 5 | insert
+ 6 | insert
+ 7 | 7 v, now update
+ 8 | insert
+ 9 | insert
+ 10 | 10 v, now update
+ 11 | insert
+ 12 | insert
+ 13 | 13 v, now update
+ 14 | insert
+ 15 | insert
+ 16 | 16 v, now update
+(17 rows)
+
+--
+-- All these cases should fail, due to restrictions imposed upon the UPDATE
+-- portion of the query.
+--
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET V = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = ' update' WHERE k = (SELECT a FROM aa);
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+WITH aa AS (SELECT 'a' a, 'b' b UNION ALL SELECT 'a' a, 'b' b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 'a' LIMIT 1);
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, (SELECT b || ' insert' FROM aa WHERE a = 1 ))
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+ERROR: paramaterized auxiliary UPDATE queries are unsupported
+DROP TABLE Z;
-- check that run to completion happens in proper ordering
TRUNCATE TABLE y;
INSERT INTO y SELECT generate_series(1, 3);
diff --git a/src/test/regress/input/constraints.source b/src/test/regress/input/constraints.source
index 16d38f6..a004a7e 100644
--- a/src/test/regress/input/constraints.source
+++ b/src/test/regress/input/constraints.source
@@ -292,6 +292,11 @@ INSERT INTO UNIQUE_TBL VALUES (5, 'one');
INSERT INTO UNIQUE_TBL (t) VALUES ('six');
INSERT INTO UNIQUE_TBL (t) VALUES ('seven');
+INSERT INTO UNIQUE_TBL VALUES (5, 'five-upsert-insert') ON CONFLICT UPDATE SET t = 'five-upsert-update';
+INSERT INTO UNIQUE_TBL VALUES (6, 'six-upsert-insert') ON CONFLICT UPDATE SET t = 'six-upsert-update';
+-- should fail
+INSERT INTO UNIQUE_TBL VALUES (1, 'a'), (2, 'b'), (2, 'b') ON CONFLICT UPDATE SET t = 'fails';
+
SELECT '' AS five, * FROM UNIQUE_TBL;
DROP TABLE UNIQUE_TBL;
diff --git a/src/test/regress/output/constraints.source b/src/test/regress/output/constraints.source
index 2ffd263..1f6870b 100644
--- a/src/test/regress/output/constraints.source
+++ b/src/test/regress/output/constraints.source
@@ -421,16 +421,23 @@ INSERT INTO UNIQUE_TBL VALUES (4, 'four');
INSERT INTO UNIQUE_TBL VALUES (5, 'one');
INSERT INTO UNIQUE_TBL (t) VALUES ('six');
INSERT INTO UNIQUE_TBL (t) VALUES ('seven');
+INSERT INTO UNIQUE_TBL VALUES (5, 'five-upsert-insert') ON CONFLICT UPDATE SET t = 'five-upsert-update';
+INSERT INTO UNIQUE_TBL VALUES (6, 'six-upsert-insert') ON CONFLICT UPDATE SET t = 'six-upsert-update';
+-- should fail
+INSERT INTO UNIQUE_TBL VALUES (1, 'a'), (2, 'b'), (2, 'b') ON CONFLICT UPDATE SET t = 'fails';
+ERROR: could not lock instantaneously invisible tuple inserted in same transaction
+HINT: Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.
SELECT '' AS five, * FROM UNIQUE_TBL;
- five | i | t
-------+---+-------
+ five | i | t
+------+---+--------------------
| 1 | one
| 2 | two
| 4 | four
- | 5 | one
| | six
| | seven
-(6 rows)
+ | 5 | five-upsert-update
+ | 6 | six-upsert-insert
+(7 rows)
DROP TABLE UNIQUE_TBL;
CREATE TABLE UNIQUE_TBL (i int, t text,
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 09bb750..5101858 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -13,6 +13,9 @@ INSERT INTO a(aa) VALUES('aaaaaa');
INSERT INTO a(aa) VALUES('aaaaaaa');
INSERT INTO a(aa) VALUES('aaaaaaaa');
+-- INSERT ON CONFLICT UPDATE does not support inheritance
+INSERT INTO a(aa) VALUES('aaaaaaaa') ON CONFLICT UPDATE set aa = 'bbbbbbbb';
+
INSERT INTO b(aa) VALUES('bbb');
INSERT INTO b(aa) VALUES('bbbb');
INSERT INTO b(aa) VALUES('bbbbb');
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index a0ff953..230b6d6 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -245,6 +245,9 @@ INSERT INTO atest5 VALUES (5,5,5); -- fail
UPDATE atest5 SET three = 10; -- ok
UPDATE atest5 SET one = 8; -- fail
UPDATE atest5 SET three = 5, one = 2; -- fail
+INSERT INTO atest5(two) VALUES (6) ON CONFLICT UPDATE set three = 10; -- ok
+INSERT INTO atest5(two) VALUES (6) ON CONFLICT UPDATE set one = 8; -- fails (due to UPDATE)
+INSERT INTO atest5(three) VALUES (4) ON CONFLICT UPDATE set three = 10; -- fail (due to INSERT)
SET SESSION AUTHORIZATION regressuser1;
REVOKE ALL (one) ON atest5 FROM regressuser4;
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 1e15f84..91b5f2d 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -680,6 +680,11 @@ SELECT * FROM shoelace_log ORDER BY sl_name;
insert into shoelace values ('sl9', 0, 'pink', 35.0, 'inch', 0.0);
insert into shoelace values ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0);
+-- insert rules still apply - ON CONFLICT UPDATE is irrelevant
+insert into shoelace values ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0)
+ on conflict ignore;
+insert into shoelace values ('sl10', 1000, 'magenta', 70.0, 'inch', 0.0)
+ on conflict update set sl_color = 'orange';
SELECT * FROM shoelace_obsolete ORDER BY sl_len_cm;
SELECT * FROM shoelace_candelete;
@@ -844,6 +849,10 @@ insert into rule_and_refint_t3 values (1, 12, 11, 'row3');
insert into rule_and_refint_t3 values (1, 12, 12, 'row4');
insert into rule_and_refint_t3 values (1, 11, 13, 'row5');
insert into rule_and_refint_t3 values (1, 13, 11, 'row6');
+insert into rule_and_refint_t3 values (1, 13, 11, 'row6')
+ on conflict ignore;
+insert into rule_and_refint_t3 values (1, 13, 11, 'row6')
+ on conflict update set id3a = 1, id2b = 11, id3c = 11;
create rule rule_and_refint_t3_ins as on insert to rule_and_refint_t3
where (exists (select 1 from rule_and_refint_t3
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 56707e2..7d6e35b 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,20 @@ from
int4_tbl i4 on dummy = i4.f1;
--
+-- Test case for subselect within UPDATE of INSERT...ON CONFLICT UPDATE
+--
+create temp table upsert(key int4 primary key, val text);
+insert into upsert values(1, 'val') on conflict update set val = 'not seen';
+insert into upsert values(1, 'val') on conflict update set val = 'should see ' || (select f1 from int4_tbl where f1 != 0 limit 1)::text;
+
+select * from upsert;
+
+with aa as (select 'int4_tbl' u from int4_tbl limit 1)
+insert into upsert values (1, 'x'), (999, 'y')
+on conflict update set val = (select u from aa)
+returning *;
+
+--
-- Test case for cross-type partial matching in hashed subplan (bug #7597)
--
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 0ea2c31..d3b6aa3 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1173,3 +1173,61 @@ select * from self_ref_trigger;
drop table self_ref_trigger;
drop function self_ref_trigger_ins_func();
drop function self_ref_trigger_del_func();
+
+--
+-- Verify behavior of before and after triggers with INSERT...ON CONFLICT
+-- UPDATE
+--
+create table upsert (key int4 primary key, color text);
+
+create function upsert_before_func()
+ returns trigger language plpgsql as
+$$
+begin
+ if (TG_OP = 'UPDATE') then
+ raise warning 'before update (old): %', old.*::text;
+ raise warning 'before update (new): %', new.*::text;
+ elsif (TG_OP = 'INSERT') then
+ raise warning 'before insert (new): %', new.*::text;
+ if new.key % 2 = 0 then
+ new.key := new.key + 1;
+ new.color := new.color || ' trig modified';
+ raise warning 'before insert (new, modified): %', new.*::text;
+ end if;
+ end if;
+ return new;
+end;
+$$;
+create trigger upsert_before_trig before insert or update on upsert
+ for each row execute procedure upsert_before_func();
+
+create function upsert_after_func()
+ returns trigger language plpgsql as
+$$
+begin
+ if (TG_OP = 'UPDATE') then
+ raise warning 'after update (old): %', new.*::text;
+ raise warning 'after update (new): %', new.*::text;
+ elsif (TG_OP = 'INSERT') then
+ raise warning 'after insert (new): %', new.*::text;
+ end if;
+ return null;
+end;
+$$;
+create trigger upsert_after_trig after insert or update on upsert
+ for each row execute procedure upsert_after_func();
+
+insert into upsert values(1, 'black') on conflict update set color = 'updated ' || color;
+insert into upsert values(2, 'red') on conflict update set color = 'updated ' || color;
+insert into upsert values(3, 'orange') on conflict update set color = 'updated ' || color;
+insert into upsert values(4, 'green') on conflict update set color = 'updated ' || color;
+insert into upsert values(5, 'purple') on conflict update set color = 'updated ' || color;
+insert into upsert values(6, 'white') on conflict update set color = 'updated ' || color;
+insert into upsert values(7, 'pink') on conflict update set color = 'updated ' || color;
+insert into upsert values(8, 'yellow') on conflict update set color = 'updated ' || color;
+
+select * from upsert;
+
+drop table upsert;
+drop function upsert_before_func();
+drop function upsert_after_func();
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index c072fca..0cc2f8c 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -69,6 +69,7 @@ DELETE FROM rw_view14 WHERE a=3; -- should be OK
-- Partially updatable view
INSERT INTO rw_view15 VALUES (3, 'ROW 3'); -- should fail
INSERT INTO rw_view15 (a) VALUES (3); -- should be OK
+INSERT INTO rw_view15 (a) VALUES (3) ON CONFLICT UPDATE SET upper = upper; -- fails, unsupported
ALTER VIEW rw_view15 ALTER COLUMN upper SET DEFAULT 'NOT SET';
INSERT INTO rw_view15 (a) VALUES (4); -- should fail
UPDATE rw_view15 SET upper='ROW 3' WHERE a=3; -- should fail
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
index e71128c..2692eef 100644
--- a/src/test/regress/sql/update.sql
+++ b/src/test/regress/sql/update.sql
@@ -74,4 +74,12 @@ UPDATE update_test AS t SET b = update_test.b + 10 WHERE t.a = 10;
UPDATE update_test SET c = repeat('x', 10000) WHERE c = 'car';
SELECT a, b, char_length(c) FROM update_test;
+ALTER TABLE update_test ADD constraint uuu UNIQUE(a);
+
+INSERT INTO update_test
+VALUES (21, 1, 'b'), (41, 1, 'b'), (42, 1, 'b')
+ON CONFLICT UPDATE SET (b, c) = (7, 'f');
+
+-- SELECT a, b FROM update_test;
+
DROP TABLE update_test;
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index c716369..200c200 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -795,6 +795,43 @@ SELECT * FROM t LIMIT 10;
SELECT * FROM y;
+-- data-modifying WITH containing INSERT...ON CONFLICT UPDATE
+CREATE TABLE z AS SELECT i AS k, (i || ' v')::text v FROM generate_series(1, 16, 3) i;
+ALTER TABLE z ADD UNIQUE (k);
+
+WITH t AS (
+ INSERT INTO z SELECT i, 'insert'
+ FROM generate_series(0, 16) i
+ ON CONFLICT UPDATE SET v = v || ', now update'
+ RETURNING *
+)
+SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;
+
+-- New query/snapshot demonstrates side-effects of previous query.
+SELECT * FROM z ORDER BY k;
+
+--
+-- All these cases should fail, due to restrictions imposed upon the UPDATE
+-- portion of the query.
+--
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET V = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = ' update' WHERE k = (SELECT a FROM aa);
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+WITH aa AS (SELECT 'a' a, 'b' b UNION ALL SELECT 'a' a, 'b' b)
+INSERT INTO z VALUES(1, 'insert')
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 'a' LIMIT 1);
+WITH aa AS (SELECT 1 a, 2 b)
+INSERT INTO z VALUES(1, (SELECT b || ' insert' FROM aa WHERE a = 1 ))
+ON CONFLICT UPDATE SET v = (SELECT b || ' update' FROM aa WHERE a = 1 LIMIT 1);
+
+DROP TABLE Z;
+
-- check that run to completion happens in proper ordering
TRUNCATE TABLE y;
--
1.9.1
Attachment: 0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch (text/x-patch, charset=US-ASCII)
From fc30da97cf60bb9e1e50972acc0f58cb7329f639 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@heroku.com>
Date: Wed, 27 Aug 2014 15:01:32 -0700
Subject: [PATCH 2/4] Support INSERT ... ON CONFLICT {UPDATE | IGNORE}
This non-standard INSERT clause allows DML statement authors to specify
that an alternative path be taken whenever any tuple proposed for
insertion would duplicate an existing tuple on a value or set of values
constrained by a unique index. The statement may either IGNORE the
tuple being inserted without raising an error, or UPDATE the single
existing tuple whose constrained values it duplicates.
The implementation loops until either an insert or an UPDATE/IGNORE
occurs. No existing tuple may be affected more than once per INSERT.
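
For instance, a multi-valued INSERT behaves as follows (a sketch against a
hypothetical table "tab" with a unique constraint on "k"; each VALUES row
independently inserts or updates, but two proposed rows that conflict with
the same existing tuple raise an error):

```sql
-- Hypothetical table; "k" carries the unique constraint arbitrated on.
CREATE TABLE tab (k int PRIMARY KEY, v text);
INSERT INTO tab VALUES (1, 'old');

-- Row (1, ...) takes the UPDATE path; row (2, ...) is plainly inserted.
INSERT INTO tab VALUES (1, 'new'), (2, 'new')
ON CONFLICT UPDATE SET v = 'updated';

-- By contrast, VALUES (3, 'a'), (3, 'b') would error out, since no
-- existing tuple may be affected more than once per INSERT.
```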
This is implemented using a new infrastructure called "speculative
insertion". A speculative inserter establishes the right to insert
ahead of time. In order to proceed with insertion, consensus must be
reached between all unique indexes on the table, or else the alternative
path is taken. Finally, heap tuple insertion occurs in the conventional
manner, followed by insertion of index tuples, where index tuple
insertion picks up where value locking left off, performing essentially
the conventional insertion process in a staggered fashion.
Alternatively, we may go to UPDATE, using the EvalPlanQual() mechanism
to execute a special auxiliary plan.
READ COMMITTED isolation level is permitted to UPDATE a tuple even where
no version is visible to the command's MVCC snapshot. Similarly, any
query predicate associated with the UPDATE portion of the new statement
need only be satisfied by an already locked, conclusively committed and
visible conflict tuple. When the predicate isn't satisfied, the tuple is still
locked, which implies that at READ COMMITTED, a tuple may be locked
without any version being visible to the command's MVCC snapshot.
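
To illustrate (a sketch, assuming a table "upsert" with a primary key on
"key" and an existing row (1, 'delete')): when the conflicting tuple fails
the UPDATE's WHERE clause, no update occurs, yet the row stays locked:

```sql
INSERT INTO upsert(key, val) VALUES (1, 'insert')
ON CONFLICT UPDATE SET val = 'update' WHERE val != 'delete';
-- The existing row fails the predicate, so it is not updated, but it
-- remains locked (much as if FOR UPDATE) until the transaction ends --
-- possibly without any version of it being visible to our snapshot.
```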
Users may optionally specify a single unique index to merge on with a
WITHIN `unique_index` specification, which is useful when there is a
concern about spuriously merging on the wrong unique index due to there
being more than one would-be unique violation. Otherwise, we UPDATE (or
IGNORE) based on the first would-be unique violation detected, on the
assumption that that is the only unique index where a violation could
appear.
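
Assuming the WITHIN clause simply names the unique index (the exact
spelling here is illustrative, not definitive), usage would look like:

```sql
-- Sketch: arbitrate conflicts on one particular unique index only.
-- "upsert_pkey" is the primary key index of the hypothetical "upsert"
-- table; a would-be violation of any other unique index still errors.
INSERT INTO upsert(key, val) VALUES (1, 'insert')
ON CONFLICT WITHIN upsert_pkey UPDATE SET val = 'update';
```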
The auxiliary ModifyTable plan used by the UPDATE portion of the new
statement is not formally a subplan of its parent INSERT ModifyTable
plan. Rather, it's an independently planned subquery, whose execution
is tightly driven by its parent. Special auxiliary state pertaining to
the auxiliary UPDATE is tracked by its parent through all stages of
query execution.
The optimizer imposes some restrictions on child auxiliary UPDATE plans,
which make the plans comport with their parent to the extent required
during the executor stage. One user-visible consequence of this is
that the special auxiliary UPDATE query cannot have subselects within
its targetlist or WHERE clause. UPDATEs may not reference any other
table, and UPDATE FROM is disallowed. INSERT's RETURNING clause
continues to project only tuples actually inserted.
---
contrib/pg_stat_statements/pg_stat_statements.c | 4 +
src/backend/access/heap/heapam.c | 22 +-
src/backend/access/heap/tuptoaster.c | 2 +-
src/backend/access/index/indexam.c | 153 +++++-
src/backend/access/nbtree/nbtinsert.c | 612 ++++++++++++++++++++++--
src/backend/access/nbtree/nbtree.c | 71 ++-
src/backend/catalog/index.c | 6 +-
src/backend/catalog/indexing.c | 3 +-
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copy.c | 4 +-
src/backend/commands/explain.c | 9 +-
src/backend/executor/execMain.c | 11 +-
src/backend/executor/execUtils.c | 293 ++++++++++--
src/backend/executor/nodeLockRows.c | 9 +-
src/backend/executor/nodeModifyTable.c | 427 ++++++++++++++++-
src/backend/nodes/copyfuncs.c | 23 +
src/backend/nodes/equalfuncs.c | 16 +
src/backend/nodes/nodeFuncs.c | 5 +
src/backend/nodes/outfuncs.c | 19 +
src/backend/nodes/readfuncs.c | 3 +
src/backend/optimizer/path/costsize.c | 8 +-
src/backend/optimizer/plan/createplan.c | 11 +-
src/backend/optimizer/plan/planner.c | 42 ++
src/backend/optimizer/plan/setrefs.c | 8 +
src/backend/optimizer/plan/subselect.c | 2 +
src/backend/parser/analyze.c | 58 ++-
src/backend/parser/gram.y | 65 ++-
src/backend/parser/parse_clause.c | 42 +-
src/backend/parser/parse_relation.c | 8 +-
src/backend/rewrite/rewriteHandler.c | 5 +
src/backend/utils/time/tqual.c | 23 +
src/include/access/genam.h | 100 +++-
src/include/access/nbtree.h | 49 +-
src/include/catalog/pg_am.h | 44 +-
src/include/catalog/pg_proc.h | 2 +
src/include/executor/executor.h | 4 +-
src/include/nodes/execnodes.h | 3 +
src/include/nodes/nodes.h | 13 +
src/include/nodes/parsenodes.h | 24 +
src/include/nodes/plannodes.h | 3 +
src/include/optimizer/planmain.h | 3 +-
src/include/parser/kwlist.h | 2 +
src/include/parser/parse_clause.h | 1 +
src/include/parser/parse_relation.h | 4 +-
src/include/utils/rel.h | 1 +
src/include/utils/tqual.h | 1 +
46 files changed, 2058 insertions(+), 162 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 799242b..c8ce44b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2198,6 +2198,10 @@ JumbleQuery(pgssJumbleState *jstate, Query *query)
JumbleRangeTable(jstate, query->rtable);
JumbleExpr(jstate, (Node *) query->jointree);
JumbleExpr(jstate, (Node *) query->targetList);
+ APP_JUMB(query->specClause);
+ APP_JUMB(query->mergeIndex);
+ if (query->onConflict)
+ JumbleQuery(jstate, (Query *) query->onConflict);
JumbleExpr(jstate, (Node *) query->returningList);
JumbleExpr(jstate, (Node *) query->groupClause);
JumbleExpr(jstate, query->havingQual);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4d7575b..b0c5d62 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4101,14 +4101,15 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
*
* Function result may be:
* HeapTupleMayBeUpdated: lock was successfully acquired
+ * HeapTupleInvisible: lock failed because tuple instantaneously invisible
* HeapTupleSelfUpdated: lock failed because tuple updated by self
* HeapTupleUpdated: lock failed because tuple updated by other xact
*
- * In the failure cases, the routine fills *hufd with the tuple's t_ctid,
- * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
- * (the last only for HeapTupleSelfUpdated, since we
- * cannot obtain cmax from a combocid generated by another transaction).
- * See comments for struct HeapUpdateFailureData for additional info.
+ * In the failure cases other than HeapTupleInvisible, the routine fills *hufd
+ * with the tuple's t_ctid, t_xmax (resolving a possible MultiXact, if
+ * necessary), and t_cmax (the last only for HeapTupleSelfUpdated, since we
+ * cannot obtain cmax from a combocid generated by another transaction). See
+ * comments for struct HeapUpdateFailureData for additional info.
*
* See README.tuplock for a thorough explanation of this mechanism.
*/
@@ -4145,8 +4146,15 @@ l3:
if (result == HeapTupleInvisible)
{
- UnlockReleaseBuffer(*buffer);
- elog(ERROR, "attempted to lock invisible tuple");
+ LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * This is possible, but only when locking a tuple for speculative
+ * insertion. We return this value here rather than throwing an error
+ * in order to give that case the opportunity to throw a more specific
+ * error.
+ */
+ return HeapTupleInvisible;
}
else if (result == HeapTupleBeingUpdated)
{
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index ce44bbd..114553e 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1530,7 +1530,7 @@ toast_save_datum(Relation rel, Datum value,
&(toasttup->t_self),
toastrel,
toastidxs[i]->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO, NULL);
}
/*
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 53cf96f..1b5c1fc 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -18,8 +18,12 @@
* index_rescan - restart a scan of an index
* index_endscan - end a scan
* index_insert - insert an index tuple into a relation
+ * index_lock - lock index values speculatively
* index_markpos - mark a scan position
* index_restrpos - restore a scan position
+ * index_proceed - proceed with insert?
+ * index_release - release index locks
+ * index_tid - get duplicate/conflict tid from speculative state
* index_getnext_tid - get the next TID from a scan
* index_fetch_heap - get the scan's next heap tuple
* index_getnext - get the next heap tuple from a scan
@@ -98,6 +102,13 @@
AssertMacro(!ReindexIsProcessingIndex(RelationGetRelid(indexRelation))) \
)
+#define SPECULATIVE_CHECKS \
+( \
+ AssertMacro(!state || \
+ RelationGetRelid(indexRelation) == \
+ RelationGetRelid(state->uniqueIndex)) \
+)
+
#define SCAN_CHECKS \
( \
AssertMacro(IndexScanIsValid(scan)), \
@@ -199,6 +210,46 @@ index_close(Relation relation, LOCKMODE lockmode)
}
/* ----------------
+ * index_lock - lock index values speculatively.
+ *
+ * Returns palloc'd state indicating if the insertion would have proceeded were
+ * this not just a "dry-run". The underlying amcanunique implementation
+ * performs locking. Callers are obligated to pass this state back in a
+ * subsequent index_insert, or use the callback set here to release locks and
+ * other resources.
+ * ----------------
+ */
+SpeculativeState *
+index_lock(Relation indexRelation,
+ Datum *values,
+ bool *isnull,
+ Relation heapRelation,
+ List *otherSpecStates,
+ bool priorConflict)
+{
+ FmgrInfo *procedure;
+
+ RELATION_CHECKS;
+ GET_REL_PROCEDURE(amlock);
+
+ if (!(indexRelation->rd_am->ampredlocks))
+ CheckForSerializableConflictIn(indexRelation,
+ (HeapTuple) NULL,
+ InvalidBuffer);
+
+ /*
+ * call am's lock proc.
+ */
+ return (SpeculativeState *) DatumGetPointer(FunctionCall6(procedure,
+ PointerGetDatum(indexRelation),
+ PointerGetDatum(values),
+ PointerGetDatum(isnull),
+ PointerGetDatum(heapRelation),
+ PointerGetDatum(otherSpecStates),
+ BoolGetDatum(priorConflict)));
+}
+
+/* ----------------
* index_insert - insert an index tuple into a relation
* ----------------
*/
@@ -208,11 +259,14 @@ index_insert(Relation indexRelation,
bool *isnull,
ItemPointer heap_t_ctid,
Relation heapRelation,
- IndexUniqueCheck checkUnique)
+ IndexUniqueCheck checkUnique,
+ SpeculativeState *state)
{
FmgrInfo *procedure;
+ bool satisfies;
RELATION_CHECKS;
+ SPECULATIVE_CHECKS;
GET_REL_PROCEDURE(aminsert);
if (!(indexRelation->rd_am->ampredlocks))
@@ -223,13 +277,106 @@ index_insert(Relation indexRelation,
/*
* have the am's insert proc do all the work.
*/
- return DatumGetBool(FunctionCall6(procedure,
+ satisfies = DatumGetBool(FunctionCall7(procedure,
PointerGetDatum(indexRelation),
PointerGetDatum(values),
PointerGetDatum(isnull),
PointerGetDatum(heap_t_ctid),
PointerGetDatum(heapRelation),
- Int32GetDatum((int32) checkUnique)));
+ Int32GetDatum((int32) checkUnique),
+ PointerGetDatum(state)));
+
+ /*
+ * The AM will have released its private speculative insertion state and
+ * locks, but the responsibility to release the generic state rests here
+ */
+ if (state)
+ pfree(state);
+
+ return satisfies;
+}
+
+/* ----------------
+ * index_proceed - should we proceed with a speculative insertion?
+ *
+ * Returns whether or not to proceed (specStates may be NIL). A would-be
+ * duplicate key violation on any unique index is grounds for not proceeding
+ * with the second phase of speculative insertion (as is speculative
+ * insertion never having been requested). Total consensus is required.
+ * ----------------
+ */
+bool
+index_proceed(List *specStates)
+{
+ ListCell *lc;
+
+ foreach(lc, specStates)
+ {
+ SpeculativeState *state = lfirst(lc);
+ SpecStatus outcome = state->outcome;
+
+ Assert(outcome != INSERT_TRY_AGAIN);
+
+ if (outcome == INSERT_NO_PROCEED)
+ return false;
+ }
+
+ return true;
+}
+
+/* ----------------
+ * index_release - release resources from speculative insertion.
+ *
+ * Unique index value locks and other resources are freed here. This is called
+ * just prior to the latter phase of speculative insertion, in respect of a
+ * single slot.
+ *
+ * The AM must free its own private resources here (principally, value locks).
+ * We release the AM-generic state itself.
+ *
+ * It's possible that the AM will call this function itself directly, passing
+ * back the list of other states that that particular AM-invocation is itself
+ * passed. The need to release all value locks immediately prior to waiting on
+ * the outcome of another, conflicting xact may arise.
+ *
+ * Returns a heap item pointer for locking where applicable.
+ * ----------------
+ */
+ItemPointer
+index_release(List *specStates, bool free)
+{
+ ListCell *lc;
+ ItemPointer conflict = NULL;
+
+ /*
+ * Note that unlocking occurs in the same order as insertion would have
+ * occurred, and so is in the opposite order to the original lock
+ * acquisition (i.e. it is "right-way-around")
+ */
+ foreach(lc, specStates)
+ {
+ SpeculativeState *state = lfirst(lc);
+
+ /* Only expecting one tuple to lock */
+ Assert(!conflict || state->outcome != INSERT_NO_PROCEED);
+ if (state->outcome == INSERT_NO_PROCEED)
+ {
+ Assert(ItemPointerIsValid(state->conflictTid));
+ conflict = state->conflictTid;
+ }
+
+ /*
+ * Only release value locks (and other resources) for unique indexes
+ * that have indicated they require it
+ */
+ if (state->unlock)
+ (*state->unlock) (state);
+
+ if (free)
+ pfree(state);
+ }
+
+ return conflict;
}
/*
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index ecee5ac..3bd4e71 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -47,10 +47,16 @@ typedef struct
static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);
+static void _bt_speccleanup(SpeculativeState *specState);
+static void _bt_insertinitpg(Relation rel, IndexUniqueCheck checkUnique,
+ ScanKey itup_scankey, Buffer *buf,
+ BlockNumber *bufblock, BTStack stack,
+ BTSpecOpaque btspec);
static TransactionId _bt_check_unique(Relation rel, IndexTuple itup,
Relation heapRel, Buffer buf, OffsetNumber offset,
ScanKey itup_scankey,
- IndexUniqueCheck checkUnique, bool *is_unique);
+ IndexUniqueCheck checkUnique, bool *is_unique,
+ ItemPointer dup_htid);
static void _bt_findinsertloc(Relation rel,
Buffer *bufptr,
OffsetNumber *offsetptr,
@@ -84,56 +90,359 @@ static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
/*
+ * _bt_speccleanup --- callback for when speculative insertion won't proceed.
+ *
+ * This is called by the core code when it turns out that an insertion of a
+ * btree index tuple will not proceed, despite the fact that an individual
+ * phased-lock call (the one whose state is passed) believed that it should.
+ *
+ * Sometimes a call that instructs the core code to call this method for each
+ * unique index (or an analogous method for possible other amcanunique access
+ * methods) may come from nbtree itself.
+ *
+ * In general, we prefer to clean up directly if it is immediately clear that
+ * speculative insertion won't proceed.
+ */
+static void
+_bt_speccleanup(SpeculativeState *specState)
+{
+ BTSpecOpaque btspec = (BTSpecOpaque) specState->privState;
+
+ if (specState->outcome == INSERT_NO_PROCEED)
+ {
+ /*
+ * We already found a duplicate, and already released the hwlock. We
+ * still have the pin to release, but not yet.
+ *
+ * There will be a final callback for this unique index, since it is
+ * used to merge values. It is necessary to sit on the page buffer
+ * pin, to interlock against VACUUM.
+ *
+ * XXX: The index tuple whose heap tuple we are intent on locking may
+ * actually be on a later page than the currently pinned page.
+ * Discussion on VACUUM interlocking should revisit the need for this.
+ */
+ specState->outcome = INSERT_NEED_UNPIN;
+ }
+ else
+ {
+ ReleaseBuffer(btspec->lockedBuf);
+ /* Unset callback that brought control here (not strictly necessary) */
+ specState->unlock = NULL;
+ /* If already released all state other than pin, don't try it again */
+ if (specState->outcome == INSERT_NEED_UNPIN)
+ return;
+ /* Free private state memory */
+ _bt_freestack(btspec->stack);
+ pfree(btspec->itup);
+ pfree(btspec->itupScankey);
+ pfree(btspec);
+ }
+}
+
+/*
+ * _bt_lockinsert() -- Lock values for speculative insertion
+ *
+ * Obtain an exclusive heavyweight lock on appropriate leaf page, to
+ * implement a limited form of value locking that prevents concurrent
+ * insertion of conflicting values for an instant.
+ *
+ * It is the caller's responsibility to subsequently ask us to release
+ * value locks (by passing the state back to _bt_doinsert, or perhaps
+ * through our callback), though we can ask the core code to ask all other
+ * unique indexes to release just before we wait, and in general we don't
+ * hold locks if we know insertion will not proceed.
+ *
+ * specState is passed by the calling indexam proc, but responsibility for
+ * initializing its fields (with the sole exception of the cached index
+ * tuple) rests here. In addition, a list of states of those unique
+ * indexes participating in speculative insertion that have already
+ * committed to insertion is passed, in case it proves necessary to
+ * release values locks on all unique indexes and restart the first stage
+ * of speculative insertion.
+ */
+void
+_bt_lockinsert(Relation rel, Relation heapRel, SpeculativeState *specState,
+ List *otherSpecStates, bool priorConflict)
+{
+ int natts = rel->rd_rel->relnatts;
+ BTSpecOpaque btspec = specState->privState;
+ BTPageOpaque opaque;
+ ScanKey itup_scankey;
+ BTStack stack;
+ bool is_unique;
+ /* Heavyweight page lock and exclusive buffer lock acquisition avoidance */
+ bool hwlocked;
+ OffsetNumber nitems;
+ /* Buffer and pinned page for value locking */
+ Buffer buf;
+ BlockNumber bufblock;
+ Page bufpage;
+ Buffer presplitbuf;
+ OffsetNumber offset;
+ TransactionId xwait;
+
+ /* build insertion scan key */
+ itup_scankey = _bt_mkscankey(rel, btspec->itup);
+ /* find the first page containing this key */
+ stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+ /* set up state needed to consider skipping most locking */
+ bufblock = BufferGetBlockNumber(buf);
+ bufpage = BufferGetPage(buf);
+ opaque = (BTPageOpaque) PageGetSpecialPointer(bufpage);
+ nitems = PageGetMaxOffsetNumber(bufpage);
+
+ /*
+ * Consider trading in our read lock for a write lock. Checking if the
+ * value already exists is typically relatively cheap, so the threshold is,
+ * in a word, low. If the core code indicated a prior conflict, that means
+ * that an earlier call here returned recently, having either indicated
+ * that there was a conflicting xact or a conflicting row. A recent, known
+ * conflict against the value proposed for insertion is by far the most
+ * compelling reason to apply the optimization.
+ */
+ if (priorConflict || nitems > BTREE_PITEMS_NOLOCK)
+ {
+ /*
+ * Chances are good that a conflicting row will be returned (typically
+ * to be locked by caller). Optimistically try to find likely
+ * duplicate without acquiring hwlock, and without swapping read buffer
+ * lock for write buffer lock.
+ */
+ hwlocked = false;
+ }
+ else
+ {
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ /* now lock page of still pinned buffer, and write-lock buffer */
+hwlock:
+ /*
+ * Lock avoidance optimization did not initially appear promising, or
+ * has already been unsuccessfully attempted once, or there was a
+ * concurrent page split
+ */
+ hwlocked = true;
+ LockPage(rel, bufblock, ExclusiveLock);
+ LockBuffer(buf, BT_WRITE);
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(BufferGetPage(buf));
+
+ /*
+ * Consider the need to move right in the tree (see comments at end of
+ * _bt_insertinitpg), but defend against concurrent page splits by
+ * retrying. Respect the general protocol for heavyweight locking
+ * nbtree pages, which is that pages must be locked before any buffer
+ * is locked, and released after all buffer locks are released.
+ */
+ presplitbuf = buf;
+ buf = _bt_moveright(rel, buf, natts, itup_scankey, false, true, stack,
+ BT_WRITE);
+
+ if (buf != presplitbuf)
+ {
+ /*
+ * A page split between unlocking and relocking the first page that
+ * a would-be duplicate value could be on necessitates reacquiring
+ * all locks. The existing heavyweight lock is superficially a
+ * sufficient choke point for value locking against concurrent
+ * insertions of the value proposed for insertion, but it is
+ * actually necessary to hold only a single pinned buffer in which
+ * the heavyweight locked page is allocated, since the page is
+ * marked as locked with a flag. When regular inserters that hope
+ * to mostly avoid heavyweight lock acquisition are considered,
+ * just holding the existing lock is insufficient, since they rely
+ * on this flag to indicate that a heavyweight lock may be held by
+ * another backend on the same buffer's page.
+ *
+ * We unlock the buffer newly locked by _bt_moveright(), but keep
+ * the pin on that same buffer for next iteration.
+ */
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ UnlockPage(rel, bufblock, ExclusiveLock);
+ bufblock = BufferGetBlockNumber(buf);
+ goto hwlock;
+ }
+ }
+
+ /*
+ * Indicate that heavyweight lock is held. The flag bit may already be
+ * set, due to an aborted transaction setting it without having the
+ * opportunity to unset it, but this should be rare.
+ *
+ * Since we never release our buffer pin, we don't MarkBufferDirty().
+ */
+ if (hwlocked)
+ opaque->btpo_flags |= BTP_IS_LOCKED;
+
+ /* get page from buffer */
+ offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+
+ /* check for duplicates in a similar fashion to _bt_insert */
+ xwait = _bt_check_unique(rel, btspec->itup, heapRel, buf, offset,
+ itup_scankey, UNIQUE_CHECK_SPEC, &is_unique,
+ specState->conflictTid);
+
+ if (TransactionIdIsValid(xwait))
+ {
+ if (hwlocked)
+ opaque->btpo_flags &= ~BTP_IS_LOCKED;
+ _bt_relbuf(rel, buf);
+ if (hwlocked)
+ UnlockPage(rel, bufblock, ExclusiveLock);
+ /* Free all state and amcanunique value locks -- not just our own */
+ index_release(otherSpecStates, true);
+ _bt_freestack(stack);
+ _bt_freeskey(itup_scankey);
+
+ /*
+ * Have the core code retry, re-locking any previous unique index
+ * values as required
+ */
+ specState->outcome = INSERT_TRY_AGAIN;
+ XactLockTableWait(xwait, heapRel, specState->conflictTid, XLTW_Lock);
+ return;
+ }
+
+ if (!hwlocked && is_unique)
+ {
+ /*
+ * A heavyweight page lock is not held.
+ *
+ * It was incorrectly predicted that insertion would not proceed. Now
+ * that it's clear that it could have safely proceeded if only the
+ * appropriate locks were held, restart. This strategy is abandoned
+ * from here (although when INSERT_TRY_AGAIN is passed back to core
+ * code, it may later inform this routine that there was a recent
+ * conflict in respect of the same unique index, which is always
+ * treated as a valid reason to apply the optimization, even
+ * repeatedly).
+ */
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * Since the buffer lock is now released, it's unlikely though possible
+ * that upon reacquiring locks suitable for proceeding with insertion,
+ * insertion will turn out to be unnecessary.
+ */
+ goto hwlock;
+ }
+
+ /*
+ * This unique index individually assents to insertion proceeding, or
+ * vetoes it (though if row locking in the executor later
+ * fails, all bets are off and a fresh attempt at gaining consensus to
+ * proceed with "insertion proper" begins)
+ */
+ specState->outcome = is_unique ? INSERT_PROCEED : INSERT_NO_PROCEED;
+ /* Stash stack for insertion proper */
+ btspec->stack = stack;
+ /* Return state to core code */
+ specState->uniqueIndex = rel;
+ btspec->lockedBuf = buf;
+
+ /* Free locks and other state no longer needed as appropriate */
+ if (specState->outcome == INSERT_NO_PROCEED)
+ {
+ /* Release buffer write lock and heavyweight lock here */
+ if (hwlocked)
+ opaque->btpo_flags &= ~BTP_IS_LOCKED;
+ /* However, keep pin */
+ LockBuffer(btspec->lockedBuf, BUFFER_LOCK_UNLOCK);
+ if (hwlocked)
+ UnlockPage(rel, bufblock, ExclusiveLock);
+ _bt_freestack(btspec->stack);
+ }
+ else
+ {
+ Assert(hwlocked);
+
+ /*
+ * Unlock buffer, but keep pin and heavyweight lock. Holding on to the
+ * buffer pin is at least necessary because we don't want to have a
+ * non-speculative inserter fail to observe that they require the
+ * heavyweight page lock associated with this leaf page (we could also
+ * mark the buffer dirty, but since we'll almost never hold a
+ * heavyweight page lock for more than an instant anyway, we prefer not
+ * to).
+ */
+ LockBuffer(btspec->lockedBuf, BUFFER_LOCK_UNLOCK);
+ /* Cache scankey for second phase too */
+ btspec->itupScankey = itup_scankey;
+ }
+
+ /* Give core code ability to undo/release */
+ specState->unlock = _bt_speccleanup;
+}
+
+/*
* _bt_doinsert() -- Handle insertion of a single index tuple in the tree.
*
- * This routine is called by the public interface routine, btinsert.
- * By here, itup is filled in, including the TID.
+ * This routine is called by the public interface routines, btbuild and
+ * btinsert. By here, itup is filled in, including the TID, unlike
+ * _bt_lockinsert where the heap tuple was not yet inserted (though we
+ * still use cached itup for speculative insertion).
+ *
+ * If checkUnique is UNIQUE_CHECK_NO or UNIQUE_CHECK_PARTIAL, this will
+ * allow duplicates. UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING will
+ * throw errors for a duplicate. For UNIQUE_CHECK_EXISTING we merely run
+ * the duplicate check, and don't actually insert. In the
+ * UNIQUE_CHECK_SPEC case, a call here represents speculative "insertion
+ * proper", with insertion already committed to, so uniqueness checking is
+ * redundant.
*
- * If checkUnique is UNIQUE_CHECK_NO or UNIQUE_CHECK_PARTIAL, this
- * will allow duplicates. Otherwise (UNIQUE_CHECK_YES or
- * UNIQUE_CHECK_EXISTING) it will throw error for a duplicate.
- * For UNIQUE_CHECK_EXISTING we merely run the duplicate check, and
- * don't actually insert.
+ * specState is state for the speculative insertion locking, and will be
+ * NULL for regular index tuple inserts. If speculative insertion reaches
+ * here, consensus has been reached to proceed and "insertion proper"
+ * finishing without unique constraint violations is a foregone
+ * conclusion.
*
* The result value is only significant for UNIQUE_CHECK_PARTIAL:
* it must be TRUE if the entry is known unique, else FALSE.
* (In the current implementation we'll also return TRUE after a
- * successful UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING call, but
- * that's just a coding artifact.)
+ * successful UNIQUE_CHECK_YES, UNIQUE_CHECK_EXISTING or UNIQUE_CHECK_SPEC
+ * call, but that's just a coding artifact.)
*/
bool
_bt_doinsert(Relation rel, IndexTuple itup,
- IndexUniqueCheck checkUnique, Relation heapRel)
+ IndexUniqueCheck checkUnique, Relation heapRel,
+ SpeculativeState *specState)
{
bool is_unique = false;
int natts = rel->rd_rel->relnatts;
ScanKey itup_scankey;
BTStack stack;
Buffer buf;
- OffsetNumber offset;
-
- /* we need an insertion scan key to do our search, so build one */
- itup_scankey = _bt_mkscankey(rel, itup);
+ OffsetNumber offset = InvalidOffsetNumber;
+ BlockNumber bufblock = InvalidBlockNumber;
+ BTSpecOpaque btspec = NULL;
+ if (checkUnique != UNIQUE_CHECK_SPEC)
+ {
+ /* we need an insertion scan key to do our search, so build one */
+ itup_scankey = _bt_mkscankey(rel, itup);
top:
- /* find the first page containing this key */
- stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
-
- offset = InvalidOffsetNumber;
-
- /* trade in our read lock for a write lock */
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
- LockBuffer(buf, BT_WRITE);
+ /* find the first page containing this key */
+ stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+ }
+ else
+ {
+ /* retrieve private state */
+ btspec = specState->privState;
+ /* pick up from value locking */
+ itup = btspec->itup;
+ itup_scankey = btspec->itupScankey;
+ stack = btspec->stack;
+ /* this value has already been determined to be unique */
+ is_unique = true;
+ }
/*
- * If the page was split between the time that we surrendered our read
- * lock and acquired our write lock, then this page may no longer be the
- * right place for the key we want to insert. In this case, we need to
- * move right in the tree. See Lehman and Yao for an excruciatingly
- * precise description.
+ * Convert read lock (buffer read lock or heavyweight exclusive lock that
+ * effectively functions as an intent exclusive lock) on first page
+ * containing this key to write lock as appropriate
*/
- buf = _bt_moveright(rel, buf, natts, itup_scankey, false,
- true, stack, BT_WRITE);
+ _bt_insertinitpg(rel, checkUnique, itup_scankey, &buf, &bufblock, stack,
+ btspec);
/*
* If we're not allowing duplicates, make sure the key isn't already in
@@ -142,32 +451,54 @@ top:
* NOTE: obviously, _bt_check_unique can only detect keys that are already
* in the index; so it cannot defend against concurrent insertions of the
* same key. We protect against that by means of holding a write lock on
- * the target page. Any other would-be inserter of the same key must
- * acquire a write lock on the same target page, so only one would-be
+ * the target page's buffer. Any other would-be inserter of the same key
+ * must acquire a write lock on the same target page, so only one would-be
* inserter can be making the check at one time. Furthermore, once we are
* past the check we hold write locks continuously until we have performed
- * our insertion, so no later inserter can fail to see our insertion.
- * (This requires some care in _bt_insertonpg.)
+ * our insertion, so no later inserter (or, in the analogous point in the
+ * function _btlock, the undecided speculative inserter) can fail to see
+ * our insertion (This requires some care in _bt_insertonpg). Speculative
+ * insertion staggers this process, and converts the buffer lock to a page
+ * heavyweight lock, so that if and when control reaches here, unique
+ * checking has already been performed, and the right to insert has already
+ * been established.
*
* If we must wait for another xact, we release the lock while waiting,
* and then must start over completely.
*
* For a partial uniqueness check, we don't wait for the other xact. Just
* let the tuple in and return false for possibly non-unique, or true for
- * definitely unique.
+ * definitely unique. This work is redundant for speculative insertion,
+ * and so that case also skips the check; assert-enabled builds verify
+ * that it got things right.
*/
- if (checkUnique != UNIQUE_CHECK_NO)
+ if (checkUnique != UNIQUE_CHECK_SPEC &&
+ checkUnique != UNIQUE_CHECK_NO)
{
- TransactionId xwait;
+ TransactionId xwait = InvalidTransactionId;
offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey,
- checkUnique, &is_unique);
+ checkUnique, &is_unique, NULL);
if (TransactionIdIsValid(xwait))
{
/* Have to wait for the other guy ... */
_bt_relbuf(rel, buf);
+ /* Release heavyweight lock if held */
+ if (bufblock != InvalidBlockNumber)
+ UnlockPage(rel, bufblock, ExclusiveLock);
+
+ /*
+ * It may be necessary to sit on an already acquired page
+ * heavyweight "value lock" (on another unique index) if a
+ * speculative insertion has already established the right to
+ * insert into a single, user specified unique index, and the right
+ * to (non-speculatively) insert into this unique index is in the
+ * process of being established. It is assumed that this case is
+ * rare enough to make it not worth teaching this routine and
+ * associated executor code to release its heavyweight value lock.
+ */
XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);
/* start over... */
_bt_freestack(stack);
@@ -196,6 +527,10 @@ top:
_bt_relbuf(rel, buf);
}
+ /* Release heavyweight lock if held */
+ if (bufblock != InvalidBlockNumber)
+ UnlockPage(rel, bufblock, ExclusiveLock);
+
/* be tidy */
_bt_freestack(stack);
_bt_freeskey(itup_scankey);
@@ -204,6 +539,139 @@ top:
}
/*
+ * _bt_insertinitpg() -- Handle insertion's page write lock initialization
+ *
+ * This routine handles minutiae relating to how each IndexUniqueCheck case must
+ * lock and unlock buffers, and perhaps the buffer's page. Some existing type
+ * of lock on *buf is converted to a buffer write lock.
+ *
+ * Caller passes a pinned buffer with the first page the scankey's value could
+ * be on. The buffer is generally already read locked, though not when the
+ * routine is called as part of the second stage of speculative insertion (i.e.
+ * UNIQUE_CHECK_SPEC), where no buffer lock is held, but rather a heavyweight
+ * exclusive page lock that similarly needs to be correctly converted to a
+ * buffer write lock. This routine is also tasked with considering the
+ * necessity of acquiring a heavyweight lock, and doing so, for relevant
+ * non-speculative insertion cases.
+ */
+static void
+_bt_insertinitpg(Relation rel, IndexUniqueCheck checkUnique,
+ ScanKey itup_scankey, Buffer *buf, BlockNumber *bufblock,
+ BTStack stack, BTSpecOpaque btspec)
+{
+ bool hwlock;
+ int natts = rel->rd_rel->relnatts;
+ BTPageOpaque opaque;
+
+ switch (checkUnique)
+ {
+ case UNIQUE_CHECK_NO:
+ /* Deferred unique constraints don't support speculative insertion */
+ case UNIQUE_CHECK_PARTIAL:
+ case UNIQUE_CHECK_EXISTING:
+ /* trade in our read lock for a write lock */
+ LockBuffer(*buf, BUFFER_LOCK_UNLOCK);
+ LockBuffer(*buf, BT_WRITE);
+
+ /* Break and consider the need to move right. */
+ break;
+ case UNIQUE_CHECK_YES:
+ /*
+ * The non-speculative unique index tuple insertion case. As with
+ * the prior case, some additional initialization is required that
+ * is already taken care of in the speculative insertion case, but
+ * unlike that case the implementation must be wary of speculative
+ * inserters.
+ */
+ opaque = (BTPageOpaque) PageGetSpecialPointer(BufferGetPage(*buf));
+
+ /* Trade in our read lock for a write lock */
+ hwlock = P_IS_LOCKED(opaque) != 0;
+do_hwlock:
+ LockBuffer(*buf, BUFFER_LOCK_UNLOCK);
+ if (hwlock)
+ {
+ /*
+ * Speculative insertion is underway for this page/buffer, so
+ * must obtain heavyweight lock to interlock against
+ * speculative inserters. Some may be between locking and
+ * insertion proper for this choke point page/buffer/value, and
+ * have only a heavyweight lock.
+ */
+ *bufblock = BufferGetBlockNumber(*buf);
+ LockPage(rel, *bufblock, ExclusiveLock);
+ }
+
+ LockBuffer(*buf, BT_WRITE);
+
+ /*
+ * When hwlock held, unset flag, preventing future unnecessary
+ * hwlock acquisition by this type of inserter
+ */
+ if (hwlock)
+ {
+ opaque->btpo_flags &= ~BTP_IS_LOCKED;
+ }
+ else if (P_IS_LOCKED(opaque))
+ {
+ /*
+ * Race -- make sure that lock does not need to be acquired
+ * once more. This should virtually never occur because the
+ * window is so small.
+ */
+ hwlock = true;
+ goto do_hwlock;
+ }
+
+ /*
+ * Finally, break and consider the need to move right. When
+ * speculative insertion is currently treating the page/buffer as a
+ * choke point, we may move right, but the existing heavyweight
+ * lock suffices because other non-speculative inserters will block
+ * on the heavyweight lock now held by this backend.
+ */
+ break;
+ case UNIQUE_CHECK_SPEC:
+ /*
+ * Speculative insertion case ("insertion proper").
+ *
+ * We already have a heavyweight page lock corresponding to this
+ * buffer from earlier value locking. Lock the buffer that the
+ * page is allocated into, acquiring a buffer lock equivalent to
+ * the one dropped at the end of value locking.
+ */
+ LockBuffer(btspec->lockedBuf, BT_WRITE);
+#ifndef USE_ASSERT_CHECKING
+ *buf = btspec->lockedBuf;
+#else
+ *buf = _bt_moveright(rel, btspec->lockedBuf, natts, itup_scankey,
+ false, true, btspec->stack, BT_WRITE);
+#endif
+ *bufblock = BufferGetBlockNumber(*buf);
+ opaque = (BTPageOpaque) PageGetSpecialPointer(BufferGetPage(*buf));
+ Assert(P_IS_LOCKED(opaque));
+ /*
+ * Mark the page as no longer heavyweight locked. From here we can
+ * rely on buffer locks to ensure no duplicates are inserted.
+ */
+ opaque->btpo_flags &= ~BTP_IS_LOCKED;
+ /* No break; no need to consider moving right */
+ Assert(*buf == btspec->lockedBuf);
+ return;
+ }
+
+ /*
+ * If the page was split between the time that we surrendered our read lock
+ * and acquired our write lock, then this page may no longer be the right
+ * place for the key we want to insert. In this case, we need to move
+ * right in the tree. See Lehman and Yao for an excruciatingly precise
+ * description.
+ */
+ *buf = _bt_moveright(rel, *buf, natts, itup_scankey, false, true, stack,
+ BT_WRITE);
+}
+
+/*
* _bt_check_unique() -- Check for violation of unique index constraint
*
* offset points to the first possible item that could conflict. It can
@@ -212,7 +680,8 @@ top:
*
* Returns InvalidTransactionId if there is no conflict, else an xact ID
* we must wait for to see if it commits a conflicting tuple. If an actual
- * conflict is detected, no return --- just ereport().
+ * conflict is detected, return when checkUnique == UNIQUE_CHECK_SPEC.
+ * Otherwise, just ereport().
*
* However, if checkUnique == UNIQUE_CHECK_PARTIAL, we always return
* InvalidTransactionId because we don't want to wait. In this case we
@@ -222,7 +691,8 @@ top:
static TransactionId
_bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
Buffer buf, OffsetNumber offset, ScanKey itup_scankey,
- IndexUniqueCheck checkUnique, bool *is_unique)
+ IndexUniqueCheck checkUnique, bool *is_unique,
+ ItemPointer dup_htid)
{
TupleDesc itupdesc = RelationGetDescr(rel);
int natts = rel->rd_rel->relnatts;
@@ -291,6 +761,9 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
curitup = (IndexTuple) PageGetItem(page, curitemid);
htid = curitup->t_tid;
+ if (dup_htid)
+ *dup_htid = htid;
+
/*
* If we are doing a recheck, we expect to find the tuple we
* are rechecking. It's not a duplicate, but we have to keep
@@ -312,6 +785,9 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
{
TransactionId xwait;
+ if (dup_htid)
+ *dup_htid = htid;
+
/*
* It is a duplicate. If we are only doing a partial
* check, then don't bother checking if the tuple is being
@@ -338,7 +814,7 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
{
if (nbuf != InvalidBuffer)
_bt_relbuf(rel, nbuf);
- /* Tell _bt_doinsert to wait... */
+ /* Tell _bt_doinsert or _bt_lockinsert to wait... */
return xwait;
}
@@ -349,6 +825,16 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
* This is a waste of time in normal scenarios but we must
* do it to support CREATE INDEX CONCURRENTLY.
*
+ * Naturally, we don't bother with this if there is no
+ * heap tuple that might have committed dead (as with
+ * speculative tuple insertion, during value locking).
+ * Furthermore, in that scenario we don't have to worry
+ * about this at all during speculative "insertion proper".
+ * This is because we now know that there never will be a
+ * heap tuple. Indeed, there won't even be a real
+ * insertion (unless row locking fails, in which case all
+ * bets are off).
+ *
* We must follow HOT-chains here because during
* concurrent index build, we insert the root TID though
* the actual tuple may be somewhere in the HOT-chain.
@@ -360,9 +846,18 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
* entry.
*/
htid = itup->t_tid;
- if (heap_hot_search(&htid, heapRel, SnapshotSelf, NULL))
+ if (checkUnique == UNIQUE_CHECK_SPEC ||
+ heap_hot_search(&htid, heapRel, SnapshotSelf, NULL))
{
- /* Normal case --- it's still live */
+ /*
+ * Normal case --- it's still live, or there is no heap
+ * tuple pointed to by itup->t_tid to begin with, because
+ * this is speculative insertion value locking.  Note
+ * that speculative insertion (which is now almost done
+ * with this attempt at uniqueness verification) will
+ * not save the NULL heap pointer, but htid is still
+ * set for the benefit of the case handled here.
+ */
}
else
{
@@ -379,10 +874,28 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
* release the buffer locks we're holding ---
* BuildIndexValueDescription could make catalog accesses,
* which in the worst case might touch this same index and
- * cause deadlocks.
+ * cause deadlocks. Speculative insertion also requires
+ * that this buffer be released.
*/
if (nbuf != InvalidBuffer)
_bt_relbuf(rel, nbuf);
+
+ *is_unique = false;
+
+ /*
+ * Done, but don't necessarily raise error. Where
+ * applicable, tell caller to report duplicate TID back to
+ * core code. Since it's a definite conflict TID, no xact
+ * to wait on.
+ */
+ if (checkUnique == UNIQUE_CHECK_SPEC)
+ return InvalidTransactionId;
+
+ /*
+ * The UNIQUE_CHECK_SPEC case needs the buffer to remain
+ * locked, but otherwise it must be released lest the
+ * deadlock scenario described above occur.
+ */
_bt_relbuf(rel, buf);
{
@@ -1010,10 +1523,15 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
isroot = P_ISROOT(oopaque);
isleaf = P_ISLEAF(oopaque);
- /* if we're splitting this page, it won't be the root when we're done */
- /* also, clear the SPLIT_END and HAS_GARBAGE flags in both pages */
+ /*
+ * if we're splitting this page, it won't be the root when we're done.
+ *
+ * also, clear the SPLIT_END, HAS_GARBAGE and BTP_IS_LOCKED flags in both
+ * pages.
+ */
lopaque->btpo_flags = oopaque->btpo_flags;
- lopaque->btpo_flags &= ~(BTP_ROOT | BTP_SPLIT_END | BTP_HAS_GARBAGE);
+ lopaque->btpo_flags &= ~(BTP_ROOT | BTP_SPLIT_END | BTP_HAS_GARBAGE |
+ BTP_IS_LOCKED);
ropaque->btpo_flags = lopaque->btpo_flags;
/* set flag in left page indicating that the right page has no downlink */
lopaque->btpo_flags |= BTP_INCOMPLETE_SPLIT;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 117b18e..d77cb00 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -219,10 +219,52 @@ btbuildempty(PG_FUNCTION_ARGS)
}
/*
+ * btlock() -- First phase of speculative index tuple insertion.
+ *
+ * Descend the tree recursively, find the appropriate location for our new
+ * tuple, or an existing tuple with the same value as the value argument
+ * passed. Lock the value for an instant (the implementation performs
+ * locking at a granularity coarser than strictly necessary), preventing
+ * concurrent insertion of would-be duplicates, and let the caller know if
+ * insertion can proceed.  Value locks may still be held; exactly who
+ * frees them and other state, and when, is arbitrated by the
+ * implementation (which sometimes pro-actively releases its own locks)
+ * and is fully described within _bt_lockinsert().
+ */
+Datum
+btlock(PG_FUNCTION_ARGS)
+{
+ Relation rel = (Relation) PG_GETARG_POINTER(0);
+ Datum *values = (Datum *) PG_GETARG_POINTER(1);
+ bool *isnull = (bool *) PG_GETARG_POINTER(2);
+ Relation heapRel = (Relation) PG_GETARG_POINTER(3);
+ List *otherSpecStates = (List *) PG_GETARG_POINTER(4);
+ bool priorConflict = PG_GETARG_BOOL(5);
+ SpeculativeState *specState = palloc(sizeof(SpeculativeState));
+ BTSpecOpaque btspec = palloc(sizeof(BTSpecOpaqueData));
+
+ /* allocate conflict tid space */
+ specState->conflictTid = palloc0(sizeof(ItemPointerData));
+ /* generate an index tuple */
+ btspec->itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
+ /* private state, filled in by _bt_lockinsert, for return to caller */
+ specState->privState = btspec;
+
+ _bt_lockinsert(rel, heapRel, specState, otherSpecStates, priorConflict);
+
+ PG_RETURN_POINTER(specState);
+}
+
+/*
* btinsert() -- insert an index tuple into a btree.
*
- * Descend the tree recursively, find the appropriate location for our
- * new tuple, and put it there.
+ * Descend the tree recursively, find the appropriate location for our new
+ * tuple (though in the speculative insertion case, btlock will have taken
+ * care of much of that for us already), and put it there. This can be
+ * called for a conventional insertion, or the latter phase of speculative
+ * insertion (it is called in the same manner as always for non-unique
+ * indexes during what the executor would formally consider to be the
+ * second phase of speculative insertion).
*/
Datum
btinsert(PG_FUNCTION_ARGS)
@@ -233,17 +275,34 @@ btinsert(PG_FUNCTION_ARGS)
ItemPointer ht_ctid = (ItemPointer) PG_GETARG_POINTER(3);
Relation heapRel = (Relation) PG_GETARG_POINTER(4);
IndexUniqueCheck checkUnique = (IndexUniqueCheck) PG_GETARG_INT32(5);
+ SpeculativeState *specState = (SpeculativeState *) PG_GETARG_POINTER(6);
+ BTSpecOpaque btspec;
bool result;
- IndexTuple itup;
+ IndexTuple itup = NULL;
- /* generate an index tuple */
- itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
+ if (!specState)
+ {
+ /* non-speculative insertion, generate an index tuple from scratch */
+ itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
+ }
+ else
+ {
+ btspec = (BTSpecOpaque) specState->privState;
+ /* Use the cached index tuple from the first phase (value locking) */
+ itup = btspec->itup;
+ }
itup->t_tid = *ht_ctid;
- result = _bt_doinsert(rel, itup, checkUnique, heapRel);
+ Assert(ItemPointerIsValid(&itup->t_tid));
+
+ result = _bt_doinsert(rel, itup, checkUnique, heapRel, specState);
pfree(itup);
+ /* free private state only - caller should always free specState */
+ if (specState)
+ pfree(specState->privState);
+
PG_RETURN_BOOL(result);
}
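To make the division of labor between btlock() and btinsert() above easier to follow, here is a toy model of the two-phase protocol in C. This is only a sketch: ToySpecState, toy_lock and toy_insert are hypothetical names, a flat int array stands in for the index, and none of this is the patch's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's SpeculativeState and outcomes */
typedef enum { INSERT_PROCEED, INSERT_NO_PROCEED } ToyOutcome;

typedef struct
{
    ToyOutcome outcome;
    int        cached_key;   /* stands in for the cached IndexTuple */
    bool       locked;       /* stands in for the value lock */
} ToySpecState;

/* Phase 1 ("btlock" analogue): lock the value, detect duplicates, and
 * cache the would-be index tuple for phase 2. */
static ToySpecState
toy_lock(int key, const int *index, size_t n)
{
    ToySpecState s = { INSERT_PROCEED, key, true };
    size_t       i;

    for (i = 0; i < n; i++)
        if (index[i] == key)
            s.outcome = INSERT_NO_PROCEED;   /* definite conflict */
    return s;
}

/* Phase 2 ("btinsert" analogue): reuse the tuple cached in phase 1;
 * the value lock is released as part of insertion proper. */
static bool
toy_insert(ToySpecState *s, int *index, size_t *n)
{
    if (s->outcome != INSERT_PROCEED)
        return false;            /* caller ignores or updates instead */
    index[(*n)++] = s->cached_key;
    s->locked = false;
    return true;
}
```

Phase 1 decides whether insertion may proceed while the value is locked; phase 2 performs "insertion proper" without re-checking uniqueness, which mirrors how btinsert() trusts the earlier btlock() call.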
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ee10594..68bfca2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1648,6 +1648,9 @@ BuildIndexInfo(Relation index)
ii->ii_Unique = indexStruct->indisunique;
ii->ii_ReadyForInserts = IndexIsReady(indexStruct);
+ /* merge on all indexes until we hear otherwise */
+ ii->ii_MergeOn = true;
+
/* initialize index-build state to default */
ii->ii_Concurrent = false;
ii->ii_BrokenHotChain = false;
@@ -2959,7 +2962,8 @@ validate_index_heapscan(Relation heapRelation,
&rootTuple,
heapRelation,
indexInfo->ii_Unique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+ NULL);
state->tups_inserted += 1;
}
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index 05aa56e..4c82d0d 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -139,7 +139,8 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
&(heapTuple->t_self), /* tid of heap tuple */
heapRelation,
relationDescs[i]->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO,
+ NULL);
}
ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index b0cad46..183a69d 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -162,7 +162,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
* that has already been inserted is unique.
*/
index_insert(indexRel, values, isnull, &(new_row->t_self),
- trigdata->tg_relation, UNIQUE_CHECK_EXISTING);
+ trigdata->tg_relation, UNIQUE_CHECK_EXISTING, NULL);
}
else
{
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index eb2844a..54cba48 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2333,7 +2333,7 @@ CopyFrom(CopyState cstate)
if (resultRelInfo->ri_NumIndices > 0)
recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
- estate);
+ estate, NIL);
/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, tuple,
@@ -2440,7 +2440,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
recheckIndexes =
ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
- estate);
+ estate, NIL);
ExecARInsertTriggers(estate, resultRelInfo,
bufferedTuples[i],
recheckIndexes);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 781a736..1546fa8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -850,6 +850,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
const char *operation = NULL;
int save_indent = es->indent;
bool haschildren;
+ ModifyTable *mtplan;
switch (nodeTag(plan))
{
@@ -858,10 +859,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
break;
case T_ModifyTable:
sname = "ModifyTable";
- switch (((ModifyTable *) plan)->operation)
+ mtplan = (ModifyTable *) plan;
+ switch (mtplan->operation)
{
case CMD_INSERT:
- pname = operation = "Insert";
+ if (!mtplan->onConflictPlan)
+ pname = operation = "Insert";
+ else
+ pname = operation = "Insert or Update";
break;
case CMD_UPDATE:
pname = operation = "Update";
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d71e9b8..df77573 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2063,11 +2063,12 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
* case, so as to avoid the "Halloween problem" of
* repeated update attempts. In the latter case it might
* be sensible to fetch the updated tuple instead, but
- * doing so would require changing heap_lock_tuple as well
- * as heap_update and heap_delete to not complain about
- * updating "invisible" tuples, which seems pretty scary.
- * So for now, treat the tuple as deleted and do not
- * process.
+ * doing so would require changing heap_update and
+ * heap_delete to not complain about updating "invisible"
+ * tuples, which seems pretty scary (heap_lock_tuple will
+ * not complain, but few callers expect HeapTupleInvisible,
+ * and we're not one of them). So for now, treat the tuple
+ * as deleted and do not process.
*/
ReleaseBuffer(buffer);
return NULL;
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d5e1273..05187e9 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -30,6 +30,7 @@
*
* ExecOpenIndices \
* ExecCloseIndices | referenced by InitPlan, EndPlan,
+ * ExecLockIndexValues | (optionally lock indexes first)
* ExecInsertIndexTuples / ExecInsert, ExecUpdate
*
* RegisterExprContextCallback Register function shutdown callback
@@ -54,6 +55,8 @@
static bool get_last_attnums(Node *node, ProjectionInfo *projInfo);
+static bool ExecCheckPartialPredicate(IndexInfo *info, EState *estate,
+ ExprContext *econtext);
static bool index_recheck_constraint(Relation index, Oid *constr_procs,
Datum *existing_values, bool *existing_isnull,
Datum *new_values);
@@ -978,6 +981,236 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
}
/* ----------------------------------------------------------------
+ * ExecCheckPartialPredicate
+ *
+ * Verify whether the partial index predicate is satisfied -- if
+ * not, callers can skip index tuple insertion altogether
+ * ----------------------------------------------------------------
+ */
+static bool
+ExecCheckPartialPredicate(IndexInfo *info, EState *estate,
+ ExprContext *econtext)
+{
+ List *predicate;
+
+ /* No predicate, so insertion is always required */
+ if (info->ii_Predicate == NIL)
+ return true;
+
+ /*
+ * Check for partial index.
+ *
+ * If predicate state not set up yet, create it (in the estate's per-query
+ * context).
+ */
+ predicate = info->ii_PredicateState;
+ if (predicate == NIL)
+ {
+ predicate = (List *)
+ ExecPrepareExpr((Expr *) info->ii_Predicate,
+ estate);
+ info->ii_PredicateState = predicate;
+ }
+
+ return ExecQual(predicate, econtext, false);
+}
+
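The helper's shape -- treat "no predicate" as "always insert", and compile the predicate lazily into cached per-query state the first time it is needed -- can be sketched as follows. Toy names throughout; the real code uses ExecPrepareExpr() and ExecQual() rather than anything shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for IndexInfo's lazily-built ii_PredicateState */
typedef struct
{
    const char *pred_src;    /* NULL: index is not partial */
    int        *pred_state;  /* "compiled" form, built on first use */
    int         builds;      /* how many times we compiled it */
} ToyIndexInfo;

static int compiled = 1;     /* stands in for ExecPrepareExpr()'s output */

/* Mirrors ExecCheckPartialPredicate's shape: no predicate means
 * insertion is always required; otherwise compile once, cache, and
 * evaluate against the tuple under test. */
static bool
toy_check_predicate(ToyIndexInfo *info, int tuple_val)
{
    if (info->pred_src == NULL)
        return true;                 /* not a partial index */

    if (info->pred_state == NULL)    /* build lazily, cache for reuse */
    {
        info->pred_state = &compiled;
        info->builds++;
    }

    return tuple_val > 0;            /* toy "ExecQual" */
}
```

Caching matters here because the same IndexInfo is consulted once per inserted tuple; the predicate should be prepared once per query, not once per row.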
+/* ----------------------------------------------------------------
+ * ExecLockIndexValues
+ *
+ * Acquire value locks on values that are proposed for insertion
+ * into unique indexes. Value locking prevents the concurrent
+ * insertion of conflicting index tuples into unique indexes.
+ * This is the first phase of speculative index tuple insertion.
+ *
+ * Lock values for each unique index such that it is possible to
+ * commit the core system to inserting a slot without any unique
+ * index violations, or to inform the core system to not proceed
+ * with heap insertion at all in respect of this slot.
+ *
+ * Returns a list of unique indexes that were locked. *conflict
+ * is set to the index offset of any index on which a conflict
+ * occurs, if any; its incoming value is read as a hint that a
+ * conflict was previously seen on that index.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecLockIndexValues(TupleTableSlot *slot, EState *estate,
+ SpecType spec, int *conflict)
+{
+ ResultRelInfo *resultRelInfo;
+ int i;
+ int numIndices;
+ RelationPtr relationDescs;
+ Relation heapRelation;
+ IndexInfo **indexInfoArray;
+ ExprContext *econtext;
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ List *result = NIL;
+
+ /*
+ * Get information from the result relation info structure.
+ */
+ resultRelInfo = estate->es_result_relation_info;
+ numIndices = resultRelInfo->ri_NumIndices;
+ relationDescs = resultRelInfo->ri_IndexRelationDescs;
+ indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
+ heapRelation = resultRelInfo->ri_RelationDesc;
+
+ /*
+ * We will use the EState's per-tuple context for evaluating predicates
+ * and index expressions (creating it if it's not already there).
+ */
+ econtext = GetPerTupleExprContext(estate);
+
+ /* Arrange for econtext's scan tuple to be the tuple under test */
+ econtext->ecxt_scantuple = slot;
+
+lindex:
+
+ /*
+ * For each relevant unique index, form and lock the index tuple (this can
+ * be cached by amcanunique access methods for later use by
+ * ExecInsertIndexTuples())
+ *
+ * We lock the indexes in the opposite order to their usual
+ * ExecInsertIndexTuples()/release order here. That routine will always
+ * insert into any unique indexes first, before finally inserting into
+ * non-unique indexes, so as to minimize the window of lock contention.
+ * The fact that indexes are sorted in this way while locking in opposite
+ * order often implies that the locking window for the primary key is
+ * shortest of all, though this isn't strictly guaranteed.
+ */
+ for (i = numIndices - 1; i >= 0; i--)
+ {
+ Relation indexRelation = relationDescs[i];
+ IndexInfo *indexInfo;
+ SpeculativeState *state;
+
+ if (indexRelation == NULL)
+ continue;
+
+ indexInfo = indexInfoArray[i];
+
+ /* If the index is marked as read-only, ignore it */
+ if (!indexInfo->ii_ReadyForInserts)
+ continue;
+
+ /* Only value lock unique indexes */
+ if (!indexInfo->ii_Unique)
+ continue;
+
+ /* May be limited to merging on particular index */
+ if (!indexInfo->ii_MergeOn)
+ continue;
+
+ if (!indexRelation->rd_index->indimmediate)
+ {
+ switch (spec)
+ {
+ case SPEC_IGNORE:
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported use of ON CONFLICT IGNORE with deferred unique constraint \"%s\"",
+ RelationGetRelationName(indexRelation)),
+ errhint("ON CONFLICT IGNORE performs immediate verification."),
+ errtableconstraint(heapRelation,
+ RelationGetRelationName(indexRelation))));
+ case SPEC_INSERT:
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("unsupported use of ON CONFLICT UPDATE with deferred unique constraint \"%s\"",
+ RelationGetRelationName(indexRelation)),
+ errhint("ON CONFLICT UPDATE performs immediate verification."),
+ errtableconstraint(heapRelation,
+ RelationGetRelationName(indexRelation))));
+ case SPEC_UPDATE:
+ case SPEC_NONE:
+ elog(ERROR, "invalid speculative insertion specification");
+ }
+ }
+
+ if (!ExecCheckPartialPredicate(indexInfo, estate, econtext))
+ continue;
+
+ /*
+ * FormIndexDatum fills in its values and isnull parameters with the
+ * appropriate values for the column(s) of the index.
+ */
+ FormIndexDatum(indexInfo,
+ slot,
+ estate,
+ values,
+ isnull);
+
+ /*
+ * The index AM does uniqueness checking.
+ */
+ state = index_lock(indexRelation,
+ values,
+ isnull,
+ heapRelation,
+ result,
+ *conflict == i);
+
+ if (state->outcome == INSERT_TRY_AGAIN)
+ {
+ /*
+ * AM indicated that we must try again, typically because it had to
+ * wait pending the outcome of another transaction.
+ *
+ * The AM lock function will have already used the list of
+ * locked-so-far states passed to it to free all locks and other
+ * resources for all previously locked indexes just before waiting.
+ *
+ * The lock starvation hazards here are minimal, since no sensible
+ * usage of speculative insertion is likely to see conflicts on
+ * multiple unique indexes at once when row locking (we only really
+ * concurrently lock all unique index values proposed for insertion
+ * to save the DML statement's author from specifying which unique
+ * index they intend to merge on; it will ordinarily be obvious to
+ * a person with application domain knowledge). When clients wish
+ * to avoid a tuple slot's insertion when a duplicate key violation
+ * would otherwise occur (i.e. IGNORE), only one
+ * conclusively-present conflict is required to give up, so lock
+ * starvation is improbable there too.
+ */
+ list_free(result);
+ result = NIL;
+
+ /* Record conflict, to hint to AM to expect this next time */
+ *conflict = i;
+
+ goto lindex;
+ }
+
+ /*
+ * Prepend the list, because later stages expect to find the state in
+ * right-way-around order, and we're working backwards
+ */
+ result = lcons(state, result);
+
+ /*
+ * There is no point in proceeding if we already have one would-be
+ * violation -- we require total consensus
+ */
+ if (state->outcome == INSERT_NO_PROCEED)
+ {
+ /*
+ * Record conflict, to hint to AM to expect this if row locking
+ * ultimately indicates that a full restart is required. Should
+ * control once again reach this function, the hint still carries.
+ */
+ *conflict = i;
+ break;
+ }
+ }
+
+ return result;
+}
+
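The control flow of ExecLockIndexValues() above -- lock unique indexes in reverse order, and on INSERT_TRY_AGAIN release everything, record the conflicting index as a hint, and restart from scratch -- can be modeled as a toy retry loop. Hypothetical names only; a counter of pending conflicters stands in for waiting on another transaction.

```c
#include <stdbool.h>

/* One pending conflicter per index; "waiting" consumes it. */
typedef struct
{
    int nindexes;
    int pending[8];          /* pending[i] > 0: must wait on index i */
} ToyIndexes;

typedef enum { TOY_PROCEED, TOY_TRY_AGAIN } ToyTryOutcome;

static ToyTryOutcome
toy_try_lock(ToyIndexes *ix, int i)
{
    if (ix->pending[i] > 0)
    {
        ix->pending[i]--;    /* "wait", then report that retry is needed */
        return TOY_TRY_AGAIN;
    }
    return TOY_PROCEED;
}

/* Mirrors ExecLockIndexValues' shape: lock in reverse order; on
 * TRY_AGAIN drop all locks taken so far, remember the conflicted index
 * as a hint for the next attempt, and restart.  Returns the number of
 * restarts taken before every value lock was held at once. */
static int
toy_lock_all(ToyIndexes *ix, int *conflict)
{
    int restarts = 0;
    int i;

lindex:
    for (i = ix->nindexes - 1; i >= 0; i--)
    {
        if (toy_try_lock(ix, i) == TOY_TRY_AGAIN)
        {
            *conflict = i;   /* hint to expect this conflict next time */
            restarts++;
            goto lindex;
        }
    }
    return restarts;
}
```

As in the real routine, progress is guaranteed only because each wait resolves one conflicter; the comment block in the patch argues why starvation is improbable in practice.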
+/* ----------------------------------------------------------------
* ExecInsertIndexTuples
*
* This routine takes care of inserting index tuples
@@ -990,7 +1223,10 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
*
* This returns a list of index OIDs for any unique or exclusion
* constraints that are deferred and that had
- * potential (unconfirmed) conflicts.
+ * potential (unconfirmed) conflicts. With speculative
+ * insertion, successful insertion without unique violations is
+ * already assured, and specStates allows the second phase to
+ * perform "insertion proper".
*
* CAUTION: this must not be called for a HOT update.
* We can't defend against that here for lack of info.
@@ -1000,11 +1236,13 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
List *
ExecInsertIndexTuples(TupleTableSlot *slot,
ItemPointer tupleid,
- EState *estate)
+ EState *estate,
+ List *specStates)
{
List *result = NIL;
ResultRelInfo *resultRelInfo;
int i;
+ int sn = 0;
int numIndices;
RelationPtr relationDescs;
Relation heapRelation;
@@ -1040,6 +1278,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
IndexInfo *indexInfo;
IndexUniqueCheck checkUnique;
bool satisfiesConstraint;
+ SpeculativeState *state;
if (indexRelation == NULL)
continue;
@@ -1050,38 +1289,22 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
if (!indexInfo->ii_ReadyForInserts)
continue;
- /* Check for partial index */
- if (indexInfo->ii_Predicate != NIL)
- {
- List *predicate;
-
- /*
- * If predicate state not set up yet, create it (in the estate's
- * per-query context)
- */
- predicate = indexInfo->ii_PredicateState;
- if (predicate == NIL)
- {
- predicate = (List *)
- ExecPrepareExpr((Expr *) indexInfo->ii_Predicate,
- estate);
- indexInfo->ii_PredicateState = predicate;
- }
+ if (!ExecCheckPartialPredicate(indexInfo, estate, econtext))
+ continue;
- /* Skip this index-update if the predicate isn't satisfied */
- if (!ExecQual(predicate, econtext, false))
- continue;
- }
+ state = specStates && indexInfo->ii_Unique && indexInfo->ii_MergeOn ?
+ list_nth(specStates, sn++) : NULL;
/*
* FormIndexDatum fills in its values and isnull parameters with the
- * appropriate values for the column(s) of the index.
+ * appropriate values for the column(s) of the index, where required.
*/
- FormIndexDatum(indexInfo,
- slot,
- estate,
- values,
- isnull);
+ if (!state)
+ FormIndexDatum(indexInfo,
+ slot,
+ estate,
+ values,
+ isnull);
/*
* The index AM does the actual insertion, plus uniqueness checking.
@@ -1089,12 +1312,21 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
* For an immediate-mode unique index, we just tell the index AM to
* throw error if not unique.
*
+ * For the latter phase of speculative insertion, uniqueness
+ * checks will already have been performed if we arrive here, and
+ * should not be repeated here. However, the user may have specified
+ * that speculative insertion is in respect of a particular unique
+ * index, in which case only that unique index will now have value
+ * locks; only tell am to insert that unique index speculatively.
+ *
* For a deferrable unique index, we tell the index AM to just detect
* possible non-uniqueness, and we add the index OID to the result
* list if further checking is needed.
*/
if (!indexRelation->rd_index->indisunique)
checkUnique = UNIQUE_CHECK_NO;
+ else if (specStates && indexInfo->ii_MergeOn)
+ checkUnique = UNIQUE_CHECK_SPEC;
else if (indexRelation->rd_index->indimmediate)
checkUnique = UNIQUE_CHECK_YES;
else
@@ -1106,7 +1338,8 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
isnull, /* null flags */
tupleid, /* tid of heap tuple */
heapRelation, /* heap relation */
- checkUnique); /* type of uniqueness check to do */
+ checkUnique, /* type of uniqueness check to do */
+ state); /* speculative state, if any */
/*
* If the index has an associated exclusion constraint, check that.
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 298d4b4..fbafcc5 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -147,10 +147,11 @@ lnext:
* case, so as to avoid the "Halloween problem" of repeated
* update attempts. In the latter case it might be sensible
* to fetch the updated tuple instead, but doing so would
- * require changing heap_lock_tuple as well as heap_update and
- * heap_delete to not complain about updating "invisible"
- * tuples, which seems pretty scary. So for now, treat the
- * tuple as deleted and do not process.
+ * require changing heap_update and heap_delete to not complain
+ * about updating "invisible" tuples, which seems pretty scary
+ * (heap_lock_tuple will not complain, but few callers expect
+ * HeapTupleInvisible, and we're not one of them). So for now,
+ * treat the tuple as deleted and do not process.
*/
goto lnext;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8ac6047..2350f60 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -52,6 +52,12 @@
#include "utils/tqual.h"
+static bool ExecLockUpdateTuple(EState *estate,
+ ResultRelInfo *relinfo,
+ ItemPointer tid,
+ TupleTableSlot *planSlot,
+ ModifyTableState *onConflict);
+
/*
* Verify that the tuples to be produced by INSERT or UPDATE match the
* target relation's rowtype
@@ -152,6 +158,38 @@ ExecProcessReturning(ProjectionInfo *projectReturning,
}
/* ----------------------------------------------------------------
+ * Verify that heap tuple is visible to transaction snapshot at higher
+ * isolation levels.
+ *
+ * It would not represent serializable behavior to proceed with avoiding
+ * insertion on the basis of another tuple that is not visible (at
+ * higher isolation levels).
+ * ----------------------------------------------------------------
+ */
+static void
+ExecCheckHeapTupleVisible(EState *estate,
+ ResultRelInfo *relinfo,
+ ItemPointer tid)
+{
+
+ Relation relation = relinfo->ri_RelationDesc;
+ Buffer buffer;
+ HeapTupleData tuple;
+
+ if (!IsolationUsesXactSnapshot())
+ return;
+
+ tuple.t_self = *tid;
+ if (!heap_fetch(relation, estate->es_snapshot, &tuple, &buffer, false,
+ NULL))
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not proceed on the basis of a tuple version created by a still-in-progress transaction")));
+
+ ReleaseBuffer(buffer);
+}
+
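The visibility gate above can be modeled with a toy MVCC rule: at READ COMMITTED anything goes, while at REPEATABLE READ or above we refuse to act on a conflicting row the transaction snapshot cannot see. This is purely illustrative; the real check is a heap_fetch() against estate->es_snapshot, not the simplistic xmin comparison used here.

```c
#include <stdbool.h>

/* Toy snapshot: only transactions with xid below the horizon are
 * treated as committed-and-visible.  A gross simplification of real
 * snapshot visibility, for illustration only. */
typedef struct
{
    int xmin_horizon;
} ToySnapshot;

/* Mirrors ExecCheckHeapTupleVisible's shape: skip the check entirely
 * unless the isolation level uses a transaction snapshot; otherwise,
 * acting on an invisible tuple must raise a serialization failure. */
static bool
toy_may_proceed(bool uses_xact_snapshot,
                const ToySnapshot *snap, int tuple_xmin)
{
    if (!uses_xact_snapshot)
        return true;                        /* READ COMMITTED: ok */
    return tuple_xmin < snap->xmin_horizon; /* visible to snapshot? */
}
```

In the patch, a false result corresponds to the ERRCODE_T_R_SERIALIZATION_FAILURE ereport() rather than a boolean return.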
+/* ----------------------------------------------------------------
* ExecInsert
*
* For INSERT, we have to insert the tuple into the target relation
@@ -163,6 +201,8 @@ ExecProcessReturning(ProjectionInfo *projectReturning,
static TupleTableSlot *
ExecInsert(TupleTableSlot *slot,
TupleTableSlot *planSlot,
+ ModifyTableState *onConflict,
+ SpecType spec,
EState *estate,
bool canSetTag)
{
@@ -171,6 +211,8 @@ ExecInsert(TupleTableSlot *slot,
Relation resultRelationDesc;
Oid newId;
List *recheckIndexes = NIL;
+ List *specStates = NIL;
+ int conflict;
/*
* get the heap tuple out of the tuple table slot, making sure we have a
@@ -228,6 +270,11 @@ ExecInsert(TupleTableSlot *slot,
}
else if (resultRelInfo->ri_FdwRoutine)
{
+ if (spec != SPEC_NONE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("INSERT ... ON CONFLICT is not supported on foreign tables")));
+
/*
* insert into foreign table: let the FDW do it
*/
@@ -259,6 +306,94 @@ ExecInsert(TupleTableSlot *slot,
ExecConstraints(resultRelInfo, slot, estate);
/*
+ * For now, if we're performing a speculative insertion, no conflict on
+ * any particular unique index is considered likely at the outset.  As
+ * and when conflicts emerge for this slot, further conflicts on
+ * the same unique index are anticipated, which is hinted to the
+ * underlying AM's implementation in ExecLockIndexValues(), to
+ * facilitate optimizations around lock strength.
+ */
+ conflict = -1;
+ilock:
+ /*
+ * Lock unique index values, as required when performing speculative
+ * insertion
+ */
+ if (resultRelInfo->ri_NumIndices > 0 && spec != SPEC_NONE)
+ specStates = ExecLockIndexValues(slot, estate, spec, &conflict);
+
+ /*
+ * Check if it's required to proceed with the second phase ("insertion
+ * proper") of speculative insertion in respect of the slot. If
+ * insertion should not proceed, no firing of AFTER ROW INSERT triggers
+ * occurs.
+ *
+ * We don't suppress the effects (or, perhaps, side-effects) of BEFORE
+ * ROW INSERT triggers.  This isn't ideal: on the one hand, we cannot
+ * even consider uniqueness violations until these triggers fire, but
+ * on the other hand they may execute arbitrary user-defined code that
+ * performs operations entirely outside the system's ability to
+ * nullify.
+ */
+ if (!index_proceed(specStates))
+ {
+ ItemPointer htup;
+
+ /*
+ * Release index value locks -- if insertion had proceeded, this
+ * would occur during index tuple insertion instead, as each index
+ * tuple is individually inserted...
+ */
+ htup = index_release(specStates, false);
+
+ /*
+ * ...locks were only needed to ensure that insertion could proceed
+ * without hindrance from concurrent insertions. Now that it's
+ * clear that insertion won't proceed (unless there is a later row
+ * lock conflict due to a concurrent deletion) there is no
+ * advantage in holding value locks until after row locks are
+ * acquired. Continuing to hold locks would not imply that there'd
+ * be fewer row locking conflicts in ExecLockUpdateTuple(), plus
+ * doing so introduces the risk of unprincipled deadlocks (i.e.
+ * deadlocks that the user cannot reasonably avoid).
+ *
+ * Value locks can only prevent concurrent unique index tuple
+ * insertion from completing, but UPDATEs and DELETEs still cause
+ * conflicts for row locking (even non-HOT updates).
+ */
+ switch (spec)
+ {
+ case SPEC_IGNORE:
+ ExecCheckHeapTupleVisible(estate, resultRelInfo, htup);
+ break;
+ case SPEC_INSERT:
+ if (!ExecLockUpdateTuple(estate, resultRelInfo, htup,
+ planSlot, onConflict))
+ {
+ /* Couldn't lock row due to conflict. Restart. */
+ index_release(specStates, true);
+ list_free(specStates);
+ goto ilock;
+ }
+ break;
+ case SPEC_UPDATE:
+ case SPEC_NONE:
+ elog(ERROR, "invalid speculative insertion specification");
+ }
+
+ /*
+ * Finally, unpin value lock buffer
+ */
+ index_release(specStates, true);
+
+ /*
+ * Speculative insertion does not project RETURNING tuples in
+ * respect of rows that were updated/ignored
+ */
+ return NULL;
+ }
+
+ /*
* insert the tuple
*
* Note: heap_insert returns the tid (location) of the new tuple in
@@ -269,10 +404,14 @@ ExecInsert(TupleTableSlot *slot,
/*
* insert index entries for tuple
+ *
+ * Locks will be acquired if needed, or the locks acquired by
+ * ExecLockIndexValues() may be used instead.
*/
if (resultRelInfo->ri_NumIndices > 0)
recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
- estate);
+ estate,
+ specStates);
}
if (canSetTag)
@@ -285,6 +424,7 @@ ExecInsert(TupleTableSlot *slot,
/* AFTER ROW INSERT Triggers */
ExecARInsertTriggers(estate, resultRelInfo, tuple, recheckIndexes);
+ list_free(specStates);
list_free(recheckIndexes);
/* Check any WITH CHECK OPTION constraints */
@@ -768,7 +908,7 @@ lreplace:;
*/
if (resultRelInfo->ri_NumIndices > 0 && !HeapTupleIsHeapOnly(tuple))
recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
- estate);
+ estate, NIL);
}
if (canSetTag)
@@ -793,6 +933,213 @@ lreplace:;
}
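Before ExecLockUpdateTuple() itself, the overall conflict path in ExecInsert() is worth restating: IGNORE gives up after a single pass (the row is merely checked for snapshot visibility), while ON CONFLICT UPDATE (SPEC_INSERT in the patch's terms) retries the whole speculative dance until the row lock is acquired. A toy model of that loop, with hypothetical names and a failure counter standing in for row-lock conflicts:

```c
#include <stdbool.h>

typedef enum { TOY_SPEC_IGNORE, TOY_SPEC_UPDATE } ToySpec;

/* Toy row lock: fails (conflict) a given number of times before it
 * succeeds, standing in for ExecLockUpdateTuple returning false. */
static bool
toy_lock_row(int *conflicts_left)
{
    if (*conflicts_left > 0)
    {
        (*conflicts_left)--;
        return false;        /* caller must restart from value locking */
    }
    return true;
}

/* Mirrors ExecInsert's conflict path: each iteration stands for one
 * pass of value locking (the "ilock" label in the patch).  Returns the
 * number of passes taken before the slot was fully handled. */
static int
toy_handle_conflict(ToySpec spec, int conflicts_left)
{
    int attempts = 0;

    for (;;)
    {
        attempts++;          /* one pass of ExecLockIndexValues() */
        if (spec == TOY_SPEC_IGNORE)
            return attempts; /* nothing to lock: just skip the row */
        if (toy_lock_row(&conflicts_left))
            return attempts; /* row locked; auxiliary UPDATE can run */
    }
}
```

Releasing the value locks before row locking, as the patch's comments explain, is what keeps this loop free of unprincipled deadlocks at the cost of an occasional restart.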
+/* ----------------------------------------------------------------
+ * Try to lock tuple for update as part of speculative insertion. If
+ * a qual originating from ON CONFLICT UPDATE is satisfied, update
+ * (but still lock row, even though it may not satisfy estate's
+ * snapshot).
+ *
+ * Parameterized plans appearing in auxiliary update plan are rejected
+ * outright. Therefore, it is safe to perform update without
+ * parameters that may be intended for SPEC_INSERT plan. This
+ * effectively prevents any cases where the EvalPlanQual re-scanning
+ * performed here would be problematic (including, for example,
+ * subqueries in targetlists or quals). Some of these cases might
+ * still be possible (such as when only an initplan is required), but
+ * restrictions are applied more broadly during parsing and planning.
+ *
+ * Returns true if we're done (with or without an update), or false
+ * if the executor must restart the speculative insertion from scratch.
+ * ----------------------------------------------------------------
+ */
+static bool
+ExecLockUpdateTuple(EState *estate,
+ ResultRelInfo *relinfo,
+ ItemPointer tid,
+ TupleTableSlot *planSlot,
+ ModifyTableState *onConflict)
+{
+ Relation relation = relinfo->ri_RelationDesc;
+ HeapTupleData tuple;
+ HeapTuple copyTuple = NULL;
+ HeapUpdateFailureData hufd;
+ HTSU_Result test;
+ Buffer buffer;
+ TupleTableSlot *slot;
+
+ Assert(ItemPointerIsValid(tid));
+
+ /*
+ * Lock tuple for update.
+ *
+ * Like EvalPlanQualFetch(), don't follow updates. There is no actual
+ * benefit to doing so, since a conflict invalidates our previous
+ * conclusion that the tuple is the conclusively committed conflicting
+ * tuple.
+ */
+ tuple.t_self = *tid;
+ test = heap_lock_tuple(relation, &tuple,
+ estate->es_output_cid,
+ LockTupleExclusive, false, /* wait */
+ false, &buffer, &hufd);
+
+ if (test == HeapTupleMayBeUpdated)
+ copyTuple = heap_copytuple(&tuple);
+
+ switch (test)
+ {
+ case HeapTupleInvisible:
+ /*
+ * This may occur when an instantaneously invisible tuple is blamed
+ * as a conflict because multiple rows are inserted with the same
+ * constrained values.
+ *
+ * We cannot proceed, because to do so would leave users open to
+ * the risk that the same row will be updated a second time in the
+ * same command; allowing a second update of a single row within the
+ * same command would leave the update order undefined. It is the
+ * user's responsibility to resolve such self-duplicates before
+ * proposing a set of tuples for insertion, but we warn them here.
+ * These problems are why SQL:2003
+ * similarly specifies that for SQL MERGE, an exception must be
+ * raised in the event of an attempt to update the same row twice.
+ *
+ * XXX: It might be preferable to do something similar when a
+ * row is locked twice (and not updated twice) by the same
+ * speculative insertion, as if to take each lock acquisition as an
+ * indication of a discrete, unfulfilled intent to update (perhaps
+ * in some later command of the same xact). This does not seem
+ * feasible, though.
+ */
+ if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple.t_data)))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNIQUE_VIOLATION),
+ errmsg("could not lock instantaneously invisible tuple inserted in same transaction"),
+ errhint("Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.")));
+
+ /* This shouldn't happen */
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false; /* keep compiler quiet */
+ case HeapTupleSelfUpdated:
+ if (hufd.cmax != estate->es_output_cid)
+ {
+ /*
+ * A later command updated or deleted the tuple. Throw an
+ * error for the same reasons enumerated in ExecUpdate and
+ * ExecDelete.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
+ errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
+ }
+ else
+ {
+ /*
+ * Handle this case as equivalent to the HeapTupleInvisible
+ * case.
+ *
+ * XXX: In practice this is dead code, and the problem will
+ * always be handled by the HeapTupleInvisible case: if we had
+ * updated this tuple, speculative insertion's first phase (the
+ * amcanunique AM) would not have concluded that it was a
+ * conflict tuple, because the dirty snapshot used there would
+ * have seen the effects of that would-be update (i.e. this
+ * tuple would not have been visible to a dirty snapshot
+ * immediately before now).
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_UNIQUE_VIOLATION),
+ errmsg("could not lock tuple already updated or deleted by the current command"),
+ errhint("Ensure that no rows proposed for insertion in the same command have constrained values that duplicate each other.")));
+ }
+ return false; /* keep compiler quiet */
+ case HeapTupleMayBeUpdated:
+ /*
+ * Success -- we're done, as tuple is locked. Verify that the
+ * tuple is known to be visible to our snapshot under conventional
+ * MVCC rules if the current isolation level mandates that. In
+ * READ COMMITTED mode, we can lock + update a tuple still in
+ * progress according to our snapshot, but higher isolation levels
+ * cannot avail of that, and must actively defend against doing so.
+ *
+ * This loosening of snapshot isolation for the benefit of READ
+ * COMMITTED speculative insertions is used consistently:
+ * speculative quals are only tested in respect of already locked
+ * tuples. It would be rather inconsistent to UPDATE when no tuple
+ * version is visible on the one hand, while also failing to UPDATE
+ * only on the basis of the non-conclusive tuple version that
+ * happens to be the version visible to the estate snapshot.
+ */
+ if (IsolationUsesXactSnapshot() &&
+ XidInProgress(HeapTupleHeaderGetXmin(tuple.t_data),
+ estate->es_snapshot))
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not update row version originating in a still-in-progress transaction")));
+
+ /*
+ * Conceptually, the parent ModifyTable is like a relation scan
+ * node that uses a dirty snapshot, returning rows which the
+ * auxiliary plan must operate on (if only to lock all such rows).
+ * EvalPlanQual() is involved in the evaluation of their UPDATE,
+ * regardless of whether or not the tuple is visible to the
+ * command's MVCC Snapshot.
+ */
+ EvalPlanQualBegin(&onConflict->mt_epqstate, onConflict->ps.state);
+ EvalPlanQualSetTuple(&onConflict->mt_epqstate,
+ onConflict->ps.state->es_result_relation_info->ri_RangeTableIndex,
+ copyTuple);
+ slot = EvalPlanQualNext(&onConflict->mt_epqstate);
+
+ if (!TupIsNull(slot))
+ ExecUpdate(&tuple.t_data->t_ctid, NULL, slot, planSlot,
+ &onConflict->mt_epqstate, onConflict->ps.state, false);
+
+ ReleaseBuffer(buffer);
+
+ /*
+ * As when executing a ModifyTable node in the conventional manner,
+ * reset the per-output-tuple exprcontext.
+ */
+ ResetPerTupleExprContext(onConflict->ps.state);
+ return true;
+ case HeapTupleUpdated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * Tell caller to try again from the very start. We don't use the
+ * usual EvalPlanQual looping pattern here, fundamentally because
+ * we don't have a useful qual to verify the next tuple with.
+ *
+ * We might devise a means of verifying, by way of binary equality
+ * in a similar manner to HOT codepaths, if any unique indexed
+ * columns changed, but this would only serve to ameliorate the
+ * fundamental problem. It might well not be good enough, because
+ * those columns could change too. It's not clear that doing any
+ * better here would be worth it.
+ *
+ * At this point, all bets are off -- it might actually turn out to
+ * be okay to proceed with insertion instead of locking now (the
+ * tuple we attempted to lock could have been deleted, for
+ * example). On the other hand, it might not be okay, but for an
+ * entirely different reason, with an entirely separate TID to
+ * blame and lock. This TID may not even be part of the same
+ * update chain.
+ */
+ ReleaseBuffer(buffer);
+ return false;
+ default:
+ elog(ERROR, "unrecognized heap_lock_tuple status: %u", test);
+ }
+
+ return false;
+}
+
+
/*
* Process BEFORE EACH STATEMENT triggers
*/
@@ -852,6 +1199,8 @@ ExecModifyTable(ModifyTableState *node)
{
EState *estate = node->ps.state;
CmdType operation = node->operation;
+ ModifyTableState *onConflict = (ModifyTableState *) node->onConflict;
+ SpecType spec = node->spec;
ResultRelInfo *saved_resultRelInfo;
ResultRelInfo *resultRelInfo;
PlanState *subplanstate;
@@ -1022,7 +1371,8 @@ ExecModifyTable(ModifyTableState *node)
switch (operation)
{
case CMD_INSERT:
- slot = ExecInsert(slot, planSlot, estate, node->canSetTag);
+ slot = ExecInsert(slot, planSlot, onConflict, spec, estate,
+ node->canSetTag);
break;
case CMD_UPDATE:
slot = ExecUpdate(tupleid, oldtuple, slot, planSlot,
@@ -1070,6 +1420,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
{
ModifyTableState *mtstate;
CmdType operation = node->operation;
+ Plan *onConflictPlan = node->onConflictPlan;
int nplans = list_length(node->plans);
ResultRelInfo *saved_resultRelInfo;
ResultRelInfo *resultRelInfo;
@@ -1097,6 +1448,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
mtstate->resultRelInfo = estate->es_result_relations + node->resultRelIndex;
mtstate->mt_arowmarks = (List **) palloc0(sizeof(List *) * nplans);
mtstate->mt_nplans = nplans;
+ mtstate->spec = node->spec;
/* set up epqstate with dummy subplan data for the moment */
EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
@@ -1135,8 +1487,39 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
+ {
ExecOpenIndices(resultRelInfo);
+ /*
+ * If an explicit ON CONFLICT WITHIN unique index was specified,
+ * record that value locking should occur only in that index
+ if (node->spec != SPEC_NONE && node->mergeIndex != InvalidOid)
+ {
+ RelationPtr relationDescs = resultRelInfo->ri_IndexRelationDescs;
+ IndexInfo **indexInfoArray;
+ int numIndices = resultRelInfo->ri_NumIndices;
+ int j;
+ IndexInfo *indexInfo;
+
+ indexInfoArray = resultRelInfo->ri_IndexRelationInfo;
+
+ resultRelInfo = mtstate->resultRelInfo;
+ for (j = 0; j < numIndices; j++)
+ {
+ Relation indexRelation = relationDescs[j];
+ indexInfo = indexInfoArray[j];
+
+ if (indexRelation->rd_id != node->mergeIndex)
+ indexInfo->ii_MergeOn = false;
+ else if (!indexInfo->ii_Unique)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("ON CONFLICT WITHIN index is not a unique index")));
+ }
+ }
+ }
+
/* Now init the plan for this result rel */
estate->es_result_relation_info = resultRelInfo;
mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
@@ -1331,7 +1714,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
resultRelInfo->ri_RelationDesc->rd_att->tdhasoid,
ExecInitExtraTupleSlot(estate));
- if (operation == CMD_UPDATE || operation == CMD_DELETE)
+ if ((operation == CMD_UPDATE || operation == CMD_DELETE) &&
+ node->spec == SPEC_NONE)
{
/* For UPDATE/DELETE, find the appropriate junk attr now */
char relkind;
@@ -1373,6 +1757,27 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
}
/*
+ * Initialize auxiliary ModifyTable node for INSERT...ON CONFLICT UPDATE.
+ *
+ * The UPDATE portion of the query is essentially represented as auxiliary
+ * to INSERT state at all stages of query processing, with a representation
+ * at each stage that is analogous to a regular UPDATE, but it is not
+ * directly represented in user-visible output such as EXPLAIN.
+ *
+ * ExecModifyTable() is never called in respect of an auxiliary
+ * ModifyTableState. Execution of the auxiliary plan is driven by its
+ * parent in an ad-hoc fashion.
+ */
+ if (onConflictPlan)
+ {
+ PlanState *pstate;
+
+ Assert(mtstate->spec == SPEC_INSERT);
+ pstate = ExecInitNode(onConflictPlan, estate, eflags);
+ mtstate->onConflict = pstate;
+ }
+
+ /*
* Set up a tuple table slot for use for trigger output tuples. In a plan
* containing multiple ModifyTable nodes, all can share one such slot, so
* we keep it in the estate.
@@ -1387,11 +1792,19 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* ModifyTable node too, but there's no need.) Note the use of lcons not
* lappend: we need later-initialized ModifyTable nodes to be shut down
* before earlier ones. This ensures that we don't throw away RETURNING
- * rows that need to be seen by a later CTE subplan.
+ * rows that need to be seen by a later CTE subplan. We do not want to
+ * append an auxiliary ON CONFLICT UPDATE node, since it must have a
+ * primary ModifyTable node that merely uses its state to selectively
+ * execute some portions of the would-be unqualified UPDATE. There are
+ * restrictions imposed upon the UPDATE that make the primary and auxiliary
+ * plans isomorphic. This is sufficient to make ExecLockUpdateTuple()
+ * safe.
*/
- if (!mtstate->canSetTag)
+ if (!mtstate->canSetTag && mtstate->spec != SPEC_UPDATE)
+ {
estate->es_auxmodifytables = lcons(mtstate,
estate->es_auxmodifytables);
+ }
return mtstate;
}
@@ -1442,6 +1855,8 @@ ExecEndModifyTable(ModifyTableState *node)
*/
for (i = 0; i < node->mt_nplans; i++)
ExecEndNode(node->mt_plans[i]);
+
+ ExecEndNode(node->onConflict);
}
void
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9f5bcd7..079b3aa 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -178,6 +178,9 @@ _copyModifyTable(const ModifyTable *from)
COPY_NODE_FIELD(resultRelations);
COPY_SCALAR_FIELD(resultRelIndex);
COPY_NODE_FIELD(plans);
+ COPY_NODE_FIELD(onConflictPlan);
+ COPY_SCALAR_FIELD(spec);
+ COPY_SCALAR_FIELD(mergeIndex);
COPY_NODE_FIELD(withCheckOptionLists);
COPY_NODE_FIELD(returningLists);
COPY_NODE_FIELD(fdwPrivLists);
@@ -2090,6 +2093,19 @@ _copyWithClause(const WithClause *from)
return newnode;
}
+static ConflictClause *
+_copyConflictClause(const ConflictClause *from)
+{
+ ConflictClause *newnode = makeNode(ConflictClause);
+
+ COPY_SCALAR_FIELD(specClause);
+ COPY_NODE_FIELD(stmt);
+ COPY_NODE_FIELD(relation);
+ COPY_LOCATION_FIELD(location);
+
+ return newnode;
+}
+
static CommonTableExpr *
_copyCommonTableExpr(const CommonTableExpr *from)
{
@@ -2494,6 +2510,9 @@ _copyQuery(const Query *from)
COPY_NODE_FIELD(jointree);
COPY_NODE_FIELD(targetList);
COPY_NODE_FIELD(withCheckOptions);
+ COPY_NODE_FIELD(onConflict);
+ COPY_SCALAR_FIELD(specClause);
+ COPY_SCALAR_FIELD(mergeIndex);
COPY_NODE_FIELD(returningList);
COPY_NODE_FIELD(groupClause);
COPY_NODE_FIELD(havingQual);
@@ -2518,6 +2537,7 @@ _copyInsertStmt(const InsertStmt *from)
COPY_NODE_FIELD(cols);
COPY_NODE_FIELD(selectStmt);
COPY_NODE_FIELD(returningList);
+ COPY_NODE_FIELD(confClause);
COPY_NODE_FIELD(withClause);
return newnode;
@@ -4653,6 +4673,9 @@ copyObject(const void *from)
case T_WithClause:
retval = _copyWithClause(from);
break;
+ case T_ConflictClause:
+ retval = _copyConflictClause(from);
+ break;
case T_CommonTableExpr:
retval = _copyCommonTableExpr(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d13c69..1ec7192 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -862,6 +862,9 @@ _equalQuery(const Query *a, const Query *b)
COMPARE_NODE_FIELD(jointree);
COMPARE_NODE_FIELD(targetList);
COMPARE_NODE_FIELD(withCheckOptions);
+ COMPARE_NODE_FIELD(onConflict);
+ COMPARE_SCALAR_FIELD(specClause);
+ COMPARE_SCALAR_FIELD(mergeIndex);
COMPARE_NODE_FIELD(returningList);
COMPARE_NODE_FIELD(groupClause);
COMPARE_NODE_FIELD(havingQual);
@@ -2400,6 +2403,16 @@ _equalWithClause(const WithClause *a, const WithClause *b)
}
static bool
+_equalConflictClause(const ConflictClause *a, const ConflictClause *b)
+{
+ COMPARE_SCALAR_FIELD(specClause);
+ COMPARE_NODE_FIELD(relation);
+ COMPARE_LOCATION_FIELD(location);
+
+ return true;
+}
+
+static bool
_equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
{
COMPARE_STRING_FIELD(ctename);
@@ -3117,6 +3130,9 @@ equal(const void *a, const void *b)
case T_WithClause:
retval = _equalWithClause(a, b);
break;
+ case T_ConflictClause:
+ retval = _equalConflictClause(a, b);
+ break;
case T_CommonTableExpr:
retval = _equalCommonTableExpr(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 41e973b..12735ab 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1474,6 +1474,9 @@ exprLocation(const Node *expr)
case T_WithClause:
loc = ((const WithClause *) expr)->location;
break;
+ case T_ConflictClause:
+ loc = ((const ConflictClause *) expr)->location;
+ break;
case T_CommonTableExpr:
loc = ((const CommonTableExpr *) expr)->location;
break;
@@ -1958,6 +1961,8 @@ query_tree_walker(Query *query,
return true;
if (walker((Node *) query->withCheckOptions, context))
return true;
+ if (walker(query->onConflict, context))
+ return true;
if (walker((Node *) query->returningList, context))
return true;
if (walker((Node *) query->jointree, context))
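The one-line hook added to query_tree_walker() above follows the usual walker contract. As a self-contained sketch of that pattern (the miniature node type and field names here are invented, not the real PostgreSQL ones): each walker call returns true to abort the traversal early, and a container node simply forwards the walker over each of its sub-nodes, including newly added ones such as onConflict.

```c
#include <stdbool.h>
#include <stddef.h>

/* Invented miniature node tree, modeling how a query-level walker
 * forwards over optional sub-nodes such as onConflict. */
typedef struct Node
{
    int          tag;          /* payload consulted by the walker */
    struct Node *onConflict;   /* optional auxiliary sub-node */
    struct Node *jointree;     /* another sub-node */
} Node;

typedef bool (*walker_fn) (Node *node, void *context);

/* Returns true as soon as any sub-walk returns true (abort early). */
static bool
query_walk(Node *query, walker_fn walker, void *context)
{
    if (query->onConflict && walker(query->onConflict, context))
        return true;
    if (query->jointree && walker(query->jointree, context))
        return true;
    return false;
}

/* Example walker: count visited nodes, abort on tag == -1. */
static bool
count_walker(Node *node, void *context)
{
    int *count = (int *) context;

    (*count)++;
    return node->tag == -1;
}
```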
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index da75e29..3c35e4f 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -332,6 +332,9 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
WRITE_NODE_FIELD(resultRelations);
WRITE_INT_FIELD(resultRelIndex);
WRITE_NODE_FIELD(plans);
+ WRITE_NODE_FIELD(onConflictPlan);
+ WRITE_ENUM_FIELD(spec, SpecType);
+ WRITE_OID_FIELD(mergeIndex);
WRITE_NODE_FIELD(withCheckOptionLists);
WRITE_NODE_FIELD(returningLists);
WRITE_NODE_FIELD(fdwPrivLists);
@@ -2268,6 +2271,9 @@ _outQuery(StringInfo str, const Query *node)
WRITE_NODE_FIELD(jointree);
WRITE_NODE_FIELD(targetList);
WRITE_NODE_FIELD(withCheckOptions);
+ WRITE_NODE_FIELD(onConflict);
+ WRITE_ENUM_FIELD(specClause, SpecType);
+ WRITE_OID_FIELD(mergeIndex);
WRITE_NODE_FIELD(returningList);
WRITE_NODE_FIELD(groupClause);
WRITE_NODE_FIELD(havingQual);
@@ -2341,6 +2347,16 @@ _outWithClause(StringInfo str, const WithClause *node)
}
static void
+_outConflictClause(StringInfo str, const ConflictClause *node)
+{
+ WRITE_NODE_TYPE("CONFLICTCLAUSE");
+
+ WRITE_ENUM_FIELD(specClause, SpecType);
+ WRITE_NODE_FIELD(relation);
+ WRITE_LOCATION_FIELD(location);
+}
+
+static void
_outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
{
WRITE_NODE_TYPE("COMMONTABLEEXPR");
@@ -3187,6 +3203,9 @@ _outNode(StringInfo str, const void *obj)
case T_WithClause:
_outWithClause(str, obj);
break;
+ case T_ConflictClause:
+ _outConflictClause(str, obj);
+ break;
case T_CommonTableExpr:
_outCommonTableExpr(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 65584d1..62c0f3e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -213,6 +213,9 @@ _readQuery(void)
READ_NODE_FIELD(jointree);
READ_NODE_FIELD(targetList);
READ_NODE_FIELD(withCheckOptions);
+ READ_NODE_FIELD(onConflict);
+ READ_ENUM_FIELD(specClause, SpecType);
+ READ_OID_FIELD(mergeIndex);
READ_NODE_FIELD(returningList);
READ_NODE_FIELD(groupClause);
READ_NODE_FIELD(havingQual);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 0cdb790..de2c171 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -280,7 +280,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count)
allclauses = baserel->baserestrictinfo;
}
- if (!enable_indexscan)
+ /*
+ * TODO: Clean this up. There is no obvious way to generalize from the
+ * example of isCurrentOf, which seems more appropriate, because there
+ * isn't necessarily something to add disable_cost to within
+ * cost_qual_eval().
+ */
+ if (!enable_indexscan || root->parse->specClause == SPEC_UPDATE)
startup_cost += disable_cost;
/* we don't need to check enable_indexonlyscan; indxpath.c does that */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4b641a2..b5966d0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4719,7 +4719,8 @@ make_modifytable(PlannerInfo *root,
CmdType operation, bool canSetTag,
List *resultRelations, List *subplans,
List *withCheckOptionLists, List *returningLists,
- List *rowMarks, int epqParam)
+ List *rowMarks, Plan *onConflictPlan, SpecType spec,
+ Oid mergeIndex, int epqParam)
{
ModifyTable *node = makeNode(ModifyTable);
Plan *plan = &node->plan;
@@ -4735,6 +4736,11 @@ make_modifytable(PlannerInfo *root,
Assert(returningLists == NIL ||
list_length(resultRelations) == list_length(returningLists));
+ if (spec != SPEC_NONE && list_length(resultRelations) > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("INSERT...ON CONFLICT does not support table inheritance")));
+
/*
* Compute cost as sum of subplan costs.
*/
@@ -4768,6 +4774,9 @@ make_modifytable(PlannerInfo *root,
node->resultRelations = resultRelations;
node->resultRelIndex = -1; /* will be set correctly in setrefs.c */
node->plans = subplans;
+ node->onConflictPlan = onConflictPlan;
+ node->spec = spec;
+ node->mergeIndex = mergeIndex;
node->withCheckOptionLists = withCheckOptionLists;
node->returningLists = returningLists;
node->rowMarks = rowMarks;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e1480cd..282b9b2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -610,7 +610,41 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
withCheckOptionLists,
returningLists,
rowMarks,
+ NULL,
+ parse->specClause,
+ parse->mergeIndex,
SS_assign_special_param(root));
+
+ if (parse->onConflict)
+ {
+ Query *conflictQry = (Query*) parse->onConflict;
+ ModifyTable *parent = (ModifyTable *) plan;
+ Plan *planOnConflict;
+
+ /*
+ * An ON CONFLICT UPDATE query is a subquery of its parent
+ * INSERT ModifyTable, but isn't formally a subplan.
+ *
+ * XXX: It may be worth blending the costs associated with
+ * this plan into its parent. Since it isn't formally a
+ * subplan, that does not occur at present.
+ */
+ planOnConflict = subquery_planner(glob, conflictQry, root,
+ hasRecursion, 0, NULL);
+
+ if (contain_subplans((Node *) conflictQry->targetList))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("UPDATE portion of ON CONFLICT contains subplans"),
+ errhint("Only trivial targetlist entries and predicates are supported.")));
+
+ if (bms_num_members(planOnConflict->allParam) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("parameterized auxiliary UPDATE queries are unsupported")));
+
+ parent->onConflictPlan = planOnConflict;
+ }
}
}
@@ -1045,6 +1079,11 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (parse->onConflict)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ON CONFLICT UPDATE is not supported within an inheritance hierarchy")));
+
/* And last, tack on a ModifyTable node to do the UPDATE/DELETE work */
return (Plan *) make_modifytable(root,
parse->commandType,
@@ -1054,6 +1093,9 @@ inheritance_planner(PlannerInfo *root)
withCheckOptionLists,
returningLists,
rowMarks,
+ NULL,
+ parse->specClause,
+ InvalidOid,
SS_assign_special_param(root));
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d717df..fb378b2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -765,6 +765,14 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
root->glob->resultRelations =
list_concat(root->glob->resultRelations,
list_copy(splan->resultRelations));
+
+ if (splan->onConflictPlan)
+ {
+ splan->onConflictPlan = (Plan *) set_plan_refs(root,
+ (Plan *) splan->onConflictPlan,
+ rtoffset);
+ }
+
}
break;
case T_Append:
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e7dc85..81b90b0 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2305,6 +2305,8 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
valid_params,
scan_params));
}
+
+ /* No need to directly handle onConflict here */
}
break;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c0b1fe3..cdc0ab2 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -391,6 +391,8 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
/* done building the range table and jointree */
qry->rtable = pstate->p_rtable;
qry->jointree = makeFromExpr(pstate->p_joinlist, qual);
+ qry->specClause = SPEC_NONE;
+ qry->onConflict = NULL;
qry->hasSubLinks = pstate->p_hasSubLinks;
qry->hasWindowFuncs = pstate->p_hasWindowFuncs;
@@ -412,6 +414,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
{
Query *qry = makeNode(Query);
SelectStmt *selectStmt = (SelectStmt *) stmt->selectStmt;
+ SpecType spec = stmt->confClause ? stmt->confClause->specClause : SPEC_NONE;
List *exprList = NIL;
bool isGeneralSelect;
List *sub_rtable;
@@ -482,8 +485,9 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
* mentioned in the SELECT part. Note that the target table is not added
* to the joinlist or namespace.
*/
- qry->resultRelation = setTargetTable(pstate, stmt->relation,
- false, false, ACL_INSERT);
+ qry->resultRelation = setTargetTable(pstate, stmt->relation, false, false,
+ ACL_INSERT |
+ (spec == SPEC_INSERT ? ACL_UPDATE : 0));
/* Validate stmt->cols list, or build default list if no list given */
icolumns = checkInsertTargets(pstate, stmt->cols, &attrnos);
@@ -762,8 +766,52 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
/* done building the range table and jointree */
qry->rtable = pstate->p_rtable;
qry->jointree = makeFromExpr(pstate->p_joinlist, NULL);
-
+ qry->specClause = spec;
qry->hasSubLinks = pstate->p_hasSubLinks;
+ qry->onConflict = NULL;
+
+ if (stmt->confClause)
+ {
+ /*
+ * ON CONFLICT UPDATE requires special parse analysis of the
+ * auxiliary update Query
+ */
+ if (stmt->confClause->stmt)
+ {
+ UpdateStmt *pupd = (UpdateStmt *) stmt->confClause->stmt;
+ ParseState *sub_pstate = make_parsestate(pstate);
+ Query *dqry;
+ RangeTblEntry *subTarget;
+
+ if (!IsA(pupd, UpdateStmt))
+ elog(ERROR, "unrecognized statement in ON CONFLICT clause");
+
+ /* Assign same target relation as parent InsertStmt */
+ pupd->relation = stmt->relation;
+
+ dqry = transformStmt(sub_pstate, (Node *) pupd);
+ dqry->specClause = SPEC_UPDATE;
+ dqry->canSetTag = false;
+
+ /* Save auxiliary query */
+ qry->onConflict = (Node *) dqry;
+
+ /*
+ * Mark parent Query as requiring appropriate UPDATE/SELECT
+ * privileges
+ */
+ subTarget = sub_pstate->p_target_rangetblentry;
+
+ rte->updatedCols = bms_copy(subTarget->updatedCols);
+ rte->selectedCols = bms_union(rte->selectedCols,
+ subTarget->selectedCols);
+
+ free_parsestate(sub_pstate);
+ }
+
+ /* Look up index to limit speculative insertion merge to */
+ qry->mergeIndex = transformConflictWithinClause(pstate, stmt);
+ }
assign_query_collations(pstate, qry);
@@ -1010,6 +1058,8 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
qry->rtable = pstate->p_rtable;
qry->jointree = makeFromExpr(pstate->p_joinlist, qual);
+ qry->specClause = SPEC_NONE;
+ qry->onConflict = NULL;
qry->hasSubLinks = pstate->p_hasSubLinks;
qry->hasWindowFuncs = pstate->p_hasWindowFuncs;
@@ -1951,6 +2001,8 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
qry->rtable = pstate->p_rtable;
qry->jointree = makeFromExpr(pstate->p_joinlist, qual);
+ qry->specClause = SPEC_NONE;
+ qry->onConflict = NULL;
qry->hasSubLinks = pstate->p_hasSubLinks;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 6f4d645..275600a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -215,6 +215,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
RangeVar *range;
IntoClause *into;
WithClause *with;
+ ConflictClause *conf;
A_Indices *aind;
ResTarget *target;
struct PrivTarget *privtarget;
@@ -315,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
opt_class opt_inline_handler opt_validator validator_clause
opt_collate
-%type <range> qualified_name OptConstrFromTable
+%type <range> qualified_name OptConstrFromTable OptConfWithinIndex
%type <str> all_Op MathOp
@@ -410,6 +411,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <defelt> SeqOptElem
%type <istmt> insert_rest
+%type <conf> opt_on_conflict
%type <vsetstmt> generic_set set_rest set_rest_more SetResetClause FunctionSetResetClause
@@ -507,6 +509,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <list> cte_list
%type <list> within_group_clause
+%type <node> UpdateInsertStmt
%type <node> filter_clause
%type <list> window_clause window_definition_list opt_partition_clause
%type <windef> window_definition over_clause window_specification
@@ -545,8 +548,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
CACHE CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
CLUSTER COALESCE COLLATE COLLATION COLUMN COMMENT COMMENTS COMMIT
- COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS
- CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE
+ COMMITTED CONCURRENTLY CONFIGURATION CONFLICT CONNECTION CONSTRAINT
+ CONSTRAINTS CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE
CROSS CSV CURRENT_P
CURRENT_CATALOG CURRENT_DATE CURRENT_ROLE CURRENT_SCHEMA
CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
@@ -566,7 +569,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
HANDLER HAVING HEADER_P HOLD HOUR_P
- IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
+ IDENTITY_P IF_P IGNORE ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P
INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -646,6 +649,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%nonassoc OVERLAPS
%nonassoc BETWEEN
%nonassoc IN_P
+%nonassoc DISTINCT
+%nonassoc ON
%left POSTFIXOP /* dummy for postfix Op rules */
/*
* To support target_el without AS, we must give IDENT an explicit priority
@@ -9095,11 +9100,13 @@ DeallocateStmt: DEALLOCATE name
*****************************************************************************/
InsertStmt:
- opt_with_clause INSERT INTO qualified_name insert_rest returning_clause
+ opt_with_clause INSERT INTO qualified_name insert_rest
+ opt_on_conflict returning_clause
{
$5->relation = $4;
- $5->returningList = $6;
+ $5->returningList = $7;
$5->withClause = $1;
+ $5->confClause = $6;
$$ = (Node *) $5;
}
;
@@ -9143,6 +9150,35 @@ insert_column_item:
}
;
+opt_on_conflict:
+ ON CONFLICT OptConfWithinIndex UpdateInsertStmt
+ {
+ $$ = makeNode(ConflictClause);
+ $$->relation = $3;
+ $$->stmt = $4;
+ $$->specClause = SPEC_INSERT;
+ $$->location = @1;
+ }
+ |
+ ON CONFLICT OptConfWithinIndex IGNORE
+ {
+ $$ = makeNode(ConflictClause);
+ $$->relation = $3;
+ $$->stmt = NULL;
+ $$->specClause = SPEC_IGNORE;
+ $$->location = @1;
+ }
+ | /*EMPTY*/
+ {
+ $$ = NULL;
+ }
+ ;
+
+OptConfWithinIndex:
+ WITHIN qualified_name { $$ = $2; }
+ | /*EMPTY*/ { $$ = NULL; }
+ ;
+
returning_clause:
RETURNING target_list { $$ = $2; }
| /* EMPTY */ { $$ = NIL; }
@@ -9236,6 +9272,21 @@ UpdateStmt: opt_with_clause UPDATE relation_expr_opt_alias
}
;
+UpdateInsertStmt: UPDATE
+ SET set_clause_list
+ where_clause
+ {
+ UpdateStmt *n = makeNode(UpdateStmt);
+ n->relation = NULL;
+ n->targetList = $3;
+ n->fromClause = NULL;
+ n->whereClause = $4;
+ n->returningList = NULL;
+ n->withClause = NULL;
+ $$ = (Node *)n;
+ }
+ ;
+
set_clause_list:
set_clause { $$ = $1; }
| set_clause_list ',' set_clause { $$ = list_concat($1,$3); }
@@ -12898,6 +12949,7 @@ unreserved_keyword:
| COMMIT
| COMMITTED
| CONFIGURATION
+ | CONFLICT
| CONNECTION
| CONSTRAINTS
| CONTENT_P
@@ -12957,6 +13009,7 @@ unreserved_keyword:
| HOUR_P
| IDENTITY_P
| IF_P
+ | IGNORE
| IMMEDIATE
| IMMUTABLE
| IMPLICIT_P
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 4931dca..c3d4e0c 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -17,6 +17,7 @@
#include "access/heapam.h"
#include "catalog/heap.h"
+#include "catalog/index.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
#include "nodes/makefuncs.h"
@@ -177,8 +178,8 @@ setTargetTable(ParseState *pstate, RangeVar *relation,
* free_parsestate() will eventually do the corresponding heap_close(),
* but *not* release the lock.
*/
- pstate->p_target_relation = parserOpenTable(pstate, relation,
- RowExclusiveLock);
+ pstate->p_target_relation = parserOpenRelation(pstate, relation,
+ RowExclusiveLock);
/*
* Now build an RTE.
@@ -2166,6 +2167,43 @@ get_matching_location(int sortgroupref, List *sortgrouprefs, List *exprs)
}
/*
+ * transformConflictWithinClause -
+ * transform WITHIN portion of ON CONFLICT UPDATE/IGNORE.
+ *
+ * Handles adding named unique index relation to range table. Returns Oid of
+ * index relation.
+ */
+Oid
+transformConflictWithinClause(ParseState *pstate, InsertStmt *stmt)
+{
+ RangeTblEntry *uniqueIndex;
+ Oid heapOid;
+ RangeVar *indexVar = stmt->confClause->relation;
+
+ if (!stmt->confClause->relation)
+ return InvalidOid;
+
+ uniqueIndex = addRangeTableEntry(pstate, indexVar, NULL, false, false);
+
+ addRTEtoQuery(pstate, uniqueIndex, false, true, false);
+
+ heapOid = IndexGetRelation(uniqueIndex->relid, true);
+
+ if (heapOid != RelationGetRelid(pstate->p_target_relation))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("ON CONFLICT WITHIN relation \"%s\" is not an index on target relation \"%s\"",
+ indexVar->relname, RelationGetRelationName(pstate->p_target_relation)),
+ parser_errposition(pstate,
+ exprLocation((Node *) stmt->confClause->relation))));
+ }
+
+ /* assume new rte is at end */
+ return uniqueIndex->relid;
+}
+
+/*
* addTargetToSortList
* If the given targetlist entry isn't already in the SortGroupClause
* list, add it to the end of the list, using the given sort ordering
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 364ced0..0fa9245 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -949,13 +949,13 @@ chooseScalarFunctionAlias(Node *funcexpr, char *funcname,
* LOCKMODE is typedef'd as int anyway, that seems like overkill.
*/
Relation
-parserOpenTable(ParseState *pstate, const RangeVar *relation, int lockmode)
+parserOpenRelation(ParseState *pstate, const RangeVar *relation, int lockmode)
{
Relation rel;
ParseCallbackState pcbstate;
setup_parser_errposition_callback(&pcbstate, pstate, relation->location);
- rel = heap_openrv_extended(relation, lockmode, true);
+ rel = relation_openrv_extended(relation, lockmode, true);
if (rel == NULL)
{
if (relation->schemaname)
@@ -1021,7 +1021,7 @@ addRangeTableEntry(ParseState *pstate,
* depending on whether we're doing SELECT FOR UPDATE/SHARE.
*/
lockmode = isLockedRefname(pstate, refname) ? RowShareLock : AccessShareLock;
- rel = parserOpenTable(pstate, relation, lockmode);
+ rel = parserOpenRelation(pstate, relation, lockmode);
rte->relid = RelationGetRelid(rel);
rte->relkind = rel->rd_rel->relkind;
@@ -1037,7 +1037,7 @@ addRangeTableEntry(ParseState *pstate,
* so that the table can't be deleted or have its schema modified
* underneath us.
*/
- heap_close(rel, NoLock);
+ relation_close(rel, NoLock);
/*
* Set flags and access permissions.
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index cc967f0..244085c 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3089,6 +3089,11 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
rt_entry_relation->rd_rel->relkind == RELKIND_VIEW &&
!view_has_instead_trigger(rt_entry_relation, event))
{
+ if (parsetree->specClause == SPEC_INSERT)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("INSERT ON CONFLICT is not supported on updatable views")));
+
/*
* This throws an error if the view can't be automatically
* updated, but that's OK since the query would fail at runtime
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 5c3f5ad..03fd12c 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1525,6 +1525,29 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
}
/*
+ * XidInProgress
+ * Public variant of XidInMVCCSnapshot.
+ *
+ * This routine indicates whether a transaction is still in progress from a
+ * snapshot's perspective, per classic MVCC. Unlike XidInMVCCSnapshot, it
+ * accounts for the current subtransaction, as well as ancestor and child
+ * subcommitted transactions.
+ *
+ * Speculative insertion effectively avails of a special MVCC exception,
+ * because a tuple may be locked/updated that has no version visible to the
+ * updating command's MVCC snapshot. It's okay for READ COMMITTED mode to not
+ * care about having availed of this special exception (that's what it's there
+ * for), but higher isolation levels must not, and should actively call this
+ * routine to ensure that the issue has not occurred.
+ */
+bool
+XidInProgress(TransactionId xid, Snapshot snapshot)
+{
+ return !TransactionIdIsCurrentTransactionId(xid) &&
+ XidInMVCCSnapshot(xid, snapshot);
+}
+
+/*
* Is the tuple really only locked? That is, is it not updated?
*
* It's easy to check just infomask bits if the locker is not a multi; but
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d99158f..37bbac8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -102,15 +102,103 @@ typedef struct SysScanDescData *SysScanDesc;
* call is made with UNIQUE_CHECK_EXISTING. The tuple is already in the
* index in this case, so it should not be inserted again. Rather, just
* check for conflicting live tuples (possibly blocking).
+ *
+ * INSERT...ON CONFLICT UPDATE operations involve "speculative" insertion of
+ * tuples. There is a call to establish the uniqueness of a tuple and take
+ * appropriate value locks (generally once per unique index per table slot).
+ * These locks prevent concurrent insertion of conflicting key values, and will
+ * only be held for an instant. Values are locked in the abstract; existing
+ * index tuples are not locked, because they don't yet exist. Subsequently,
+ * there may be a second, corresponding phase where insertion proper actually
+ * occurs -- index_insert() calls are made with UNIQUE_CHECK_SPEC, where
+ * insertion proper picks up from the first phase, and is guaranteed to be
+ * successful (unless "WITHIN named_unique_index" was originally specified, in
+ * which case only "named_unique_index" has been value locked, implying that
+ * other unique indexes could have duplicate key violations when index_insert()
+ * is called).
*/
typedef enum IndexUniqueCheck
{
UNIQUE_CHECK_NO, /* Don't do any uniqueness checking */
UNIQUE_CHECK_YES, /* Enforce uniqueness at insertion time */
UNIQUE_CHECK_PARTIAL, /* Test uniqueness, but no error */
- UNIQUE_CHECK_EXISTING /* Check if existing tuple is unique */
+ UNIQUE_CHECK_EXISTING, /* Check if existing tuple is unique */
+ UNIQUE_CHECK_SPEC /* Speculative phased locking insertion */
} IndexUniqueCheck;
+/*
+ * For speculative insertion's phased locking, it is necessary for the core
+ * system to have callbacks to amcanunique AMs that allow them to clean-up in
+ * the duplicate found case. This is because one first-phase call in respect
+ * of some unique index might indicate that it's okay to proceed, while another
+ * such call relating to another unique index (but the same executor-level
+ * tuple slot) indicates that it is not. In general, we cannot rely on
+ * actually reaching the second phase even if one check in the first phase says
+ * that it assents to insertion proceeding, because we need total consensus
+ * from all unique indexes that may be inserted into as part of proceeding with
+ * inserting the tuple slot (assuming there was no WITHIN unique index
+ * specification).
+ *
+ * It is the responsibility of supporting AMs to set the callback if there is
+ * clean-up to be done when not proceeding. Note, however, that the executor
+ * will not call the callback if insertion of the relevant index tuple
+ * proceeds, because the AM should take care of this itself in the second,
+ * final, optional step ("insertion proper").
+ *
+ * The callback is called twice when the core code goes to UPDATE. The first
+ * call releases all functional value locks, which is necessary to prevent
+ * unprincipled deadlocks, while the second post-lock/update call releases an
+ * interlock against VACUUM (typically a buffer pin) on the merge-on unique
+ * index.
+ *
+ * We must interlock against VACUUM because the executor may need to
+ * lock/UPDATE a tuple not visible to the command's MVCC snapshot, and it would
+ * be bad news to lock/update a heap tuple distinct from the one that the AM
+ * made a determination about in the first phase just because it happened to
+ * occupy the same heap slot. XXX: Perhaps not. Discussion on VACUUM
+ * interlocking should revisit the need for this.
+ */
+struct SpeculativeState;
+typedef void (*ReleaseIndexCallback) (struct SpeculativeState *state);
+
+/*
+ * Speculative status returned.
+ *
+ * It may be necessary to start the first phase of speculative insertion from
+ * scratch, in the event of needing to block pending the completion of another
+ * transaction, where the AM cannot reasonably sit on value locks (associated
+ * with some other, previously locked amcanunique indexes) indefinitely.
+ */
+typedef enum
+{
+ INSERT_TRY_AGAIN, /* had xact conflict - restart from first index */
+ INSERT_NO_PROCEED, /* duplicate conclusively found */
+ INSERT_PROCEED, /* no duplicate key in any index */
+ INSERT_NEED_UNPIN /* only unpin buffer */
+} SpecStatus;
+
+/*
+ * Struct representing state passed to and from clients for speculative phased
+ * insertion. This is allocated by amcanunique access methods, and passed back
+ * and forth for phased index locking.
+ *
+ * There is one such piece of state associated with every unique index
+ * participating in insertion of a slot (or a single named unique index in the
+ * event of a WITHIN specification).
+ */
+typedef struct SpeculativeState
+{
+ /* Index under consideration */
+ Relation uniqueIndex;
+ /* Opaque amcanunique state */
+ void *privState;
+ /* Callback to tell AM to clean-up */
+ ReleaseIndexCallback unlock;
+ /* Outcome of first phase (current attempt) for uniqueIndex */
+ SpecStatus outcome;
+ /* Conflicting heap tuple, if any */
+ ItemPointer conflictTid;
+} SpeculativeState;
/*
* generalized index_ interface routines (in indexam.c)
@@ -125,11 +213,17 @@ typedef enum IndexUniqueCheck
extern Relation index_open(Oid relationId, LOCKMODE lockmode);
extern void index_close(Relation relation, LOCKMODE lockmode);
+extern SpeculativeState *index_lock(Relation indexRelation,
+ Datum *values, bool *isnull,
+ Relation heapRelation,
+ List *otherSpecStates, bool priorConflict);
extern bool index_insert(Relation indexRelation,
Datum *values, bool *isnull,
ItemPointer heap_t_ctid,
- Relation heapRelation,
- IndexUniqueCheck checkUnique);
+ Relation heapRelation, IndexUniqueCheck checkUnique,
+ SpeculativeState * state);
+extern bool index_proceed(List *specStates);
+extern ItemPointer index_release(List *specStates, bool free);
extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 9fa943f..ccb673b 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -74,6 +74,7 @@ typedef BTPageOpaqueData *BTPageOpaque;
#define BTP_SPLIT_END (1 << 5) /* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6) /* page has LP_DEAD tuples */
#define BTP_INCOMPLETE_SPLIT (1 << 7) /* right sibling's downlink is missing */
+#define BTP_IS_LOCKED (1 << 8) /* page is heavyweight locked */
/*
* The max allowed value of a cycle ID is a bit less than 64K. This is
@@ -132,6 +133,24 @@ typedef struct BTMetaPageData
#define BTREE_NONLEAF_FILLFACTOR 70
/*
+ * BTREE_PITEMS_NOLOCK is the number of items on a leaf page, representing a
+ * threshold after which speculative insertion prefers to attempt lock
+ * avoidance.
+ *
+ * This value may seem suspect, because it does not account for btree leaf
+ * fillfactor, nor BLCKSZ; in general it does not carefully consider how many
+ * items present on a leaf page meet some particular definition of "almost
+ * full", and therefore that a value proposed for insertion is likely to be
+ * found rather than newly inserted. Perhaps most woolly of all, the page
+ * under consideration may not be the only page inspected for a would-be
+ * duplicate. The value is only intended as a very rough approximation of the
+ * tipping point at which the optimization becomes profitable, and there are
+ * other, more important factors that will frequently independently make the
+ * optimization applicable.
+ */
+#define BTREE_PITEMS_NOLOCK 150
+
+/*
* Test whether two btree entries are "the same".
*
* Old comments:
@@ -180,6 +199,7 @@ typedef struct BTMetaPageData
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
#define P_INCOMPLETE_SPLIT(opaque) ((opaque)->btpo_flags & BTP_INCOMPLETE_SPLIT)
+#define P_IS_LOCKED(opaque) ((opaque)->btpo_flags & BTP_IS_LOCKED)
/*
* Lehman and Yao's algorithm requires a ``high key'' on every non-rightmost
@@ -611,6 +631,28 @@ typedef struct BTScanOpaqueData
typedef BTScanOpaqueData *BTScanOpaque;
/*
+ * BTSpecOpaqueStateData is the btree-private state that is managed as part of
+ * speculative index tuple insertion, particular to this implementation. In
+ * the first phase, the implementation stashes this private state. The state
+ * is passed back during the second phase, or resources are freed using a
+ * callback.
+ *
+ * lockedBuf is always pinned when passed back to the caller at the end of the
+ * first phase, unless the current attempt to get consensus was unsuccessful.
+ * An exclusive heavyweight lock will also be held on the page which lockedBuf
+ * is allocated into.
+ */
+typedef struct BTSpecOpaqueData
+{
+ Buffer lockedBuf; /* Buffer whose page (a leaf page) is locked */
+ BTStack stack; /* Saved stack in case of page split */
+ IndexTuple itup; /* Cached index tuple */
+ ScanKey itupScankey; /* Cached insertion scankey - redundant */
+} BTSpecOpaqueData;
+
+typedef BTSpecOpaqueData *BTSpecOpaque;
+
+/*
* We use some private sk_flags bits in preprocessed scan keys. We're allowed
* to use bits 16-31 (see skey.h). The uppermost bits are copied from the
* index's indoption[] array entry for the index attribute.
@@ -626,6 +668,7 @@ typedef BTScanOpaqueData *BTScanOpaque;
*/
extern Datum btbuild(PG_FUNCTION_ARGS);
extern Datum btbuildempty(PG_FUNCTION_ARGS);
+extern Datum btlock(PG_FUNCTION_ARGS);
extern Datum btinsert(PG_FUNCTION_ARGS);
extern Datum btbeginscan(PG_FUNCTION_ARGS);
extern Datum btgettuple(PG_FUNCTION_ARGS);
@@ -642,8 +685,12 @@ extern Datum btoptions(PG_FUNCTION_ARGS);
/*
* prototypes for functions in nbtinsert.c
*/
+extern void _bt_lockinsert(Relation rel, Relation heapRel,
+ SpeculativeState *specState, List *otherSpecStates,
+ bool priorConflict);
extern bool _bt_doinsert(Relation rel, IndexTuple itup,
- IndexUniqueCheck checkUnique, Relation heapRel);
+ IndexUniqueCheck checkUnique, Relation heapRel,
+ SpeculativeState *specState);
extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
extern void _bt_finish_split(Relation rel, Buffer bbuf, BTStack stack);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index 759ea70..f3b17c5 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -52,6 +52,7 @@ CATALOG(pg_am,2601)
bool amclusterable; /* does AM support cluster command? */
bool ampredlocks; /* does AM handle predicate locks? */
Oid amkeytype; /* type of data in index, or InvalidOid */
+ regproc amlock; /* "speculative insertion" function */
regproc aminsert; /* "insert this tuple" function */
regproc ambeginscan; /* "prepare for index scan" function */
regproc amgettuple; /* "next valid tuple" function, or 0 */
@@ -80,7 +81,7 @@ typedef FormData_pg_am *Form_pg_am;
* compiler constants for pg_am
* ----------------
*/
-#define Natts_pg_am 30
+#define Natts_pg_am 31
#define Anum_pg_am_amname 1
#define Anum_pg_am_amstrategies 2
#define Anum_pg_am_amsupport 3
@@ -96,40 +97,41 @@ typedef FormData_pg_am *Form_pg_am;
#define Anum_pg_am_amclusterable 13
#define Anum_pg_am_ampredlocks 14
#define Anum_pg_am_amkeytype 15
-#define Anum_pg_am_aminsert 16
-#define Anum_pg_am_ambeginscan 17
-#define Anum_pg_am_amgettuple 18
-#define Anum_pg_am_amgetbitmap 19
-#define Anum_pg_am_amrescan 20
-#define Anum_pg_am_amendscan 21
-#define Anum_pg_am_ammarkpos 22
-#define Anum_pg_am_amrestrpos 23
-#define Anum_pg_am_ambuild 24
-#define Anum_pg_am_ambuildempty 25
-#define Anum_pg_am_ambulkdelete 26
-#define Anum_pg_am_amvacuumcleanup 27
-#define Anum_pg_am_amcanreturn 28
-#define Anum_pg_am_amcostestimate 29
-#define Anum_pg_am_amoptions 30
+#define Anum_pg_am_amlock 16
+#define Anum_pg_am_aminsert 17
+#define Anum_pg_am_ambeginscan 18
+#define Anum_pg_am_amgettuple 19
+#define Anum_pg_am_amgetbitmap 20
+#define Anum_pg_am_amrescan 21
+#define Anum_pg_am_amendscan 22
+#define Anum_pg_am_ammarkpos 23
+#define Anum_pg_am_amrestrpos 24
+#define Anum_pg_am_ambuild 25
+#define Anum_pg_am_ambuildempty 26
+#define Anum_pg_am_ambulkdelete 27
+#define Anum_pg_am_amvacuumcleanup 28
+#define Anum_pg_am_amcanreturn 29
+#define Anum_pg_am_amcostestimate 30
+#define Anum_pg_am_amoptions 31
/* ----------------
* initial contents of pg_am
* ----------------
*/
-DATA(insert OID = 403 ( btree 5 2 t f t t t t t t f t t 0 btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcanreturn btcostestimate btoptions ));
+DATA(insert OID = 403 ( btree 5 2 t f t t t t t t f t t 0 btlock btinsert btbeginscan btgettuple btgetbitmap btrescan btendscan btmarkpos btrestrpos btbuild btbuildempty btbulkdelete btvacuumcleanup btcanreturn btcostestimate btoptions ));
DESCR("b-tree index access method");
#define BTREE_AM_OID 403
-DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f f 23 hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup - hashcostestimate hashoptions ));
+DATA(insert OID = 405 ( hash 1 1 f f t f f f f f f f f 23 - hashinsert hashbeginscan hashgettuple hashgetbitmap hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbuildempty hashbulkdelete hashvacuumcleanup - hashcostestimate hashoptions ));
DESCR("hash index access method");
#define HASH_AM_OID 405
-DATA(insert OID = 783 ( gist 0 8 f t f f t t f t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup - gistcostestimate gistoptions ));
+DATA(insert OID = 783 ( gist 0 8 f t f f t t f t t t f 0 - gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup - gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
-DATA(insert OID = 2742 ( gin 0 6 f f f f t t f f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup - gincostestimate ginoptions ));
+DATA(insert OID = 2742 ( gin 0 6 f f f f t t f f t f f 0 - gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup - gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
-DATA(insert OID = 4000 ( spgist 0 5 f f f f f t f t f f f 0 spginsert spgbeginscan spggettuple spggetbitmap spgrescan spgendscan spgmarkpos spgrestrpos spgbuild spgbuildempty spgbulkdelete spgvacuumcleanup spgcanreturn spgcostestimate spgoptions ));
+DATA(insert OID = 4000 ( spgist 0 5 f f f f f t f t f f f 0 - spginsert spgbeginscan spggettuple spggetbitmap spgrescan spgendscan spgmarkpos spgrestrpos spgbuild spgbuildempty spgbulkdelete spgvacuumcleanup spgcanreturn spgcostestimate spgoptions ));
DESCR("SP-GiST index access method");
#define SPGIST_AM_OID 4000
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index a84595e..eb7f203 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -538,6 +538,8 @@ DATA(insert OID = 330 ( btgettuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0
DESCR("btree(internal)");
DATA(insert OID = 636 ( btgetbitmap PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 20 "2281 2281" _null_ _null_ _null_ _null_ btgetbitmap _null_ _null_ _null_ ));
DESCR("btree(internal)");
+DATA(insert OID = 3218 ( btlock PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 2278 "2281 2281 2281 2281 2281 16" _null_ _null_ _null_ _null_ btlock _null_ _null_ _null_ ));
+DESCR("btree(internal)");
DATA(insert OID = 331 ( btinsert PGNSP PGUID 12 1 0 0 0 f f f f t f v 6 0 16 "2281 2281 2281 2281 2281 2281" _null_ _null_ _null_ _null_ btinsert _null_ _null_ _null_ ));
DESCR("btree(internal)");
DATA(insert OID = 333 ( btbeginscan PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ btbeginscan _null_ _null_ _null_ ));
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 239aff3..1237c82 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -349,8 +349,10 @@ extern void ExecCloseScanRelation(Relation scanrel);
extern void ExecOpenIndices(ResultRelInfo *resultRelInfo);
extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
+extern List *ExecLockIndexValues(TupleTableSlot *slot, EState *estate,
+ SpecType specReason, int *conflict);
extern List *ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
- EState *estate);
+ EState *estate, List *specStates);
extern bool check_exclusion_constraint(Relation heap, Relation index,
IndexInfo *indexInfo,
ItemPointer tupleid,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b271f21..efb171d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -63,6 +63,7 @@ typedef struct IndexInfo
Oid *ii_ExclusionProcs; /* array with one entry per column */
uint16 *ii_ExclusionStrats; /* array with one entry per column */
bool ii_Unique;
+ bool ii_MergeOn;
bool ii_ReadyForInserts;
bool ii_Concurrent;
bool ii_BrokenHotChain;
@@ -1087,6 +1088,8 @@ typedef struct ModifyTableState
int mt_whichplan; /* which one is being executed (0..n-1) */
ResultRelInfo *resultRelInfo; /* per-subplan target relations */
List **mt_arowmarks; /* per-subplan ExecAuxRowMark lists */
+ SpecType spec; /* reason for speculative insertion */
+ PlanState *onConflict; /* associated OnConflict state */
EPQState mt_epqstate; /* for evaluating EvalPlanQual rechecks */
bool fireBSTriggers; /* do we need to fire stmt triggers? */
} ModifyTableState;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a031b88..f173d11 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -407,6 +407,7 @@ typedef enum NodeTag
T_RowMarkClause,
T_XmlSerialize,
T_WithClause,
+ T_ConflictClause,
T_CommonTableExpr,
/*
@@ -619,4 +620,16 @@ typedef enum JoinType
(1 << JOIN_RIGHT) | \
(1 << JOIN_ANTI))) != 0)
+/* SpecType - "Speculative insertion" clause
+ *
+ * This also appears across various subsystems
+ */
+typedef enum
+{
+ SPEC_NONE, /* Not involved in speculative insertion */
+ SPEC_IGNORE, /* INSERT of "ON CONFLICT IGNORE" */
+ SPEC_INSERT, /* INSERT of "ON CONFLICT UPDATE" */
+ SPEC_UPDATE /* UPDATE of "ON CONFLICT UPDATE" */
+} SpecType;
+
#endif /* NODES_H */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2c5d842..21b81ef 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -130,6 +130,14 @@ typedef struct Query
List *withCheckOptions; /* a list of WithCheckOption's */
+ Node *onConflict; /* ON CONFLICT Query */
+
+ SpecType specClause; /* speculative insertion clause */
+
+ Oid mergeIndex; /* rtable index of unique index
+ * relation for speculative insertion; 0 when
+ * unspecified */
+
List *returningList; /* return-values list (of TargetEntry) */
List *groupClause; /* a list of SortGroupClause's */
@@ -996,6 +1004,21 @@ typedef struct WithClause
} WithClause;
/*
+ * ConflictClause -
+ * representation of ON CONFLICT clause
+ *
+ * Note: ConflictClause does not propagate into the Query representation
+ */
+typedef struct ConflictClause
+{
+ NodeTag type;
+ SpecType specClause; /* Variant specified */
+ Node *stmt; /* Update/Delete parse stmt */
+ RangeVar *relation; /* unique index to merge on, or NULL */
+ int location; /* token location, or -1 if unknown */
+} ConflictClause;
+
+/*
* CommonTableExpr -
* representation of WITH list element
*
@@ -1046,6 +1069,7 @@ typedef struct InsertStmt
List *cols; /* optional: names of the target columns */
Node *selectStmt; /* the source SELECT/VALUES, or NULL */
List *returningList; /* list of expressions to return */
+ ConflictClause *confClause; /* ON CONFLICT clause */
WithClause *withClause; /* WITH clause */
} InsertStmt;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3b9c683..2ff83ad 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -172,6 +172,9 @@ typedef struct ModifyTable
List *resultRelations; /* integer list of RT indexes */
int resultRelIndex; /* index of first resultRel in plan's list */
List *plans; /* plan(s) producing source data */
+ Plan *onConflictPlan; /* Plan for ON CONFLICT UPDATE auxiliary query */
+ SpecType spec; /* speculative insertion specification */
+ Oid mergeIndex; /* Oid of merge index relation */
List *withCheckOptionLists; /* per-target-table WCO lists */
List *returningLists; /* per-target-table RETURNING tlists */
List *fdwPrivLists; /* per-target-table FDW private data lists */
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 4504250..bd93f67 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -84,7 +84,8 @@ extern ModifyTable *make_modifytable(PlannerInfo *root,
CmdType operation, bool canSetTag,
List *resultRelations, List *subplans,
List *withCheckOptionLists, List *returningLists,
- List *rowMarks, int epqParam);
+ List *rowMarks, Plan *onConflictPlan, SpecType spec,
+ Oid mergeIndex, int epqParam);
extern bool is_projection_capable_plan(Plan *plan);
/*
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 17888ad..d5d0589 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -87,6 +87,7 @@ PG_KEYWORD("commit", COMMIT, UNRESERVED_KEYWORD)
PG_KEYWORD("committed", COMMITTED, UNRESERVED_KEYWORD)
PG_KEYWORD("concurrently", CONCURRENTLY, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("configuration", CONFIGURATION, UNRESERVED_KEYWORD)
+PG_KEYWORD("conflict", CONFLICT, UNRESERVED_KEYWORD)
PG_KEYWORD("connection", CONNECTION, UNRESERVED_KEYWORD)
PG_KEYWORD("constraint", CONSTRAINT, RESERVED_KEYWORD)
PG_KEYWORD("constraints", CONSTRAINTS, UNRESERVED_KEYWORD)
@@ -180,6 +181,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD)
PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD)
PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD)
PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("ignore", IGNORE, UNRESERVED_KEYWORD)
PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD)
PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD)
PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD)
diff --git a/src/include/parser/parse_clause.h b/src/include/parser/parse_clause.h
index e9e7cdc..6c7cf67 100644
--- a/src/include/parser/parse_clause.h
+++ b/src/include/parser/parse_clause.h
@@ -41,6 +41,7 @@ extern List *transformDistinctClause(ParseState *pstate,
List **targetlist, List *sortClause, bool is_agg);
extern List *transformDistinctOnClause(ParseState *pstate, List *distinctlist,
List **targetlist, List *sortClause);
+extern Oid transformConflictWithinClause(ParseState *pstate, InsertStmt *stmt);
extern List *addTargetToSortList(ParseState *pstate, TargetEntry *tle,
List *sortlist, List *targetlist, SortBy *sortby,
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index d8b9493..56cd10e 100644
--- a/src/include/parser/parse_relation.h
+++ b/src/include/parser/parse_relation.h
@@ -40,8 +40,8 @@ extern Node *colNameToVar(ParseState *pstate, char *colname, bool localonly,
int location);
extern void markVarForSelectPriv(ParseState *pstate, Var *var,
RangeTblEntry *rte);
-extern Relation parserOpenTable(ParseState *pstate, const RangeVar *relation,
- int lockmode);
+extern Relation parserOpenRelation(ParseState *pstate,
+ const RangeVar *relation, int lockmode);
extern RangeTblEntry *addRangeTableEntry(ParseState *pstate,
RangeVar *relation,
Alias *alias,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 37b6cbb..8f8583c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -52,6 +52,7 @@ typedef LockInfoData *LockInfo;
*/
typedef struct RelationAmInfo
{
+ FmgrInfo amlock;
FmgrInfo aminsert;
FmgrInfo ambeginscan;
FmgrInfo amgettuple;
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index ae285c3..3e8ed88 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -89,6 +89,7 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup,
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid);
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
+extern bool XidInProgress(TransactionId xid, Snapshot snapshot);
/*
* To avoid leaking to much knowledge about reorderbuffer implementation
--
1.9.1
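As a usage note on the grammar above: the OptConfWithinIndex production adds an optional WITHIN clause naming the unique index to merge on. A sketch of how that syntax would be exercised, inferred from the grammar rather than taken from the posted examples (the index name upsert_pkey is assumed):

```sql
-- Merge on one named unique index; per the genam.h comments in the patch,
-- only that index is value locked, so other unique indexes may still raise
-- duplicate key violations at insertion proper:
INSERT INTO upsert(key, val) VALUES (1, 'insert')
ON CONFLICT WITHIN upsert_pkey UPDATE SET val = 'update';
```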
On 08/28/2014 04:43 AM, Peter Geoghegan wrote:
-- Nesting within wCTE:
WITH t AS (
INSERT INTO z SELECT i, 'insert'
FROM generate_series(0, 16) i
ON CONFLICT UPDATE SET v = v || 'update' -- use of
operators/functions in targetlist
RETURNING * -- only projects inserted tuples, never updated tuples
)
SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;
Personally I would find it surprising if RETURNING did not also return
the updated tuples. In many use cases for upsert the user does not care
if the row was new or not.
What I think would be useful is if all tuples were returned but there
was some way to filter out only the inserted ones.
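One way to picture what is being asked for, using a purely hypothetical boolean marker (here called "inserted"; nothing like it exists in the posted patch):

```sql
-- Hypothetical sketch: RETURNING exposes a marker distinguishing
-- inserted rows from updated ones
WITH t AS (
  INSERT INTO upsert(key, val) VALUES (1, 'insert'), (2, 'insert')
  ON CONFLICT UPDATE SET val = 'update'
  RETURNING *, inserted  -- "inserted" is invented for illustration
)
SELECT * FROM t WHERE inserted;  -- filter down to only the new rows
```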
--
Andreas Karlsson
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson <andreas@proxel.se> wrote:
Personally I would find it surprising if RETURNING did not also return the
updated tuples. In many use cases for upsert the user does not care if the
row was new or not.
I'm not attached to that particular behavior, but it does seem kind of
similar to the behavior of BEFORE triggers, where a NULL return value
("do nothing") will also cause RETURNING to not project the tuple.
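The BEFORE trigger behavior referred to can be demonstrated with stock PostgreSQL, no patch required:

```sql
CREATE TABLE trig_demo(key int PRIMARY KEY, val text);

CREATE FUNCTION suppress_skip() RETURNS trigger AS $$
BEGIN
  IF NEW.val = 'skip' THEN
    RETURN NULL;  -- "do nothing": row is neither inserted nor projected
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER demo_before BEFORE INSERT ON trig_demo
  FOR EACH ROW EXECUTE PROCEDURE suppress_skip();

-- RETURNING projects only the first row; the second is silently dropped
INSERT INTO trig_demo VALUES (1, 'keep'), (2, 'skip') RETURNING *;
```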
--
Peter Geoghegan
On 08/28/2014 09:05 PM, Peter Geoghegan wrote:
On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson <andreas@proxel.se> wrote:
Personally I would find it surprising if RETURNING did not also return the
updated tuples. In many use cases for upsert the user does not care if the
row was new or not.
I'm not attached to that particular behavior, but it does seem kind of
similar to the behavior of BEFORE triggers, where a NULL return value
("do nothing") will also cause RETURNING to not project the tuple.
I see. So we have three cases where we may or may not want to project a
tuple.
1) The tuple was inserted
2) We got a conflict and updated the tuple
3) We got a conflict but skipped updating the tuple
My personal intuition was that (1) and (2) would be returned but not
(3). But I am not sure if that is the most useful behavior.
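For concreteness, the three cases under the patch's proposed syntax (sketched here, not taken from the patch; assume an existing row with key = 1):

-- (1) no conflict: the new row is inserted, and projected
INSERT INTO upsert VALUES(2, 'new') ON CONFLICT UPDATE
    SET val = 'update' RETURNING *;

-- (2) conflict, auxiliary UPDATE applied: projected or not?
INSERT INTO upsert VALUES(1, 'new') ON CONFLICT UPDATE
    SET val = 'update' RETURNING *;

-- (3) conflict, but the UPDATE's predicate is not satisfied
--     (the row is still locked, per the patch):
INSERT INTO upsert VALUES(1, 'new') ON CONFLICT UPDATE
    SET val = 'update' WHERE val != 'new' RETURNING *;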
--
Andreas Karlsson
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
There are some restrictions on what this auxiliary update may do, but
FWIW there are considerably fewer than those that the equivalent MySQL
or SQLite feature imposes on their users.
I realized that I missed a few cases here. For one thing, the posted
patch fails to arrange for the UPDATE post-parse-analysis tree
representation to go through the rewriter stage (on the theory that
user-defined rules shouldn't be able to separately affect the
auxiliary UPDATE query tree), but rewriting is at least necessary so
that rewriteTargetListIU() can expand a "SET val = DEFAULT"
targetlist, as well as normalize the ordering of the UPDATE's tlist.
Separately, the patch fails to defend against certain queries that
ought to be disallowed, where a subselect is specified with a subquery
expression in the auxiliary UPDATE's WHERE clause.
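To illustrate both bugs (queries sketched here, not lifted from the regression tests): the first affects queries like

INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = DEFAULT;

where rewriteTargetListIU() must expand the DEFAULT; the second concerns queries like

INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = 'update'
WHERE key IN (SELECT a FROM y);

which ought to be rejected outright.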
These are garden-variety bugs that aren't likely to affect the kind of
high-level design discussion that I'm looking for here. I'll post a
fixed version in a few days' time.
--
Peter Geoghegan
On Thu, Aug 28, 2014 at 8:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
I realized that I missed a few cases here. For one thing, the posted
patch fails to arrange for the UPDATE post-parse-analysis tree
representation to go through the rewriter stage (on the theory that
user-defined rules shouldn't be able to separately affect the
auxiliary UPDATE query tree), but rewriting is at least necessary so
that rewriteTargetListIU() can expand a "SET val = DEFAULT"
targetlist, as well as normalize the ordering of the UPDATE's tlist.
Separately, the patch fails to defend against certain queries that
ought to be disallowed, where a subselect is specified with a subquery
expression in the auxiliary UPDATE's WHERE clause.
Attached revision fixes all of these issues. I've added regression
tests for each bug, too, although all changes are rebased into my
original commits.
I decided to explicitly rely on a simpler approach to VACUUM
interlocking. I no longer bother holding on to a buffer pin for a
period longer than the period that associated "value locks" are held,
which was something I talked about at the start of this thread. There
is a note on this added to the nbtree README, just after the master
branch's current remarks on B-Tree VACUUM interlocking.
I've also pushed the responsibility for supporting this new feature on
foreign tables onto FDWs themselves. The only writable FDW we
currently ship, postgres_fdw, lacks support for the new feature, but
this can be revisited in due course. My impression is that the task of
adding support is not quite a straightforward matter of adding a bit
more deparsing logic, but also isn't significantly more difficult than
that.
--
Peter Geoghegan
Attachments:
On Wed, Aug 27, 2014 at 10:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
Example usage:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT UPDATE
SET val = 'update';
I think that syntax is a dramatic improvement over your previous
proposals. The only part I don't entirely like is this:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
upsert_pkey UPDATE SET val = 'update';
It seems to me that it would be better to specify a conflicting column
set rather than a conflicting index name.
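Something like this, say (hypothetical syntax, just to illustrate):

INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT (key) UPDATE SET val = 'update';

where the system infers upsert_pkey from the column list, rather than
requiring the user to name the index directly.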
I don't have much in the way of comments about the implementation, at
least not right at the moment, but...
Essentially, the implementation has all stages of query processing
During the execution of the parent ModifyTable, a special auxiliary
subquery (the UPDATE ModifyTable) is considered as a special case.
This is not a subplan of the ModifyTable node in the conventional
sense, and so does not appear within EXPLAIN output.
...that sounds wonky.
I already mentioned the inability to reference rejected rows in an
UPDATE, as well as my unease about VACUUM interlocking, both of which
are open items. Also, some of the restrictions that I already mentioned
- on updatable views, inheritance, and foreign tables - are probably
unnecessary. We should be able to come up with reasonable behavior for at
least some of those.
If you've noted my comments on the UPDATE/DELETE .. ORDER BY .. LIMIT
thread, you won't be surprised to hear that I think those restrictions
will need to be lifted - especially for inheritance, but probably the
others as well.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Essentially, the implementation has all stages of query processing
During the execution of the parent ModifyTable, a special auxiliary
subquery (the UPDATE ModifyTable) is considered as a special case.
This is not a subplan of the ModifyTable node in the conventional
sense, and so does not appear within EXPLAIN output.
...that sounds wonky.
Which part? It certainly wouldn't be helpful if the (say) auxiliary
plan's "sequential scan" appeared within EXPLAIN output. That's just
an implementation detail. Note that the structure of the plan is
highly restricted, since it needs to be "driven by the insert" (or,
rather, the insert's conflicts, including conflicts not visible to the
command's MVCC snapshot). There won't be any interesting variation in
the plan. Although, that said, the implementation should probably
display any "Filter: ..." conditions implied by the special UPDATE
qual.
If you've noted my comments on the UPDATE/DELETE .. ORDER BY .. LIMIT
thread, you won't be surprised to hear that I think those restrictions
will need to be lifted - especially for inheritance, but probably the
others as well.
Sure.
--
Peter Geoghegan
On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
upsert_pkey UPDATE SET val = 'update';
It seems to me that it would be better to specify a conflicting column
set rather than a conflicting index name.
I'm open to pursuing that, provided there is a possible implementation
that's robust against things like BEFORE triggers that modify
constrained attributes. It must also work well with partial unique
indexes. So I imagine we'd have to determine a way of looking up the
unique index only after BEFORE triggers fire. Unless you're
comfortable with punting on some of these cases by throwing an error,
all of this is actually surprisingly ticklish. You've already
expressed concerns about the feature not playing nice with existing,
peripheral features though.
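The partial unique index case, for example (sketch; the table and index are hypothetical):

CREATE TABLE accounts(key int4, val text, active bool);
CREATE UNIQUE INDEX accounts_active_key ON accounts (key) WHERE active;

-- With a column list like "ON CONFLICT (key)", should this index be
-- inferred? Only rows with active = true can conflict through it, so
-- a bare column list underdetermines the intended behavior.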
--
Peter Geoghegan
On Wed, Sep 3, 2014 at 2:13 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Essentially, the implementation has all stages of query processing
During the execution of the parent ModifyTable, a special auxiliary
subquery (the UPDATE ModifyTable) is considered as a special case.
This is not a subplan of the ModifyTable node in the conventional
sense, and so does not appear within EXPLAIN output.
...that sounds wonky.
Which part? It certainly wouldn't be helpful if the (say) auxiliary
plan's "sequential scan" appeared within EXPLAIN output. That's just
an implementation detail. Note that the structure of the plan is
highly restricted, since it needs to be "driven by the insert" (or,
rather, the insert's conflicts, including conflicts not visible to the
command's MVCC snapshot). There won't be any interesting variation in
the plan. Although, that said, the implementation should probably
display any "Filter: ..." conditions implied by the special UPDATE
qual.
I think there shouldn't be any plan nodes in the system that don't get
displayed by explain. If you're using a plan node for something, and
think it shouldn't be displayed by explain, then either (1) you are
wrong or (2) you are abusing the plan node.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 4, 2014 at 8:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I think there shouldn't be any plan nodes in the system that don't get
displayed by explain. If you're using a plan node for something, and
think it shouldn't be displayed by explain, then either (1) you are
wrong or (2) you are abusing the plan node.
Maybe. I admit that I'm not entirely confident that the representation
of the auxiliary state during planning and execution is ideal.
However, it sure is convenient to be able to separately plan the
auxiliary query as a subquery, and not have to specially fish it out
of the subplan list later. Maybe we should add a mechanism that
essentially generates an equivalent, single ModifyTable plan. Or maybe
that would be adding a lot of code for no tangible benefit. I don't
see much point in making one ModifyTable node pull up from the other
for the benefit of this feature (which is another thing entirely from
having there be a single ModifyTable plan). For now, I'm glad to have
something that will allow us to drive discussion of the feature to the
next level. I don't have a good enough understanding of the optimizer
to be able to say with confidence what we should do, or to be able to
see the big picture of making any particular trade-off. It's not an
immediate concern, though.
--
Peter Geoghegan
On Thu, Sep 4, 2014 at 11:55 AM, Peter Geoghegan <pg@heroku.com> wrote:
It's not an immediate concern, though.
My immediate concern is to get some level of buy-in about how
everything fits together at a high level. Separately, as discussed in
my opening mail, there is the question of how value locking should
ultimately be implemented. These are two orthogonal questions, or are
pretty close to orthogonal. That helps. It also helps that people have
stopped being confused by the term "value locking" (I think).
I'm tempted to believe that the silence on the question of how things
fit together (such as the lack of discussion of my pgCon talk's
characterization of a "pick any 2" trade-off) means that everyone
agrees with it. That seems pretty naive, though,
because a lot of the issues are very subtle. I think that various
interested people, including Robert and Andres have yet to make their
minds up on that. I'm not sure what Tom thinks of it.
--
Peter Geoghegan
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
Omission
=======
The patch currently lacks a way of referencing datums rejected for
insertion when updating.
Attached revision of the patch set (which I'll call v1.2) adds this
capability in a separate commit. It now becomes possible to add a
CONFLICTING expression within the ON CONFLICT UPDATE targetlist or
predicate. Example use:
"""
postgres=# CREATE TABLE upsert(key int4 PRIMARY KEY, val text);
CREATE TABLE
postgres=# INSERT INTO upsert VALUES(1, 'Giraffe');
INSERT 0 1
postgres=# SELECT * FROM upsert;
key | val
-----+---------
1 | Giraffe
(1 row)
postgres=# INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON
CONFLICT UPDATE SET val = CONFLICTING(val);
INSERT 0 1
postgres=# SELECT * FROM upsert;
key | val
-----+------
1 | Bear
2 | Lion
(2 rows)
"""
Note that the effects of BEFORE INSERT triggers are carried here,
which I slightly favor over the alternative of not having it work that
way.
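To make that concrete (sketch; the function and trigger names are illustrative):

"""
CREATE FUNCTION capitalize_val() RETURNS trigger AS $$
BEGIN
    NEW.val := initcap(NEW.val);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capitalize BEFORE INSERT ON upsert
    FOR EACH ROW EXECUTE PROCEDURE capitalize_val();

INSERT INTO upsert VALUES(1, 'bear') ON CONFLICT UPDATE
    SET val = CONFLICTING(val);
-- CONFLICTING(val) evaluates to 'Bear' (the trigger's output),
-- not 'bear' (the raw datum from the VALUES() list)
"""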
I've also expanded upon my explanation for the structure of the query
tree and plan within (revised/rebased versions of) earlier commits. I
am clearer on why there is a special subquery planning step for the
auxiliary UPDATE, rather than making the UPDATE directly accessible as
a subquery within the post-parse-analysis query tree. Basically, the
optimizer has no basis for understanding that a DML sublink isn't
optimizable. It'll try to pull-up the subquery and so on, which of
course does not and cannot work. Whereas treating it as an
independently planned subquery of the top-level query, kind of like a
data-modifying CTE, makes sense (with such CTEs, the executor is
prepared for the possibility that not all rows will be pulled up - so
there too, the executor drives execution more directly than makes
sense when not dealing with DML: it finishes off the data-modifying
CTE's DML for any still-unconsumed tuples, within
ExecPostprocessPlan()).
It's certainly possible that a more unified representation makes sense
(i.e. one ModifyTable plan, likely still having separate INSERT/UPDATE
representations at earlier stages of query processing), but that would
require serious refactoring of the representation of ModifyTable
operations -- just for example, consider the need for a
unified-though-separate targetlist, one for the INSERT part, the other
for the UPDATE part. For now, I continue to find it very convenient to
represent the UPDATE as a selectively executed, auxiliary, distinct
ModifyTable plan, rather than adding a subquery rangetable directly
during parse analysis.
There is another significant change. In this revision, I am at least
"honest" about the plan representation within EXPLAIN:
"""
postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual
time=0.115..0.115 rows=0 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36)
(actual time=0.003..0.005 rows=2 loops=1)
-> Conflict Update on upsert (cost=0.00..22.30 rows=1230
width=36) (actual time=0.042..0.051 rows=0 loops=1)
-> Seq Scan on upsert (cost=0.00..22.30 rows=1230 width=36)
(never executed)
Planning time: 0.065 ms
Execution time: 0.158 ms
(6 rows)
postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val) where key = 2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual
time=0.075..0.075 rows=0 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36)
(actual time=0.001..0.002 rows=2 loops=1)
-> Conflict Update on upsert (cost=4.16..8.17 rows=1 width=36)
(actual time=0.012..0.026 rows=0 loops=1)
-> Bitmap Heap Scan on upsert (cost=4.16..8.17 rows=1
width=36) (never executed)
Recheck Cond: (key = 2)
-> Bitmap Index Scan on upsert_pkey (cost=0.00..4.16
rows=1 width=0) (never executed)
Index Cond: (key = 2)
Planning time: 0.090 ms
Execution time: 0.125 ms
(9 rows)
"""
The second query gets a bitmap scan because plain index scans have
been disabled for the UPDATE (a temporary kludge), since index-only
scans can break things - IndexOnlyRecheck() throws an error. I'm not quite
sure why the optimizer doesn't care about resjunk for the UPDATE,
which is presumably why in general regular updates never use
index-only scans. Since I think the actual auxiliary plan generation
needs work, so as to not have uselessly complicated plans, I didn't
try too hard to figure that out. This plan structure is not
acceptable, of course, but maybe almost the same thing would be
acceptable if the auxiliary plan shown here wasn't unnecessarily
complex - if we forced a simple pseudo-scan placeholder, without
wasting optimizer cycles, somewhat in the style of WHERE CURRENT OF.
This is something discussed in newly expanded comments within
planner.c. I would have made the optimizer produce a suitably simple
plan myself, but I don't have a good enough understanding of it to
figure out how (at least in a reasonable amount of time). Pointers on
how this might be accomplished are very welcome.
With this addition, the feature is functionally complete. That just
leaves the small matter of how it has been implemented. :-)
This is still clearly a work in progress implementation, with design
trade-offs that are very much in need of fairly high level discussion.
--
Peter Geoghegan
Attachments:
0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz (application/x-gzip)