BUG #17812: LOCK TABLE IN ACCESS EXCLUSIVE MODE with a view returns an empty tuple set
The following bug has been logged on the website:
Bug reference: 17812
Logged by: Zhenglong Li
Email address: smartkeyerror@gmail.com
PostgreSQL version: 15.2
Operating system: Ubuntu 20.04
Description:
Step to reproduce the behavior
We need 2 sessions to reproduce this bug.
Firstly, we create a simple view that just fetches all the data from
table1.
And then we start a transaction and lock table1 with AccessExclusive Mode in
Read Committed Transaction Isolation Level.
After that, we try to use CTAS to create a temp table table2 using the data
from table1 in session2, and this will be blocked by AccessExclusive Lock.
Finally, we insert some data into table1 in session1 and commit it, session2
will continue, but there is no data in table2.
```sql
--------[ Sessions 1 ]--------
show transaction_isolation;
transaction_isolation
-----------------------
read committed
DROP TABLE IF EXISTS table1 CASCADE;
CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);
CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
TRUNCATE TABLE table1;
--------[ Sessions 2 ]--------
show transaction_isolation;
transaction_isolation
-----------------------
read committed
DROP TABLE IF EXISTS table2 CASCADE;
CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1;
-- this will hang now waiting for a lock form session1
--------[ Sessions 1 ]--------
SELECT clock_timestamp();
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM
generate_series(1,10) a(i);
COMMIT;
--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+---------
(0 rows)
```
But when we used a table, there will have data:
```sql
--------[ Sessions 1 ]--------
DROP TABLE IF EXISTS table1 CASCADE;
CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);
CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
TRUNCATE TABLE table1;
--------[ Sessions 2 ]--------
DROP TABLE IF EXISTS table2 CASCADE;
CREATE TEMP TABLE table2 AS select zahl,upd_dat from table1;
-- this will hang now waiting for a lock form session1
--------[ Sessions 1 ]--------
SELECT clock_timestamp();
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM
generate_series(1,10) a(i);
COMMIT;
--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+----------------------------
1 | 2023-02-28 03:10:15.743068
2 | 2023-02-28 03:10:15.743196
3 | 2023-02-28 03:10:15.743203
4 | 2023-02-28 03:10:15.743206
5 | 2023-02-28 03:10:15.743208
6 | 2023-02-28 03:10:15.74321
7 | 2023-02-28 03:10:15.743212
8 | 2023-02-28 03:10:15.743214
9 | 2023-02-28 03:10:15.743216
10 | 2023-02-28 03:10:15.743218
(10 rows)
```
Hi,
On Tue, Feb 28, 2023 at 03:12:38AM +0000, PG Bug reporting form wrote:
The following bug has been logged on the website:
Bug reference: 17812
Logged by: Zhenglong Li
Email address: smartkeyerror@gmail.com
PostgreSQL version: 15.2
Operating system: Ubuntu 20.04
Description:Step to reproduce the behavior
We need 2 sessions to reproduce this bug.
Firstly, we create a simple view that just fetches all the data from
table1.And then we start a transaction and lock table1 with AccessExclusive Mode in
Read Committed Transaction Isolation Level.After that, we try to use CTAS to create a temp table table2 using the data
from table1 in session2, and this will be blocked by AccessExclusive Lock.Finally, we insert some data into table1 in session1 and commit it, session2
will continue, but there is no data in table2.```sql
[...]
TRUNCATE TABLE table1;
[...]
This is not a bug, this is a documented corner case. In postgres TRUNCATE is
not fully MVCC, see
https://www.postgresql.org/docs/current/sql-truncate.html and
https://www.postgresql.org/docs/current/mvcc-caveats.html:
"TRUNCATE is not MVCC-safe. After truncation, the table will appear empty to
concurrent transactions, if they are using a snapshot taken before the
truncation occurred".
I guess the difference between referencing the table rather than the view is
that the query get stuck at execution time rather than planning time, meaning
that you do get a snapshot older than the INSERT in the session 1. Change the
TRUNCATE with a DELETE and you will get the same behavior for both cases.
Import Notes
Reply to msg id not found: 17812-206eaec1d133a94a@postgresql.org17812-206eaec1d133a94a@postgresql.org
But when I removed the TRUNCATE STATEMENT in session1, session2 will still
not have data:
```sql
--------[ Sessions1 ]--------
DROP TABLE IF EXISTS table1 CASCADE;
CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);
CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
--------[ Sessions 2 ]--------
DROP TABLE IF EXISTS table2 CASCADE;
CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1;
-- this will hang now waiting for a lock form session1
--------[ Sessions 1 ]--------
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM
generate_series(1,10) a(i);
COMMIT;
--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+---------
(0 rows)
```
So I think we can not blame the TRUNCATE is not fully MVCC.
Julien Rouhaud <rjuju123@gmail.com> 于2023年2月28日周二 21:52写道:
Show quoted text
Hi,
On Tue, Feb 28, 2023 at 03:12:38AM +0000, PG Bug reporting form wrote:
The following bug has been logged on the website:
Bug reference: 17812
Logged by: Zhenglong Li
Email address: smartkeyerror@gmail.com
PostgreSQL version: 15.2
Operating system: Ubuntu 20.04
Description:Step to reproduce the behavior
We need 2 sessions to reproduce this bug.
Firstly, we create a simple view that just fetches all the data from
table1.And then we start a transaction and lock table1 with AccessExclusive
Mode in
Read Committed Transaction Isolation Level.
After that, we try to use CTAS to create a temp table table2 using the
data
from table1 in session2, and this will be blocked by AccessExclusive
Lock.
Finally, we insert some data into table1 in session1 and commit it,
session2
will continue, but there is no data in table2.
```sql
[...]
TRUNCATE TABLE table1;
[...]This is not a bug, this is a documented corner case. In postgres TRUNCATE
is
not fully MVCC, see
https://www.postgresql.org/docs/current/sql-truncate.html and
https://www.postgresql.org/docs/current/mvcc-caveats.html:"TRUNCATE is not MVCC-safe. After truncation, the table will appear empty
to
concurrent transactions, if they are using a snapshot taken before the
truncation occurred".I guess the difference between referencing the table rather than the view
is
that the query get stuck at execution time rather than planning time,
meaning
that you do get a snapshot older than the INSERT in the session 1. Change
the
TRUNCATE with a DELETE and you will get the same behavior for both cases.
On Wed, Mar 01, 2023 at 10:01:29AM +0800, Keyerror Smart wrote:
But when I removed the TRUNCATE STATEMENT in session1, session2 will still
not have data:```sql
--------[ Sessions1 ]--------
DROP TABLE IF EXISTS table1 CASCADE;CREATE TABLE table1 (
zahl integer,
upd_dat timestamp without time zone
);CREATE OR REPLACE VIEW view1 as select zahl,upd_dat from table1;
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;--------[ Sessions 2 ]--------
DROP TABLE IF EXISTS table2 CASCADE;CREATE TEMP TABLE table2 AS select zahl,upd_dat from view1;
-- this will hang now waiting for a lock form session1--------[ Sessions 1 ]--------
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM
generate_series(1,10) a(i);
COMMIT;--------[ Sessions 2 ]--------
SELECT * FROM table2 limit 10;
zahl | upd_dat
------+---------
(0 rows)
```So I think we can not blame the TRUNCATE is not fully MVCC.
Right, I was testing with a plain SELECT instead of CREATE TABLE AS.
This is however again due to when the lock is actually conflicting, leading to
different snapshot acquisition time.
For CTAS referencing the view, the lock is conflicting during execution, so it
has a snapshot that sees the data as they were before that transaction, as
required by ACID rules.
The CTAS referencing the table conflicts during parse analysis, so when the
lock is finally acquired and the parse analysis is done, it then acquires a new
snapshot for execution, which now sees the transaction as committed and thus
the newly added rows.
I still don't think this is a bug, however I'm not sure if/where those details
are documented.
OK, I get it. Thanks for your explanation.
On Tue, Feb 28, 2023 at 8:02 PM Keyerror Smart <smartkeyerror@gmail.com>
wrote:
OK, I get it. Thanks for your explanation.
You may find the following to be informative as well.
https://github.com/postgres/postgres/commit/532994299e2ff208a58376134fab75f5ae471e41
/messages/by-id/5075D8DF.6050500@fuzzy.cz
In short this seems to fall into "maybe we can do better in the future" -
though it seems that future has not yet come to fruition.
To my knowledge this dynamic isn't called out anywhere though it seems
almost obvious once pointed out.
There still seems to be something off here:
unfortunately it causes execution to use a snapshot that has been
acquired before locking any of the tables mentioned in the query
I went down the path that the planner should be the one realizing that the
view-using query has a lock it cannot grab - on the rewritten query's
introduction of the new table - and it should have blocked in the planner,
resulting in the acquisition of the execution-time snapshot in that case as
well.
Apparently the planner gets to bypass a check (that blocks) that parse
analysis has to perform. Thus locking of tables does happen for the ones
named in the query text itself?
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
You may find the following to be informative as well.
A general comment on these examples is that we don't intend MVCC
guarantees to hold with respect to tables that are created/dropped
by other sessions midway through your transaction. The reason can
be understood from this example:
Session 1:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM table1; -- establishes session1's snapshot
Session 2:
DROP TABLE table2;
CREATE TABLE table2 ...
Session 1:
SELECT * FROM table2;
Strict MVCC semantics would require that session 1 now see
the old contents of table2, which implies basically that we can't
ever drop tables --- at the point of the DROP, there was absolutely
nothing indicating that session 1 would take any interest in table2,
but after the fact it wants to know about it. That's not how it
works; if DROP is able to obtain exclusive lock on table2, that
table is toast, immediately.
(I think that if you use SERIALIZABLE level, you might get a serialization
failure from the "SELECT * FROM table2", or if you don't then perhaps
it'd be reasonable to fix it so you do. But under no circumstances
are we going to sit on the data in table2 on the off chance that
some transaction might want it later.)
Like Julien, I'm not entirely sure how well this is documented.
There is plenty of detail in [1]https://www.postgresql.org/docs/current/mvcc.html though.
regards, tom lane
On Tue, Feb 28, 2023 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
You may find the following to be informative as well.
A general comment on these examples is that we don't intend MVCC
guarantees to hold with respect to tables that are created/dropped
by other sessions midway through your transaction.
That isn't what is in question here though...but also my comment regarding
planning seems incomplete...
table1 and view1 already exist, table1 is empty
view1 is select * from table1
Given that Session 1 does:
BEGIN;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
-- sessions 2-5 now issue their commands
INSERT INTO table1 SELECT i zahl,clock_timestamp() upd_dat FROM
generate_series(1,10) a(i);
COMMIT;
The command and result from Sessions 2-5 are below:
Session 2:
select * from view1;
10 Rows
Session 3:
select * from table1;
10 Rows
Session 4:
create table tbl2 as select * from view1;
SELECT 0
Session 5:
create table tbl3 as select * from table1;
SELECT 10
Why is it OK for session 4 to be different here?
The argument that it got blocked after getting an execution snapshot that
doesn't include the insertion into table1, when all the other cases were
apparently blocked somewhere before then, doesn't sit well. In particular,
the difference between it (Session 4) and Session 2. That one sent the
result rows to a newly created temporary table and the other to the client
doesn't seem like it should be affecting/affected-by MVCC.
It is unclear to me whether you were instead talking about other sessions
dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case
at what lock level should this anomaly go away, and does it? (I haven't
checked).
David J.
Sessions 2 and 4:
postgres=# select * from view1;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
9 | 2023-03-01 04:33:28.628232
postgres=# create table tbl2 as select * from view1;
SELECT 0
postgres=# select * from tbl2;
zahl | upd_dat
------+---------
(0 rows)
Sessions 3 and 5:
postgres=# select * from table1;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:33:28.628112
2 | 2023-03-01 04:33:28.628227
3 | 2023-03-01 04:33:28.628229
4 | 2023-03-01 04:33:28.62823
5 | 2023-03-01 04:33:28.62823
6 | 2023-03-01 04:33:28.62823
7 | 2023-03-01 04:33:28.628231
8 | 2023-03-01 04:33:28.628231
postgres=# create table tbl3 as select * from table1;
SELECT 10
postgres=# select * from tbl3;
zahl | upd_dat
------+----------------------------
1 | 2023-03-01 04:36:00.762788
2 | 2023-03-01 04:36:00.762911
3 | 2023-03-01 04:36:00.762913
4 | 2023-03-01 04:36:00.762914
5 | 2023-03-01 04:36:00.762915
6 | 2023-03-01 04:36:00.762915
7 | 2023-03-01 04:36:00.762915
8 | 2023-03-01 04:36:00.762916
"David G. Johnston" <david.g.johnston@gmail.com> writes:
Session 4:
create table tbl2 as select * from view1;
SELECT 0
Why is it OK for session 4 to be different here?
Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird
IIRC, and maybe it's missing a step where we should update the
active snapshot.
It is unclear to me whether you were instead talking about other sessions
dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case
at what lock level should this anomaly go away, and does it?
The originally-proposed tests seemed to all involve either TRUNCATE
or DROP TABLE, which are outside what I consider to be our MVCC
guarantees. Your example here does seem a bit strange though.
Views generally ought not have different semantics from writing
out the view query in-line.
regards, tom lane
On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
Session 4:
create table tbl2 as select * from view1;
SELECT 0Why is it OK for session 4 to be different here?
Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird
IIRC, and maybe it's missing a step where we should update the
active snapshot.
I think it comes from this chunk in ExecCreateTableAs():
rewritten = QueryRewrite(query);
/* SELECT should never rewrite to more or less than one SELECT query */
if (list_length(rewritten) != 1)
elog(ERROR, "unexpected rewrite result for %s",
is_matview ? "CREATE MATERIALIZED VIEW" :
"CREATE TABLE AS SELECT");
query = linitial_node(Query, rewritten);
Assert(query->commandType == CMD_SELECT);
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
/*
* Use a snapshot with an updated command ID to ensure this query sees
* results of any previously executed queries. (This could only
* matter if the planner executed an allegedly-stable function that
* changed the database contents, but let's do it anyway to be
* parallel to the EXPLAIN code path.)
*/
PushCopiedSnapshot(GetActiveSnapshot());
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
It seems to clearly be in contradiction with our usual rule that planning and
execution should have a different snapshot.
On Tuesday, February 28, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
It is unclear to me whether you were instead talking about other sessions
dropping tables as another way of saying "ACCESS EXCLUSIVE" in which case
at what lock level should this anomaly go away, and does it?The originally-proposed tests seemed to all involve either TRUNCATE
or DROP TABLE,
The only purpose of those was to produce a self-contained script that could
be run repeatedly. The commands were outside the transaction flow
demonstrating the issue.
David J.
Julien Rouhaud <rjuju123@gmail.com> writes:
On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote:
Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird
IIRC, and maybe it's missing a step where we should update the
active snapshot.
I think it comes from this chunk in ExecCreateTableAs():
Hm. There are a lot of places that do this:
PushCopiedSnapshot(GetActiveSnapshot());
UpdateActiveSnapshotCommandId();
rather than
PushActiveSnapshot(GetTransactionSnapshot());
which is what would have the effect of noticing changes from other
sessions. Digging into the history of that, I found this commit:
Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_1_BR [c0b007603] 2011-02-28 23:28:06 -0500
Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion. Fetching a whole
new snapshot now happens only between original queries. This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules. The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.
So I guess it's intended behavior that we don't notice other-session
changes between steps of a single command. Whether that rule should
apply to CREATE TABLE AS is perhaps debatable --- but I see several
other places that are doing it exactly like this, so it's not like
CREATE TABLE AS is alone in its folly.
I'm pretty hesitant to change this without substantial thought.
regards, tom lane
On Wed, Mar 1, 2023 at 8:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Wed, Mar 01, 2023 at 01:53:22AM -0500, Tom Lane wrote:
Maybe it isn't. The code flow for CREATE TABLE AS is a bit weird
IIRC, and maybe it's missing a step where we should update the
active snapshot.I think it comes from this chunk in ExecCreateTableAs():
Hm. There are a lot of places that do this:
PushCopiedSnapshot(GetActiveSnapshot());
UpdateActiveSnapshotCommandId();rather than
PushActiveSnapshot(GetTransactionSnapshot());
which is what would have the effect of noticing changes from other
sessions. Digging into the history of that, I found this commit:Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_1_BR [c0b007603] 2011-02-28 23:28:06 -0500Rearrange snapshot handling to make rule expansion more consistent.
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion.
[...]
So I guess it's intended behavior that we don't notice other-session
changes between steps of a single command. Whether that rule should
apply to CREATE TABLE AS is perhaps debatable --- but I see several
other places that are doing it exactly like this, so it's not like
CREATE TABLE AS is alone in its folly.I'm pretty hesitant to change this without substantial thought.
So both Sessions 4 (view) and 5 (table) make it into tcop/postgres.c and
into parse analysis.
The view session gets past there because it is a utility command whose body
is parsable. postgres.c tcop invokes the executor (with a new snapshot but
still of the empty table) for the utility command CTAS passing the parsable
query along. CTAS performs rewrite and blocks there while holding the
empty table snapshot, and when it continues on to execution CTAS doesn't
pull forward the snapshot.
The table session fails to get past parse analysis while the lock is held
and once it does postgres.c tcop assigns a new snapshot which does contain
the data in table1, then invokes the CTAS utility.
To get parity in behavior either CTAS would need to assign a new snapshot
OR postgres.c tcop would need to perform rewriting on the query under the
CTAS prior to beginning execution of CTAS, since that rewriting would then
block the process before the new snapshot is assigned. (That is all in a
mostly questioning, not authoritative, tone
The non-CTAS view query gets blocked on rewrite while the non-CTAS table
query gets blocked in parse analysis. When unblocked, postgres.c tcop
assigns both a new snapshot just prior to execution.
I experimented with putting the create temp table into a function body then
calling the function - same results.
I agree this does seem like a poor risk/reward on the fixing side,
especially absent a concrete live use case problem. I am curious what led
to this discovery.
In terms of documentation; "query began" is the terminology we use but as
demonstrated here, from an end-user's perspective, and in the presence of
locking, the point at which the query began seems undefined. If it was
when I send the statement to the server then none of the four sessions
should be producing results, instead I get three that seem to violate the
read-committed rule. Which moves expectations to "begins retrieving rows"
in which case the CTAS+View combination, getting stuck in between, when the
CTAS+table one doesn't, indeed seems odd and maybe should at least be
addressed, even in general terms.
David J.