BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery
The following bug has been logged on the website:
Bug reference: 15715
Logged by: Lars Vonk
Email address: lars.vonk@gmail.com
PostgreSQL version: 11.2
Operating system: Ubuntu 18.04.2 LTS
Description:
Hi,
We are using a Ruby library called delayed_job
(https://github.com/collectiveidea/delayed_job) that generates the following
SQL:
LOG: duration: 82797.554 ms execute <unnamed>: UPDATE "delayed_jobs" SET
locked_at = '2019-03-26 13:25:20.808244', locked_by = 'host:myhost pid:9958'
WHERE id IN (SELECT "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at
<= '2019-03-26 13:25:20.807815' AND (locked_at IS NULL OR locked_at <
'2019-03-26 09:25:20.807828') OR locked_by = 'host:myhost pid:9958') AND
failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE)
RETURNING *
We noticed that all the "delayed_jobs" were getting locked instead of just
1. There is already a bug report (and a recent "fix") for this in
delayed_job:
https://github.com/collectiveidea/delayed_job_active_record/issues/143
We also found some related stackoverflow content which sort of hints that
this is a bug:
- https://dba.stackexchange.com/questions/69471/postgres-update-limit-1
- https://github.com/feikesteenbergen/demos/blob/master/bugs/update_limit_bug.txt
So is this a Postgres bug? Or is "LIMIT 1" not guaranteed in subqueries?
Kind regards,
Lars
PG Bug reporting form <noreply@postgresql.org> writes:
> We are using a ruby library called delayed-job
> (https://github.com/collectiveidea/delayed_job) that generates the following
> sql
> LOG: duration: 82797.554 ms execute <unnamed>: UPDATE "delayed_jobs" SET
> locked_at = '2019-03-26 13:25:20.808244', locked_by = 'host:myhost pid:9958'
> WHERE id IN (SELECT "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at
> <= '2019-03-26 13:25:20.807815' AND (locked_at IS NULL OR locked_at <
> '2019-03-26 09:25:20.807828') OR locked_by = 'host:myhost pid:9958') AND
> failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE)
> RETURNING *
> We noticed that all the "delayed_jobs" were getting locked instead of just
> 1.
That'd depend on the query plan you're getting, but in principle this
query is unstable if there are other transactions changing the table
contents: if the sub-SELECT is executed more than once then it could
return (and lock) a different row each time, since the FOR UPDATE
implies it will see latest values not those that applied when the
query started.
For that matter it could be unstable even without any concurrent
transactions, seeing that the UPDATE's action is to change some of
the fields the sub-SELECT's WHERE examines, and again the FOR UPDATE
means that the sub-SELECT can see those changes. Given these particular
values it seems like a row that satisfied the WHERE conditions
to start with still would do so after the UPDATE, but I wonder whether
that's always true in your usage.
Any given execution of the sub-SELECT is definitely going to lock only one
(or at most one) row, but whether it's the same row each time is harder to
say. You might be well advised to replace the sub-SELECT with a CTE so
that it's guaranteed to execute only once, ie
WITH x AS (SELECT id ... FOR UPDATE LIMIT 1)
UPDATE delayed_jobs SET ... WHERE id IN (SELECT id FROM x)
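Spelled out against the logged statement, that rewrite might look like the
following (a sketch only: the predicate, timestamps, and lock values are
copied from the log above; on PostgreSQL 12 and later, writing WITH x AS
MATERIALIZED makes the single-evaluation behavior explicit):

```sql
-- Single-evaluation rewrite of the logged UPDATE: the CTE runs exactly
-- once, so at most one row can be locked and updated.
WITH x AS (
    SELECT id
    FROM delayed_jobs
    WHERE ((run_at <= '2019-03-26 13:25:20.807815'
            AND (locked_at IS NULL
                 OR locked_at < '2019-03-26 09:25:20.807828'))
           OR locked_by = 'host:myhost pid:9958')
      AND failed_at IS NULL
    ORDER BY priority ASC, run_at ASC
    LIMIT 1
    FOR UPDATE
)
UPDATE delayed_jobs
SET locked_at = '2019-03-26 13:25:20.808244',
    locked_by = 'host:myhost pid:9958'
WHERE id IN (SELECT id FROM x)
RETURNING *;
```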
> We also found some related stackoverflow content which sort of hints that
> this is a bug:
> - https://dba.stackexchange.com/questions/69471/postgres-update-limit-1
> - https://github.com/feikesteenbergen/demos/blob/master/bugs/update_limit_bug.txt
In that case the query is intentionally changing the sub-query's results
(by changing the sort ordering field). I don't think it's a bug; it's
really closer to being unspecified behavior.
regards, tom lane
Hi,
Thanks for the explanation. Our hotfix was to replace the IN with a =, so
the query fails and the transaction is rolled back when more than one row is
returned from the sub-SELECT. We will follow up on your advice and
rewrite the query per your recommendation.
I understand how this query is unstable and can lock an extra row. What I
still don't fully understand is how this explains what happened, though,
since we never saw this in almost 7 years, and because of the rapid increase
in rows being locked by this one process. So I hope you are willing to
help me understand (I am happy to take this to another mailing list if
needed).
To add a bit more context:
- The library delayed_job is a background processor: (this) one process
locks a row and then executes the "delayed_job" (a row in the table). It
deletes the row on success or marks it as failed afterwards.
- Multiple "delayed_job"s per second can be inserted by other processes.
- This is the first time this behavior happened in almost 7 years. We
recently upgraded to Postgres 11.2 (from 9.6). The Ruby library is pretty
stable; the part of the code generating the query has not changed for a
long time.
- After 4 minutes or so, > 20000 rows were locked by this process.
Coming back to your explanation:
> That'd depend on the query plan you're getting, but in principle this
> query is unstable if there are other transactions changing the table
> contents: if the sub-SELECT is executed more than once then it could
> return (and lock) a different row each time, since the FOR UPDATE
> implies it will see latest values not those that applied when the
> query started.
So if I understand this correctly, if in the query plan the sub-SELECT is
executed 10 times, then 10 rows will be locked?
Is it plausible for a query plan to execute a sub-SELECT that many times?
Otherwise I can't explain the rapid increase in the number of rows.
What is also strange is the duration of that query (the query normally
takes a couple of milliseconds).
It is almost as if it was in an endless loop or something, immediately
updating new rows after they were inserted by other processes.
Is that also possible given your explanation?
Again thanks for your help.
Lars
On Tue, Mar 26, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lars Vonk <lars.vonk@gmail.com> writes:
> Thanks for the explanation. Our hotfix was to replace the IN with a = so
> the query fails and the transaction is rolled back when more than one is
> returned from the sub-SELECT.
And have you actually seen any such failures? If you have, then there
is certainly something going wrong here. The LIMIT should never allow
more than one row to be returned by the sub-SELECT, period, full stop.
The issue here is that the sub-SELECT is, in principle, executed afresh
for every row of the outer query, and you do not have a guarantee that
each such execution returns the *same* single row. Ordinary query
execution would provide such a guarantee, but you're using FOR UPDATE
on a table that's being modified concurrently (including by this query
itself), and therefore the guarantee disappears.
You might think that since the sub-SELECT is uncorrelated with the outer
query, there's no need to execute it more than once ... but that's an
optimization, not part of the guaranteed semantics. Without seeing an
EXPLAIN for this query on your system, we can't know whether it's being
done like that (though the fact that you're complaining suggests that
it isn't).
Interestingly, it's quite likely that your "hotfix" made the problem
go away, because PG generally *does* optimize uncorrelated sub-SELECTs
to be executed only once if they return scalar results. That doesn't
happen for IN subqueries though, as those are treated as joins.
Either way, a WITH is probably preferable because that does get you
into a place where we guarantee single-evaluation semantics.
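Not something suggested in this thread, but worth noting for this kind of
job-queue workload: since 9.5, PostgreSQL also offers FOR UPDATE SKIP
LOCKED, which lets competing workers skip rows another worker has already
locked instead of blocking on them. A simplified sketch (the WHERE clause
is deliberately reduced relative to the original query):

```sql
-- Hypothetical job-claiming variant for a delayed_job-style queue
-- (PostgreSQL 9.5+).  SKIP LOCKED makes concurrent workers claim
-- different rows instead of waiting on each other; the CTE guarantees
-- the subquery is evaluated only once.
WITH next_job AS (
    SELECT id
    FROM delayed_jobs
    WHERE failed_at IS NULL
      AND locked_at IS NULL
      AND run_at <= now()
    ORDER BY priority, run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE delayed_jobs
SET locked_at = now(),
    locked_by = 'host:myhost pid:9958'
WHERE id IN (SELECT id FROM next_job)
RETURNING *;
```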
regards, tom lane
> And have you actually seen any such failures?
No not yet. And probably won't, given your explanation.
So what I understand so far is that although the query is not guaranteed
to return the same single row, it is not possible that it returned and
updated multiple rows in a single execution?
> Without seeing an
> EXPLAIN for this query on your system, we can't know whether it's being
> done like that
I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
anything since we do not have the exact EXPLAIN of that particular query):
Update on delayed_jobs  (cost=146.81..154.85 rows=1 width=758) (actual time=1.035..1.035 rows=0 loops=1)
  ->  Nested Loop  (cost=146.81..154.85 rows=1 width=758) (actual time=1.035..1.035 rows=0 loops=1)
        ->  HashAggregate  (cost=146.54..146.55 rows=1 width=32) (actual time=1.035..1.035 rows=0 loops=1)
              Group Key: "ANY_subquery".id
              ->  Subquery Scan on "ANY_subquery"  (cost=146.51..146.53 rows=1 width=32) (actual time=1.034..1.034 rows=0 loops=1)
                    ->  Limit  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.034 rows=0 loops=1)
                          ->  LockRows  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
                                ->  Sort  (cost=146.51..146.51 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
                                      Sort Key: delayed_jobs_1.priority, delayed_jobs_1.run_at
                                      Sort Method: quicksort  Memory: 25kB
                                      ->  Seq Scan on delayed_jobs delayed_jobs_1  (cost=0.00..146.50 rows=1 width=22) (actual time=1.010..1.010 rows=0 loops=1)
                                            Filter: ((failed_at IS NULL) AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26 13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR ((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
                                            Rows Removed by Filter: 160
        ->  Index Scan using delayed_jobs_pkey on delayed_jobs  (cost=0.28..8.29 rows=1 width=206) (never executed)
              Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.367 ms
Execution Time: 1.076 ms
(17 rows)
> Either way, a WITH is probably preferable because that does get you
> into a place where we guarantee single-evaluation semantics.
Will do.
Thanks again for your time and patience to explain.
Lars
On Tue, Mar 26, 2019 at 11:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lars Vonk <lars.vonk@gmail.com> writes:
> So what I understand so far is that although the query is not guaranteed
> to return the same single row, it is not possible that it returned and
> updated multiple rows in a single execution?
The LIMIT node will absolutely not return more than one row per execution
of the subquery. However ... what you're asking about is how many rows
can get locked, which is a slightly different question.
> I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
> anything since we do not have the exact EXPLAIN of that particular query):
AFAICS, this plan shape wouldn't have the issue because the subquery is
on the outside of a nestloop and so would only get run once. What I'm
supposing is that you get trouble if the planner decides to go with a
nestloop semijoin (with the IN subquery on the inside of that loop).
Now either way, the plan tree for the subquery itself ought to look
like what you have here:
    ->  Limit  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.034 rows=0 loops=1)
          ->  LockRows  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
                ->  Sort  (cost=146.51..146.51 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
                      Sort Key: delayed_jobs_1.priority, delayed_jobs_1.run_at
                      Sort Method: quicksort  Memory: 25kB
                      ->  Seq Scan on delayed_jobs delayed_jobs_1  (cost=0.00..146.50 rows=1 width=22) (actual time=1.010..1.010 rows=0 loops=1)
                            Filter: ((failed_at IS NULL) AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26 13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR ((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
                            Rows Removed by Filter: 160
The seqscan+sort is going to find and return all the rows that meet
that "filter" condition *as of the start of the query*. The LockRows
node is going to take the first of those and lock it, which will include
finding and locking any newer version of the row that exists due to a
concurrent update. If there is a newer version, it then rechecks whether
that version still satisfies the filter condition (via some magic we
needn't get into here). If so, it returns the row to the LIMIT node,
which returns it up and then declares it's done, so we have found and
locked exactly one row. However, if that first row has been updated
to a state that *doesn't* satisfy the filter condition, the LockRows
node will advance to the next row of the seqscan+sort output, and lock
and retest that one. This repeats till it finds a row that does still
satisfy the filter condition post-locking.
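That sequence of events can be reproduced by hand with two psql sessions
(an illustrative sketch, assuming a delayed_jobs table in which rows 1 and
2 both satisfy the predicate at the start):

```sql
-- Session A: update row 1 so it no longer satisfies the predicate,
-- and leave the transaction open.
BEGIN;
UPDATE delayed_jobs SET locked_at = now() WHERE id = 1;

-- Session B: run the locking subquery while A is still open.
BEGIN;
SELECT id FROM delayed_jobs
WHERE locked_at IS NULL AND failed_at IS NULL
ORDER BY priority, run_at
LIMIT 1
FOR UPDATE;
-- B's snapshot still sees the old version of row 1, so LockRows tries
-- to lock it and blocks on A.  When A commits, B locks the new version,
-- rechecks the WHERE clause, finds locked_at is no longer NULL, and
-- advances to lock and return row 2 instead: two rows end up locked,
-- one returned.
```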
So it's fairly easy to see how concurrent updates could cause this
query to lock N rows, for some N larger than one. But by itself
this isn't a very satisfactory explanation for the query locking
*all* the rows as you state happened. All of them would've had
to be concurrently updated to states that no longer satisfy the
filter condition, and that seems pretty unlikely to happen as a
consequence of a few other transactions individually doing the same
type of query.
Perhaps that could happen if the outer UPDATE were itself updating the row
to no longer satisfy the filter condition, so that the next iteration of
the subquery then skipped over it. But (a) your example doesn't seem to
do that, and (b) if it did happen like that then the symptoms would not
just be that the rows were locked, but that they were all updated as well.
(Or is that what you meant? You haven't been clear about what led you to
conclude that all the rows got locked.)
regards, tom lane
> The LIMIT node will absolutely not return more than one row per execution
> of the subquery. However ... what you're asking about is how many rows
> can get locked, which is a slightly different question.
> But (a) your example doesn't seem to
> do that, and (b) if it did happen like that then the symptoms would not
> just be that the rows were locked, but that they were all updated as well.
> (Or is that what you meant? You haven't been clear about what led you to
> conclude that all the rows got locked.)
My apologies for my unclear report. I meant *updated*. I guess I mixed up
the terminology because (a) I assumed (oh dear) that the locking and the
updating affect either one row or none, and (b) the query updates a
locked_at column.
So to be clear: this query caused more than one row (actually > 1000s) to
*get updated*; I don't know if they also got locked. We believe this
happened in that one single query, especially because the query took ~82
seconds to complete.
So is that possible given that query?
Lars
On Wed, Mar 27, 2019 at 3:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: