In-progress INSERT wrecks plans on table
Recently we encountered the following unhappy sequence of events:
1/ system running happily
2/ batch load into table begins
3/ very quickly (some) preexisting queries on said table go orders of
magnitude slower
4/ database instance becomes unresponsive
5/ application outage
After chasing down a few false leads, we've isolated the cause to the
following:
The accumulating in-progress row changes mean that previously
optimal plans are optimal no longer. This situation will fix itself
when the next autoanalyze happens (and a new plan will be chosen),
but that cannot occur until the batch load is completed and
committed (approx 70 seconds). During that time the performance
degradation for queries still using the old plan is enough to
cripple the server.
Now that we know what is happening we can work around it. But I'm
wondering - is there any way (or if not, should there be one) to let
postgres handle this automatically? I experimented with a quick hack to
src/backend/commands/analyze.c (attached) that lets another session's
ANALYZE see in-progress rows - which works, but a) may cause other
problems and b) does not help autoanalyze, which has to wait for COMMIT
+ stats message.
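One way to watch that waiting happen (a diagnostic sketch, using the standard statistics views - the table name matches the test case below) is to poll pg_stat_user_tables while the load is in flight; the counters only advance once the loading transaction commits and its stats message is processed, which is why autoanalyze cannot kick in mid-load:

```sql
-- Run repeatedly in a third session during the batch load:
-- n_tup_ins stays flat and last_autoanalyze does not move until
-- the loading transaction COMMITs.
SELECT relname, n_tup_ins, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'plan';
```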
I've attached a (synthetic) test case that shows the issue, I'll
reproduce the output below to hopefully make the point obvious:
Table "public.plan"
Column | Type | Modifiers
--------+-----------------------------+-----------
id | integer | not null
typ | integer | not null
dat | timestamp without time zone |
val | text | not null
Indexes:
"plan_id" UNIQUE, btree (id)
"plan_dat" btree (dat)
"plan_typ" btree (typ)
[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Index Scan using plan_dat on plan (cost=0.00..265.47 rows=55
width=117) (actual time=0.130..4.409 rows=75 loops=1)
Index Cond: (dat IS NOT NULL)
Filter: (typ = 3)
Rows Removed by Filter: 5960
Total runtime: 4.440 ms
(5 rows)
[Session 2]
BEGIN;
INSERT INTO plan
SELECT id + 2000001, typ, current_date + id * '1 seconds'::interval, val
FROM plan
;
[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using plan_dat on plan (cost=0.00..551.35 rows=91
width=117) (actual time=0.131..202.699 rows=75 loops=1)
Index Cond: (dat IS NOT NULL)
Filter: (typ = 3)
Rows Removed by Filter: 5960
Total runtime: 202.729 ms
(5 rows)
[Session 2]
COMMIT;
[Session 1...wait for autoanalyze to finish then]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on plan (cost=407.87..44991.95 rows=10116 width=117)
(actual time=2.692..6.582 rows=75 loops=1)
Recheck Cond: (typ = 3)
Filter: (dat IS NOT NULL)
Rows Removed by Filter: 19925
-> Bitmap Index Scan on plan_typ (cost=0.00..405.34 rows=20346
width=0) (actual time=2.573..2.573 rows=20000 loops=1)
Index Cond: (typ = 3)
Total runtime: 6.615 ms
Regards
Mark
On 26/04/13 14:56, Gavin Flower wrote:
On 26/04/13 14:33, Mark Kirkwood wrote:
[Mark's original message and test case quoted in full - trimmed]
Hmm...
You need to specify:
1. version of Postgres
2. Operating system
3. changes to postgresql.conf
4. CPU/RAM etc
5. anything else that might be relevant
While in general you are quite correct - in the above case (particularly
as I've supplied a test case) it should be pretty obvious that any
moderately modern version of postgres on any supported platform will
exhibit this.
I produced the above test case on Postgres 9.2.4 Ubuntu 13.04, with no
changes to the default postgresql.conf
Now our actual production server is a 32 CPU box with 512GB RAM, and 16
SAS SSD running Postgres 9.2.4 on Ubuntu 12.04. And yes there are quite
a few changes from the defaults there - and I wasted quite a lot of time
chasing issues with high CPU and RAM, and changing various configs to
see if they helped - before identifying that the issue was in progress
row changes and planner statistics. Also in the "real" case with much
bigger datasets the difference between the plan being optimal and it
*not* being optimal is a factor of 2000x elapsed time instead of a mere
50x !
regards
Mark
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On 26/04/13 15:34, Gavin Flower wrote:
On 26/04/13 15:19, Mark Kirkwood wrote:
While in general you are quite correct - in the above case
(particularly as I've supplied a test case) it should be pretty
obvious that any moderately modern version of postgres on any
supported platform will exhibit this.

While I admit that I did not look closely at your test case - I am aware
that several times changes to Postgres from one minor version to
another can have drastic unintended side effects (which might, or might
not, be relevant to your situation). Besides, it helps set the scene,
and is one less thing that needs to be deduced.
Indeed - however, my perhaps slightly grumpy reply to your email was
based on an impression of over-keenness to dismiss my message without
reading it (!) and a - dare I say it - one-size-fits-all presentation of
"here are the hoops to jump through". Now I spent a reasonable amount of
time preparing the message and its attendant test case - and a comment
such as yours based on *not reading it* ...errrm... well, let's say I
think we can/should do better.
I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking... incomplete problem spec...ignore... when that is not the
case - yes, I should have muttered "9.2" in the original email, but we
have covered that now.
Regards
Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking... incomplete problem spec...ignore... when that is not that
case - yes I should have muttered "9.2" in the original email, but we
have covered that now.
No, I think it's more that we're trying to get to beta, and so anything
that looks like new development is getting shuffled to folks' "to
look at later" queues. The proposed patch is IMO a complete nonstarter
anyway; but I'm not sure what a less bogus solution would look like.
regards, tom lane
On 02/05/13 02:06, Tom Lane wrote:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking... incomplete problem spec...ignore... when that is not that
case - yes I should have muttered "9.2" in the original email, but we
have covered that now.

No, I think it's more that we're trying to get to beta, and so anything
that looks like new development is getting shuffled to folks' "to
look at later" queues. The proposed patch is IMO a complete nonstarter
anyway; but I'm not sure what a less bogus solution would look like.
Yeah, I did think that beta might be consuming everyone's attention (of
course immediately *after* sending the email)!
And yes, the patch was merely to illustrate the problem rather than any
serious attempt at a solution.
Regards
Mark
On 2 May 2013 01:49, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 02/05/13 02:06, Tom Lane wrote:
[earlier exchange between Mark and Tom quoted - trimmed]

And yes, the patch was merely to illustrate the problem rather than any
serious attempt at a solution.
I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.

ISTM that you've got a case where the plan is very sensitive to a
table load. Which is a pretty common situation and one that can be
solved in various ways. I don't see much that Postgres can do because
it can't know ahead of time that you're about to load rows. We could
imagine an optimizer that set thresholds on plans that caused the
whole plan to be recalculated halfway through a run, but that would be
a lot of work to design and implement and even harder to test. Having
static plans at least allows us to discuss what a query does after the
fact with some ease.
The plan is set using stats that were gathered when there were very few
non-NULL rows, and those increase massively on load. The way to cope
is to run ANALYZE immediately after the load and then not allow
auto-ANALYZE to reset the stats later.
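For the test table in this thread, that coping strategy might look something like the following sketch (the storage parameter names are real per-table PostgreSQL settings; the specific values are illustrative only):

```sql
-- Analyze immediately after the load has committed, so the planner
-- sees the new data distribution:
ANALYZE plan;

-- Then stop autovacuum from re-analyzing the table behind our backs,
-- either entirely (note this also disables auto-vacuum for the table):
ALTER TABLE plan SET (autovacuum_enabled = false);

-- ...or, less drastically, by pushing the auto-analyze trigger point
-- out of reach (values here are illustrative guesses):
ALTER TABLE plan SET (autovacuum_analyze_scale_factor = 0,
                      autovacuum_analyze_threshold = 2000000000);
```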
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 03/05/13 00:27, Simon Riggs wrote:
[Simon's message and the earlier exchange quoted in full - trimmed]
Would it be practicable to have a facility for telling Postgres in
advance what you intend to do, so it can create plans accordingly?
I won't try and invent syntax, but it would be good to tell the system
that you intend to:
insert a million rows sequentially
or
insert ten million rows randomly
or
update two million rows in the primary key range 'AA00000' to 'PP88877'
etc.
Though, sometimes it may be useful to give a little more
detail, especially if you have a good estimate of the distribution of
primary keys, e.g.:
AA000000 - AA000456: 20%
AA000456 - KE700999: 0%
KE700999 - NN400005: 30%
NN400005 - PA000001: 35%
PA000001 - PP808877: 15%
I figure that if the planner had more information about what one intends
to do, then it could combine that with the statistics it knows, to come
up with a more realistic plan.
Cheers,
Gavin
On 2 May 2013 01:49, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.
Actually no - I think Tom (quite correctly) was saying that the patch was
not a viable solution. With which I agree.
I believe the title of this thread is the problem statement.
ISTM that you've got a case where the plan is very sensitive to a
table load. [snip] The way to cope
is to run the ANALYZE immediately after the load and then don't allow
auto-ANALYZE to reset them later.
No. We do run ANALYZE immediately after the load. The surprise was that
this was not sufficient - the (small) window where non-optimal
plans were being used due to the in-progress row activity was enough to
cripple the system - that is the problem. The analysis of why it was not
sufficient led to the test case included in the original email. And sure,
it is deliberately crafted to display the issue, and is therefore open to
criticism for being artificial. However it was purely meant to make it
easy to see what I was talking about.
Currently we are working around this by coercing one of the predicates in
the query to discourage the attractive-looking but dangerous index.
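The post doesn't show the exact coercion used in production, but the usual trick is to wrap the indexed column in a no-op expression so the planner can no longer match the predicate to the index. For the test case in this thread, a hypothetical version might be:

```sql
-- Hypothetical illustration of the workaround (not the actual
-- production query): wrapping "dat" in a no-op expression means the
-- plan_dat btree index can no longer be used for this predicate,
-- steering the planner towards plan_typ instead. Adding an interval
-- to NULL still yields NULL, so the result set is unchanged.
SELECT * FROM plan
WHERE typ = 3
  AND (dat + '0 seconds'::interval) IS NOT NULL;
```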
I think the idea of telling postgres that we are doing a load is probably
the wrong way to go about this. We have a framework that tries to
automatically figure out the best plans...I think some more thought about
how to make that understand some of the more subtle triggers for a
time-to-do-new-plans moment is the way to go. I understand this is
probably hard - and may imply some radical surgery to how the stats
collector and planner interact.
Regards
Mark
mark.kirkwood@catalyst.net.nz wrote on 03.05.2013 00:19:
I think the idea of telling postgres that we are doing a load is probably
the wrong way to go about this. We have a framework that tries to
automatically figure out the best plans...I think some more thought about
how to make that understand some of the more subtle triggers for a
time-to-do-new-plans moment is the way to go. I understand this is
probably hard - and may imply some radical surgery to how the stats
collector and planner interact.
I wonder if "freezing" (analyze, then disable autovacuum) the statistics for the large number of rows would work.
mark.kirkwood@catalyst.net.nz wrote on 03.05.2013 00:19:
[my comments quoted - trimmed]

I wonder if "freezing" (analyze, then disable autovacuum) the
statistics for the large number of rows would work.
I'm thinking that the issue is actually the opposite - it is that a new
plan is needed because the new (uncommitted) rows are changing the data
distribution. So we want more plan instability rather than plan stability
:-)
Cheers
Mark
On 2 May 2013 23:19, <mark.kirkwood@catalyst.net.nz> wrote:
[earlier exchange quoted in full - trimmed]
I had another look at this and see that I read the second explain
incorrectly.

The amount of data examined and returned is identical in both plans.
The only difference is the number of in-progress rows seen by the
second query. Looking at the numbers some more, it looks like 6000
in-progress rows are examined in addition to the data. It might be
worth an EXPLAIN patch to put instrumentation in to show that, but it's
not that interesting.
It would be useful to force the indexscan into a bitmapscan to check
that the cost isn't attributable to the plan but to other overheads.
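That check can be done per-session with the existing planner enable_* settings, e.g.:

```sql
-- Disable plain index scans for this session only, so the planner
-- falls back to a bitmap scan, then compare the timings:
SET enable_indexscan = off;

EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;

-- Restore the default afterwards:
RESET enable_indexscan;
```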
What appears to be happening is we're spending a lot of time in
TransactionIdIsInProgress() so we can set hints and then when we find
it is still in progress we then spend more time in XidIsInSnapshot()
while we check that it is still invisible to us. Even if the
transaction we see repeatedly ends, we will still pay the cost in
XidIsInSnapshot repeatedly as we execute.
Given that code path, I would expect it to suck worse on a live system
with many sessions, and even worse with many subtransactions.
(1) A proposed fix is attached, but it's only a partial one and barely tested.
Deeper fixes might be
(2) to sort the xid array if we call XidIsInSnapshot too many times
in a transaction. I don't think that is worth it, because a long
running snapshot may be examined many times, but is unlikely to see
multiple in-progress xids repeatedly. Whereas your case seems
reasonably common.
(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.
I think we need both (1) and (3) but the attached patch does just (1).
This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as yet-incomplete xids, and to bulk concurrent
transactions.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
xid_in_snapshot_cache.v1.patch (application/octet-stream, +37 -0)
On 3 May 2013 13:41, Simon Riggs <simon@2ndquadrant.com> wrote:
[my earlier patch discussion quoted - trimmed]
ISTM we can improve performance of TransactionIdIsInProgress() by
caching the procno of our last xid.
Mark, could you retest with both these patches? Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
cache_TransactionIdInProgress.v1.patch (application/octet-stream, +73 -10)
On 05/05/13 00:49, Simon Riggs wrote:
[patch discussion quoted - trimmed]

Mark, could you retest with both these patches? Thanks.
Thanks Simon, will do and report back.
On 6 May 2013 02:51, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
[earlier exchange quoted - trimmed]
OK, here's an easily reproducible test...
Prep:
DROP TABLE IF EXISTS plan;
CREATE TABLE plan
(
id INTEGER NOT NULL,
typ INTEGER NOT NULL,
dat TIMESTAMP,
val TEXT NOT NULL
);
insert into plan select generate_series(1,100000), 0,
current_timestamp, 'some texts';
CREATE UNIQUE INDEX plan_id ON plan(id);
CREATE INDEX plan_dat ON plan(dat);
testcase.pgb
select count(*) from plan where dat is null and typ = 3;
Session 1:
pgbench -n -f testcase.pgb -t 100
Session 2:
BEGIN; insert into plan select 1000000 + generate_series(1, 100000),
3, NULL, 'b';
Transaction rate in Session 1: (in tps)
(a) before we run Session 2:
Current: 5600tps
Patched: 5600tps
(b) after Session 2 has run, yet before transaction end
Current: 56tps
Patched: 65tps
(c) after Session 2 has aborted
Current/Patched: 836, 1028, 5400tps
VACUUM improves timing again
New version of patch attached which fixes a few bugs.
Patch works and improves things, but we're still swamped by the block
accesses via the index.
Which brings me back to Mark's original point, which is that we are
100x slower in this case and it *is* because the choice of
IndexScan is a bad one for this situation.
After some thought on this, I do think we need to do something about
it directly, rather than by tuning infrastructure (as I just
attempted). The root cause here is that IndexScan plans are sensitive
to mistakes in data distribution, much more so than other plan types.
The two options, broadly, are to either
1. avoid IndexScans in the planner unless they have a *significantly*
better cost. At the moment we use IndexScans if cost is lowest, even
if that is only by a whisker.
2. make IndexScans adaptive so that they switch to other plan types
mid-way through execution.
(2) seems fairly hard generically, since we'd have to keep track of
the tids returned from the IndexScan to allow us to switch to a
different plan and avoid re-issuing rows that we've already returned.
But maybe if we adapted the IndexScan plan type so that it adopted a
more page oriented approach internally, it could act like a
bitmapscan. Anyway, that would need some proof that it would work and
sounds like a fair task.
(1) sounds more easily possible and plausible. At the moment we have
only enable_indexscan = off. If we had something like
plan_cost_weight_indexscan = N, we could selectively increase the cost
of index scans so that they would be less likely to be selected, i.e.
plan_cost_weight_indexscan = 2 would mean an indexscan would need to
be half the cost of any other plan before it was selected. (The parameter
name is chosen so the same scheme could apply to other plan types.) The
reason to apply this weighting would be to calculate "risk-adjusted cost",
not just estimated cost.
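To be clear, plan_cost_weight_indexscan is only proposed syntax - no such GUC exists in PostgreSQL. Under the proposal, usage would presumably look like:

```sql
-- Hypothetical GUC (proposed above, not implemented): make index scans
-- twice as expensive in the planner's eyes, so an indexscan is chosen
-- only when it costs less than half of the next-best plan.
SET plan_cost_weight_indexscan = 2;
```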
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
cache_TransactionIdInProgress.v2.patch (application/octet-stream, +78 -5)
[Simon's proposals (1) and (2) quoted - trimmed]
Another option would be for the bulk insert/update/delete to track the
distribution stats as the operation progresses, and if it detects that it
is changing the distribution of data beyond a certain threshold, update
the pg stats accordingly.
--
Matt Clarkson
Catalyst.Net Limited
Simon Riggs wrote:
Patch works and improves things, but we're still swamped by the block
accesses via the index.
Which *might* be enough to stop it making the server go unresponsive;
we'll look at the effect of this in the next few days. Nice work!
Which brings me back to Mark's original point, which is that we are
100 times slower in this case and it *is* because the choice of
IndexScan is a bad one for this situation.

After some thought on this, I do think we need to do something about
it directly, rather than by tuning infrastructure (as I just
attempted). The root cause here is that IndexScan plans are sensitive
to mistakes in data distribution, much more so than other plan types.

The two options, broadly, are to either

1. avoid IndexScans in the planner unless they have a *significantly*
better cost. At the moment we use IndexScans if their cost is lowest,
even if that is only by a whisker.

2. make IndexScans adaptive so that they switch to other plan types
mid-way through execution.

(2) seems fairly hard generically, since we'd have to keep track of
the tids returned from the IndexScan to allow us to switch to a
different plan and avoid re-issuing rows that we've already returned.
But maybe if we adapted the IndexScan plan type so that it adopted a
more page-oriented approach internally, it could act like a
bitmapscan. Anyway, that would need some proof that it would work and
sounds like a fair task.

(1) sounds more easily possible and plausible. At the moment we have
enable_indexscan = off. If we had something like
plan_cost_weight_indexscan = N, we could selectively increase the cost
of index scans so that they would be less likely to be selected; i.e.
plan_cost_weight_indexscan = 2 would mean an indexscan would need to
be half the cost of any other plan before it was selected (parameter
name chosen so it could apply to all plan types). The reason to
apply this weighting would be to calculate "risk-adjusted cost", not
just estimated cost.
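To make the weighting idea concrete, here is a minimal sketch (Python, illustrative only: plan_cost_weight_indexscan is a proposed parameter, not an existing PostgreSQL GUC, and the real planner does not compare simple (type, cost) pairs):

```python
# Sketch of the hypothetical plan_cost_weight_indexscan idea: multiply an
# index scan's estimated cost by a weight before comparing plans, so an
# index scan must win by a clear margin rather than by a whisker.
def choose_plan(plans, indexscan_weight=2.0):
    """plans: list of (plan_type, estimated_cost) pairs; returns the
    cheapest plan after weighting index scans ('risk-adjusted cost')."""
    def adjusted(plan):
        plan_type, cost = plan
        return cost * indexscan_weight if plan_type == "indexscan" else cost
    return min(plans, key=adjusted)

plans = [("indexscan", 100.0), ("seqscan", 150.0)]
# Unweighted, the index scan (cost 100) wins; with weight 2.0 its adjusted
# cost is 200, so the seqscan is chosen instead.
print(choose_plan(plans))  # ('seqscan', 150.0)
```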
I'm thinking that a variant of (2) might be simpler to implement:
(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in-progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in-progress numbers to the
stats collector. The key change here is that the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when the number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned, or
similar.
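A minimal sketch of that reporting threshold (illustrative Python, not PostgreSQL code; should_report is a made-up name, and autovacuum_analyze_scale_factor defaults to 0.1):

```python
# Sketch: decide whether a backend has changed enough rows that its
# in-progress totals are worth reporting to the stats collector.
def should_report(changed_rows, reltuples, scale_factor=0.1):
    """Report only when in-progress changes exceed the same proportion of
    the table that would trigger an autoanalyze after commit."""
    return changed_rows > scale_factor * reltuples

print(should_report(changed_rows=200_000, reltuples=1_000_000))  # True
print(should_report(changed_rows=50_000, reltuples=1_000_000))   # False
```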
regards
Mark
On 7 May 2013 01:23, <mark.kirkwood@catalyst.net.nz> wrote:
> I'm thinking that a variant of (2) might be simpler to implement:
> (I think Matt C essentially beat me to this suggestion - he originally
> discovered this issue). It is probably good enough for only *new* plans to
> react to the increased/increasing number of in-progress rows. So this
> would require backends doing significant numbers of row changes to either
> directly update pg_statistic or report their in-progress numbers to the
> stats collector. The key change here is that the partial execution numbers
> would need to be sent. Clearly one would need to avoid doing this too
> often (!) - possibly only when the number of changed rows >
> autovacuum_analyze_scale_factor proportion of the relation concerned, or
> similar.

Are you loading using COPY? Why not break down the load into chunks?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 07/05/13 18:10, Simon Riggs wrote:
> On 7 May 2013 01:23, <mark.kirkwood@catalyst.net.nz> wrote:
>> I'm thinking that a variant of (2) might be simpler to implement:
>> [...]
>
> Are you loading using COPY? Why not break down the load into chunks?

INSERT - but we could maybe work around it by chunking the INSERT. However
that *really* breaks the idea that in SQL you just say what you want, not
how the database engine should do it! And more practically it means that
the most obvious and clear way to add your new data has nasty side effects,
and you have to tiptoe around muttering secret incantations to make things
work well :-)
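For comparison, the chunking workaround being discussed might be sketched as follows (the batch size and the psycopg2 calls in the comment are assumptions, not from this thread):

```python
# Sketch: break one big INSERT into separately committed batches so that
# autoanalyze can run between them and newer queries can replan.
def chunks(rows, size=10_000):
    """Yield successive batches of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Each batch would then be loaded in its own transaction, e.g. with
# psycopg2 (assumed driver, not part of this thread):
#   for batch in chunks(rows):
#       cur.executemany(
#           "INSERT INTO plan (id, typ, dat, val) VALUES (%s, %s, %s, %s)",
#           batch)
#       conn.commit()

print([len(b) for b in chunks(list(range(25)), size=10)])  # [10, 10, 5]
```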
I'm still thinking that making postgres smarter about having current
stats for getting the actual optimal plan is the best solution.
Cheers
Mark
On Tue, 2013-05-07 at 18:32 +1200, Mark Kirkwood wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
>> On 7 May 2013 01:23, <mark.kirkwood@catalyst.net.nz> wrote:
>>> I'm thinking that a variant of (2) might be simpler to implement:
>>> [...]
>>
>> Are you loading using COPY? Why not break down the load into chunks?
>
> INSERT - but we could maybe work around it by chunking the INSERT. However
> that *really* breaks the idea that in SQL you just say what you want, not
> how the database engine should do it! And more practically it means that
> the most obvious and clear way to add your new data has nasty side effects,
> and you have to tiptoe around muttering secret incantations to make things
> work well :-)
We also had the same problem with an UPDATE altering the data
distribution in such a way that trivial but frequently executed queries
caused massive server load until autoanalyze sorted out the stats.
--
Matt Clarkson
Catalyst.Net Limited
On 7 May 2013 07:32, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
>> On 7 May 2013 01:23, <mark.kirkwood@catalyst.net.nz> wrote:
>>> I'm thinking that a variant of (2) might be simpler to implement:
>>> [...]
>>
>> Are you loading using COPY? Why not break down the load into chunks?
>
> INSERT - but we could maybe work around it by chunking the INSERT. However
> that *really* breaks the idea that in SQL you just say what you want, not
> how the database engine should do it! And more practically it means that
> the most obvious and clear way to add your new data has nasty side effects,
> and you have to tiptoe around muttering secret incantations to make things
> work well :-)
Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.
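As a sketch of how exported snapshots could give each piece a consistent view (pg_export_snapshot() and SET TRANSACTION SNAPSHOT exist as of PostgreSQL 9.2; the helper below and the snapshot id are hypothetical):

```python
# Sketch: the statements a helper session would run before its piece of
# the split-up work, so every piece sees the same data as the session
# that exported the snapshot.
def worker_setup_sql(snapshot_id):
    """SET TRANSACTION SNAPSHOT must run first thing inside a
    REPEATABLE READ (or SERIALIZABLE) transaction."""
    return [
        "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ",
        f"SET TRANSACTION SNAPSHOT '{snapshot_id}'",
    ]

# The controlling session would first run:
#   BEGIN ISOLATION LEVEL REPEATABLE READ;
#   SELECT pg_export_snapshot();   -- returns an id such as '000003A1-1'
# and hand that id to each helper session.
print(worker_setup_sql("000003A1-1"))
```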
> I'm still thinking that making postgres smarter about having current stats
> for getting the actual optimal plan is the best solution.
I agree.
The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services