Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT

Started by Steve McLellanover 16 years ago5 messagesbugs
Jump to latest
#1Steve McLellan
smclellan@mintel.com

"Steven McLellan" <smclellan@mintel.com> writes:

I've found what appears to be a bug seriously affecting performance

running

a particular query using a named cursor versus running it as a simple
SELECT.

You haven't shown us a plan for the cursor case, but I'm thinking the
issue here is that Postgres prefers fast-start plans for cursors, on
the theory that if you're using a cursor you probably care more about
incremental fetching than the total elapsed time. As of 8.4 you can
twiddle the strength of that preference via cursor_tuple_fraction.

http://www.postgresql.org/docs/8.4/static/runtime-config-query.html#GUC-CURSOR-TUPLE-FRACTION

regards, tom lane

Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem to
force it to execute in the same time as not using the cursor, and we'll
probably go with this solution (the only reason we're using cursors is to
avoid retrieving vast result sets through psycopg2/fetchmany). Your
explanation makes sense, and I'm curious to see why this particular query
ends up being so different, but I couldn't figure out how to run the explain
- the DECLARE syntax doesn't seem to allow it. Do I need to do it through
plpgsql?

Thanks again,

Steve McLellan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve McLellan (#1)
Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT

Steve McLellan <smclellan@mintel.com> writes:

Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem to
force it to execute in the same time as not using the cursor, and we'll
probably go with this solution (the only reason we're using cursors is to
avoid retrieving vast result sets through psycopg2/fetchmany). Your
explanation makes sense, and I'm curious to see why this particular query
ends up being so different, but I couldn't figure out how to run the explain
- the DECLARE syntax doesn't seem to allow it.

"EXPLAIN [ANALYZE] DECLARE x CURSOR ..." works for me.

regards, tom lane

#3Steve McLellan
smclellan@mintel.com
In reply to: Steve McLellan (#1)

Steve McLellan <smclellan@mintel.com> writes:

Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem

to

force it to execute in the same time as not using the cursor, and we'll
probably go with this solution (the only reason we're using cursors is to
avoid retrieving vast result sets through psycopg2/fetchmany). Your
explanation makes sense, and I'm curious to see why this particular query
ends up being so different, but I couldn't figure out how to run the

explain

- the DECLARE syntax doesn't seem to allow it.

"EXPLAIN [ANALYZE] DECLARE x CURSOR ..." works for me.

regards, tom lane

Sorry, I should've tried that. The two explains are below; without cursor
then with cursor. I don't know enough to reliably say whether there's
anything wrong with either (we use default_statistics_target=100 although
the estimates don't look right for some of the query fragments), but they're
certainly very different.

========
WITHOUT CURSOR:

Hash Join (cost=33364.36..46606.90 rows=1 width=4) (actual
time=795.690..833.577 rows=14 loops=1)
Hash Cond: (fact_table.originator_key =
originator_dimension.originator_key)
-> Nested Loop (cost=33339.89..46577.20 rows=1392 width=8) (actual
time=792.676..832.203 rows=186 loops=1)
-> Nested Loop (cost=33339.89..40678.24 rows=48 width=8) (actual
time=792.653..830.017 rows=110 loops=1)
-> Hash Join (cost=33339.89..40641.84 rows=66 width=12)
(actual time=467.029..790.075 rows=3884 loops=1)
Hash Cond: (record_type_mv.record_key =
record_dimension.record_key)
-> Seq Scan on record_type_mv (cost=0.00..7076.20
rows=30012 width=4) (actual time=0.920..270.790 rows=31056 loops=1)
Filter: (("Type")::text = ANY (('{"Cat 1 -
yyy","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
-> Hash (cost=33287.67..33287.67 rows=4178 width=8)
(actual time=447.819..447.819 rows=10203 loops=1)
-> Bitmap Heap Scan on record_dimension
(cost=18214.01..33287.67 rows=4178 width=8) (actual time=371.277..428.278
rows=10203 loops=1)
Recheck Cond: (("Category" = 'Cat 1'::text)
AND ("Sector" = 'Sector I'::text) AND ("Dataset - A" OR "Dataset - C" OR
"Dataset - B"))
Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Country" = 'USA'::text))
-> BitmapAnd (cost=18214.01..18214.01
rows=4623 width=0) (actual time=367.336..367.336 rows=0 loops=1)
-> Bitmap Index Scan on "Category"
(cost=0.00..3091.75 rows=123623 width=0) (actual time=53.713..53.713
rows=124053 loops=1)
Index Cond: ("Category" = 'Cat
1'::text)
-> Bitmap Index Scan on "Sector"
(cost=0.00..7525.37 rows=327577 width=0) (actual time=129.610..129.610
rows=328054 loops=1)
Index Cond: ("Sector" = 'Sector
I'::text)
-> BitmapOr (cost=7594.30..7594.30
rows=410371 width=0) (actual time=128.983..128.983 rows=0 loops=1)
-> Bitmap Index Scan on
"Dataset - A" (cost=0.00..550.30 rows=29579 width=0) (actual
time=11.393..11.393 rows=30016 loops=1)
Index Cond: ("Dataset -
A" = true)
-> Bitmap Index Scan on
"Dataset - C" (cost=0.00..6981.18 rows=377696 width=0) (actual
time=116.306..116.306 rows=380788 loops=1)
Index Cond: ("Dataset -
C" = true)
-> Bitmap Index Scan on
"Dataset - B" (cost=0.00..59.69 rows=3097 width=0) (actual
time=1.272..1.272 rows=3116 loops=1)
Index Cond: ("Dataset -
B" = true)
-> Index Scan using date_dimension_pkey on date_dimension
"date_dimension_Published Date" (cost=0.00..0.54 rows=1 width=4) (actual
time=0.007..0.007 rows=0 loops=3884)
Index Cond: ("date_dimension_Published Date".date_key =
record_dimension.published_date_key)
Filter: ("date_dimension_Published Date"."SQL Date" >=
(now() - '6 mons'::interval))
-> Index Scan using record_date on instance_core_fact fact_table
(cost=0.00..122.48 rows=33 width=12) (actual time=0.007..0.011 rows=2
loops=110)
Index Cond: (fact_table.record_key =
record_dimension.record_key)
-> Hash (cost=24.45..24.45 rows=1 width=4) (actual time=1.004..1.004
rows=169 loops=1)
-> Bitmap Heap Scan on originator_dimension (cost=20.44..24.45
rows=1 width=4) (actual time=0.313..0.662 rows=169 loops=1)
Recheck Cond: (("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[])) AND ("Originator Type" = 'NO DATA'::text))
-> BitmapAnd (cost=20.44..20.44 rows=1 width=0) (actual
time=0.299..0.299 rows=0 loops=1)
-> Bitmap Index Scan on publication_market_area
(cost=0.00..8.95 rows=54 width=0) (actual time=0.165..0.165 rows=169
loops=1)
Index Cond: ("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[]))
-> Bitmap Index Scan on "Originator Type"
(cost=0.00..11.24 rows=390 width=0) (actual time=0.121..0.121 rows=426
loops=1)
Index Cond: ("Originator Type" = 'NO DATA'::text)
===================
WITH CURSOR:

Nested Loop (cost=0.00..145138.44 rows=1 width=4) (actual
time=1508.556..556939.884 rows=14 loops=1)
Join Filter: (fact_table.originator_key =
originator_dimension.originator_key)
-> Index Scan using "Originator Type" on originator_dimension
(cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
loops=1)
Index Cond: ("Originator Type" = 'NO DATA'::text)
Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))
-> Nested Loop (cost=0.00..144995.75 rows=1392 width=8) (actual
time=1081.426..3294.915 rows=186 loops=169)
-> Nested Loop (cost=0.00..139096.78 rows=48 width=8) (actual
time=1081.108..3290.440 rows=110 loops=169)
-> Nested Loop (cost=0.00..124922.31 rows=3026 width=4)
(actual time=1061.766..3284.424 rows=380 loops=169)
-> Seq Scan on record_dimension (cost=0.00..122618.12
rows=4178 width=8) (actual time=0.427..3177.623 rows=10203 loops=169)
Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Sector" = 'Sector I'::text) AND ("Category" = 'Cat
1'::text) AND ("Country" = 'USA'::text))
-> Index Scan using date_dimension_pkey on
date_dimension "date_dimension_Published Date" (cost=0.00..0.54 rows=1
width=4) (actual time=0.007..0.007 rows=0 loops=1724307)
Index Cond: ("date_dimension_Published
Date".date_key = record_dimension.published_date_key)
Filter: ("date_dimension_Published Date"."SQL
Date" >= (now() - '6 mons'::interval))
-> Index Scan using record_type_mv_pkey on record_type_mv
(cost=0.00..4.67 rows=1 width=4) (actual time=0.011..0.012 rows=0
loops=64220)
Index Cond: (record_type_mv.record_key =
record_dimension.record_key)
Filter: ((record_type_mv."Type")::text = ANY (('{"Cat 1
- yyy","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
-> Index Scan using record_date on instance_core_fact fact_table
(cost=0.00..122.48 rows=33 width=12) (actual time=0.024..0.032 rows=2
loops=18590)
Index Cond: (fact_table.record_key =
record_dimension.record_key)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve McLellan (#3)
Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT

Steve McLellan <smclellan@mintel.com> writes:

Sorry, I should've tried that. The two explains are below; without cursor
then with cursor. I don't know enough to reliably say whether there's
anything wrong with either (we use default_statistics_target=100 although
the estimates don't look right for some of the query fragments), but they're
certainly very different.

Right, so the cursor plan is one that is estimated to take 3x longer,
but can deliver rows starting right away, where the other plan will have
a long delay and then spit everything out in a bunch. It's not a bad
tradeoff, if the estimate were right. The killer mistake is here:

-> Index Scan using "Originator Type" on originator_dimension
(cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
loops=1)
Index Cond: ("Originator Type" = 'NO DATA'::text)
Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

I'm not sure whether you'd get any real improvement from a further
increase in stats target, though that's certainly an easy thing to
try. What this looks like to me is the longstanding problem of having
correlations between the distributions of two different columns
(here, Area and Originator Type), which the planner doesn't know about.
So its estimate of the combined probability of meeting the conditions
is off. You could try checking estimated vs real rowcount for the
area and originator conditions separately to see if that's the case
or not. If either one is off by itself then a stats increase for that
column would help, if not there's not much to be done except lean on
the cursor target knob ...

regards, tom lane

#5Steve McLellan
smclellan@mintel.com
In reply to: Steve McLellan (#3)

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

You're correct - the query's exercising a very small subset of that table.
The table has 400,000 rows, of which only 400 have Originator Type = 'NO
DATA', and ALL the rows with those two Areas have Originator Type = 'NO
DATA'. As you say, increasing the stats target may not make much difference
because of the size of this sample when compared to the overall data set (in
fact the production servers have it set to 700 and it makes little
real-world difference), so we'll go with your initial suggestion and upgrade
to 8.4. Thanks again; this has been an good set of lessons to learn.

Steve