min/max performance inequality.

Started by Pawel Veselovover 11 years ago4 messagesgeneral

pawel.veselov@gmail.com

over 11 years ago

Hi.

I was wondering how come there is such a drastic difference between finding
max and min. Seems like "index scan backwards" is really bad... The table
is freshly re-indexed just in case. I added a count(*) in there, forcing
the seq scan, and it's even better than the backwards index scan...

db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=495.89..495.90 rows=1 width=0) (actual time=24.149..24.150
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..495.89 rows=1 width=13) (actual
time=24.139..24.140 rows=1 loops=1)
-> Index Scan using rowdate_r_agrio on r_agrio
(cost=0.00..222160.24 rows=448 width=13) (actual time=24.137..24.137
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Total runtime: 24.186 ms
(7 rows)

db=> EXPLAIN ANALYZE select max(rowdate) from r_agrio where blockid = 4814;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=495.89..495.90 rows=1 width=0) (actual time=926.032..926.033
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..495.89 rows=1 width=13) (actual
time=926.019..926.021 rows=1 loops=1)
-> Index Scan Backward using rowdate_r_agrio on r_agrio
(cost=0.00..222160.24 rows=448 width=13) (actual time=926.017..926.017
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Total runtime: 926.070 ms
(7 rows)

db=> EXPLAIN ANALYZE select count(*), max(rowdate) from r_agrio where
blockid = 4814;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=31585.18..31585.19 rows=1 width=13) (actual
time=461.079..461.080 rows=1 loops=1)
-> Seq Scan on r_agrio (cost=0.00..31582.94 rows=448 width=13) (actual
time=8.912..460.999 rows=15 loops=1)
Filter: (blockid = 4814::numeric)
Total runtime: 461.134 ms
(4 rows)

Jeff Janes

jeff.janes@gmail.com

over 11 years ago

In reply to: Pawel Veselov (#1)

Re: min/max performance inequality.

On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov <pawel.veselov@gmail.com>
wrote:

Hi.

I was wondering how come there is such a drastic difference between
finding max and min. Seems like "index scan backwards" is really bad... The
table is freshly re-indexed just in case. I added a count(*) in there,
forcing the seq scan, and it's even better than the backwards index scan...

db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid = 4814;

It crawls the data in rowdate order (either forward or reverse) until it
finds the first 4814. Crawling forward it finds 4814 very early. Crawling
backwards it has to pass through a bunch of non-4814 before it finds the
first 4814.

This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more
modern version of postgresql (9.2 or above) there would be another line for
"Rows Removed by Filter:" which would tell the story of what is going on.

If you have a composite index on (blockid, rowdate), it would help make
this much faster, as it can go directly to the desired row.

Cheers,

Jeff

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Pawel Veselov (#1)

Re: min/max performance inequality.

Pawel Veselov <pawel.veselov@gmail.com> writes:

I was wondering how come there is such a drastic difference between finding
max and min. Seems like "index scan backwards" is really bad...

It's probably an artifact of your data distribution, ie, the "blockid =
4814" condition is skipping lots of rows at one end of the index but few
or none at the other.

If you're concerned about the performance of this type of query, an index
on (blockid, rowdate) would work a lot better than the ones you've
provided.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Pawel Veselov

pawel.veselov@gmail.com

over 11 years ago

In reply to: Jeff Janes (#2)

Re: min/max performance inequality.

Thanks Jeff (and Tom)

On Wed, Jan 7, 2015 at 3:34 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov <pawel.veselov@gmail.com>
wrote:

Hi.

I was wondering how come there is such a drastic difference between
finding max and min. Seems like "index scan backwards" is really bad... The
table is freshly re-indexed just in case. I added a count(*) in there,
forcing the seq scan, and it's even better than the backwards index scan...

db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid =
4814;

It crawls the data in rowdate order (either forward or reverse) until it
finds the first 4814. Crawling forward it finds 4814 very early. Crawling
backwards it has to pass through a bunch of non-4814 before it finds the
first 4814.

This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more
modern version of postgresql (9.2 or above) there would be another line for
"Rows Removed by Filter:" which would tell the story of what is going on.

Yeah, there is 10x more rows on when going backwards

If you have a composite index on (blockid, rowdate), it would help make
this much faster, as it can go directly to the desired row.

That does help a lot. So, when does postgres use a more-dimensional index,
even if not all dimensions are engaged (as there is an index that involves
those 2 fields, and more)? I definitely see it do that in some cases...

Even with that index, however, there is still a good difference in time
(the interest is theoretical at this point, as I found a better way to
extract that data anyway).

On a newer db.

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=521.54..521.55 rows=1 width=0) (actual time=39.770..39.770
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.55..521.54 rows=1 width=13) (actual
time=39.765..39.766 rows=1 loops=1)
-> Index Scan using rowdate_r_agrio on r_agrio
(cost=0.55..303738.47 rows=583 width=13) (actual time=39.763..39.763
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Rows Removed by Filter: 37246
Total runtime: 39.798 ms
(8 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=521.54..521.55 rows=1 width=0) (actual
time=1497.377..1497.378 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.55..521.54 rows=1 width=13) (actual
time=1497.371..1497.372 rows=1 loops=1)
-> Index Scan Backward using rowdate_r_agrio on r_agrio
(cost=0.55..303738.47 rows=583 width=13) (actual time=1497.370..1497.370
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Rows Removed by Filter: 317739
Total runtime: 1497.407 ms
(8 rows)
db=> CREATE INDEX concurrently xxx on r_agrio(rowdate,blockid);
CREATE INDEX

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Result (cost=85.05..85.06 rows=1 width=0) (actual time=17.585..17.585
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..85.05 rows=1 width=13) (actual
time=17.580..17.581 rows=1 loops=1)
-> Index Only Scan using xxx on r_agrio (cost=0.43..37827.09
rows=447 width=13) (actual time=17.578..17.578 rows=1 loops=1)
Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
Heap Fetches: 0
Total runtime: 17.616 ms
(7 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=85.04..85.05 rows=1 width=0) (actual time=89.141..89.142
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..85.04 rows=1 width=13) (actual
time=89.135..89.136 rows=1 loops=1)
-> Index Only Scan Backward using xxx on r_agrio
(cost=0.43..37823.09 rows=447 width=13) (actual time=89.134..89.134 rows=1
loops=1)
Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
Heap Fetches: 1
Total runtime: 89.173 ms
(7 rows)

Show quoted text

Cheers,