Odd Row Estimates in Query Plan (rows=75)

Started by Don Seileralmost 8 years ago6 messagesgeneral

don@seiler.us

almost 8 years ago

PostgreSQL 9.6.6 on CentOS.

We have a report query that has gone from maybe a few seconds to run to a
few minutes to run since mid-July. Looking at the output of EXPLAIN
ANALYZE, the row count estimates are way off, even though this table was
just analyzed a day or so ago. What's more bizarre to me is that the row
count esimate is *always* 75 for every node of the plan, where the actual
rows is in the hundreds or thousands. This table is one of the busiest
tables in our production database (many inserts and updates). It is
autovacuumed and autoanalyzed a few times per week, although I'm looking to
change it to a nightly manual schedule to avoid daytime autovacuums.

Hash Join (cost=1869142.34..1869146.15 rows=75 width=88) (actual
time=179877.869..179878.011 rows=759 loops=1)
Hash Cond: (stores.pkey = lt.store_pkey)
Buffers: shared hit=1654593 read=331897 dirtied=249
-> Seq Scan on stores (cost=0.00..2.77 rows=77 width=22) (actual
time=0.007..0.023 rows=78 loops=1)
Buffers: shared hit=2
-> Hash (cost=1869141.40..1869141.40 rows=75 width=50) (actual
time=179877.847..179877.847 rows=759 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 73kB
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Subquery Scan on lt (cost=1869138.59..1869141.40 rows=75
width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> GroupAggregate (cost=1869138.59..1869140.65 rows=75
width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
Group Key: lts.store_pkey, lts.owner,
(date_trunc('minute'::text, lts.date_gifted))
Filter: (count(*) IS NOT NULL)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Sort (cost=1869138.59..1869138.78 rows=75
width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
Sort Key: lts.store_pkey,
lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
Sort Method: quicksort Memory: 757kB
Buffers: shared hit=1654591 read=331897
dirtied=249
-> Index Scan using gifts_date_added on gifts
lts (cost=0.56..1869136.25 rows=75 width=42) (actual
time=190.657..179870.165 rows=6731 loops=1)
Index Cond: ((date_added > '2018-07-14
11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
14:14:21'::timestamp without time zone))
Filter: ((date_gifted >= '2018-08-13
11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
14:14:21'::timestamp without time zone))
Rows Removed by Filter: 938197
Buffers: shared hit=1654591 read=331897
dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but getting
explain plan for older data from older sandboxes show a similar plan.

Sidenote: I am suggesting that an index be added on the date_gifted field
as that is far more selective and avoids throwing rows away. However I'm
very interested in why every node dealing with the gifts table thinks
rows=75 when the actual is much, much higher. And 75 seems like too round
of a number to be random?

--
Don Seiler
www.seiler.us

Adrian Klaver

adrian.klaver@aklaver.com

almost 8 years ago

In reply to: Don Seiler (#1)

Re: Odd Row Estimates in Query Plan (rows=75)

On 08/15/2018 12:31 PM, Don Seiler wrote:

PostgreSQL 9.6.6 on CentOS.

We have a report query that has gone from maybe a few seconds to run to
a few minutes to run since mid-July. Looking at the output of EXPLAIN
ANALYZE, the row count estimates are way off, even though this table was
just analyzed a day or so ago. What's more bizarre to me is that the row
count esimate is *always* 75 for every node of the plan, where the
actual rows is in the hundreds or thousands. This table is one of the
busiest tables in our production database (many inserts and updates). It
is autovacuumed and autoanalyzed a few times per week, although I'm
looking to change it to a nightly manual schedule to avoid daytime
autovacuums.

Hash Join (cost=1869142.34..1869146.15 rows=75 width=88) (actual
time=179877.869..179878.011 rows=759 loops=1)
Hash Cond: (stores.pkey = lt.store_pkey)
Buffers: shared hit=1654593 read=331897 dirtied=249
-> Seq Scan on stores (cost=0.00..2.77 rows=77 width=22) (actual
time=0.007..0.023 rows=78 loops=1)
Buffers: shared hit=2
-> Hash (cost=1869141.40..1869141.40 rows=75 width=50) (actual
time=179877.847..179877.847 rows=759 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 73kB
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Subquery Scan on lt (cost=1869138.59..1869141.40 rows=75
width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> GroupAggregate (cost=1869138.59..1869140.65 rows=75
width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
Group Key: lts.store_pkey, lts.owner,
(date_trunc('minute'::text, lts.date_gifted))
Filter: (count(*) IS NOT NULL)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Sort (cost=1869138.59..1869138.78 rows=75
width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
Sort Key: lts.store_pkey,
lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
Sort Method: quicksort Memory: 757kB
Buffers: shared hit=1654591 read=331897
dirtied=249
-> Index Scan using gifts_date_added on
gifts lts (cost=0.56..1869136.25 rows=75 width=42) (actual
time=190.657..179870.165 rows=6731 loops=1)
Index Cond: ((date_added > '2018-07-14
11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
14:14:21'::timestamp without time zone))
Filter: ((date_gifted >= '2018-08-13
11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
14:14:21'::timestamp without time zone))
Rows Removed by Filter: 938197
Buffers: shared hit=1654591
read=331897 dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but
getting explain plan for older data from older sandboxes show a similar
plan.

I don't have an answer, just a question:

Can you provide the actual query and the table schema?

Sidenote: I am suggesting that an index be added on the date_gifted
field as that is far more selective and avoids throwing rows away.
However I'm very interested in why every node dealing with the gifts
table thinks rows=75 when the actual is much, much higher. And 75 seems
like too round of a number to be random?

--
Don Seiler
www.seiler.us <http://www.seiler.us>

--
Adrian Klaver
adrian.klaver@aklaver.com

Don Seiler

don@seiler.us

almost 8 years ago

In reply to: Adrian Klaver (#2)

Re: Odd Row Estimates in Query Plan (rows=75)

Here's the query, obfuscated manually by me:

SELECT
'Foo' as system_function,
stores.name as store,
lt.owner,
lt.minute_of_day,
lt.records
FROM
foo.stores
LEFT OUTER JOIN
(SELECT
lts.store_pkey,
lts.owner,
date_trunc('minute', lts.date_gifted) as minute_of_day,
count(*) as records
FROM foo.gifts lts
WHERE
lts.date_added > '2017-07-14 11:13:05'
AND lts.date_added < '2017-08-13 14:14:21'
AND lts.date_gifted >= '2017-08-13 11:13:05'
AND lts.date_gifted < '2017-08-13 14:14:21'
GROUP BY 1,2,3
ORDER BY 1
) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's big
and very active. There is an index on date_added but not yet on
date_gifted.

I'm working to re-write the query while the dev sees if we even need this
query anymore.

On Wed, Aug 15, 2018 at 2:39 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 08/15/2018 12:31 PM, Don Seiler wrote:

PostgreSQL 9.6.6 on CentOS.

We have a report query that has gone from maybe a few seconds to run to a
few minutes to run since mid-July. Looking at the output of EXPLAIN
ANALYZE, the row count estimates are way off, even though this table was
just analyzed a day or so ago. What's more bizarre to me is that the row
count esimate is *always* 75 for every node of the plan, where the actual
rows is in the hundreds or thousands. This table is one of the busiest
tables in our production database (many inserts and updates). It is
autovacuumed and autoanalyzed a few times per week, although I'm looking to
change it to a nightly manual schedule to avoid daytime autovacuums.

Hash Join (cost=1869142.34..1869146.15 rows=75 width=88) (actual
time=179877.869..179878.011 rows=759 loops=1)
Hash Cond: (stores.pkey = lt.store_pkey)
Buffers: shared hit=1654593 read=331897 dirtied=249
-> Seq Scan on stores (cost=0.00..2.77 rows=77 width=22) (actual
time=0.007..0.023 rows=78 loops=1)
Buffers: shared hit=2
-> Hash (cost=1869141.40..1869141.40 rows=75 width=50) (actual
time=179877.847..179877.847 rows=759 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 73kB
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Subquery Scan on lt (cost=1869138.59..1869141.40 rows=75
width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> GroupAggregate (cost=1869138.59..1869140.65 rows=75
width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
Group Key: lts.store_pkey, lts.owner,
(date_trunc('minute'::text, lts.date_gifted))
Filter: (count(*) IS NOT NULL)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Sort (cost=1869138.59..1869138.78 rows=75
width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
Sort Key: lts.store_pkey,
lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
Sort Method: quicksort Memory: 757kB
Buffers: shared hit=1654591 read=331897
dirtied=249
-> Index Scan using gifts_date_added on
gifts lts (cost=0.56..1869136.25 rows=75 width=42) (actual
time=190.657..179870.165 rows=6731 loops=1)
Index Cond: ((date_added > '2018-07-14
11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13
14:14:21'::timestamp without time zone))
Filter: ((date_gifted >= '2018-08-13
11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13
14:14:21'::timestamp without time zone))
Rows Removed by Filter: 938197
Buffers: shared hit=1654591 read=331897
dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but
getting explain plan for older data from older sandboxes show a similar
plan.

I don't have an answer, just a question:

Can you provide the actual query and the table schema?

Sidenote: I am suggesting that an index be added on the date_gifted field
as that is far more selective and avoids throwing rows away. However I'm
very interested in why every node dealing with the gifts table thinks
rows=75 when the actual is much, much higher. And 75 seems like too round
of a number to be random?

--
Don Seiler
www.seiler.us <http://www.seiler.us>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Don Seiler
www.seiler.us

Adrian Klaver

adrian.klaver@aklaver.com

almost 8 years ago

In reply to: Don Seiler (#3)

Re: Odd Row Estimates in Query Plan (rows=75)

On 08/15/2018 01:03 PM, Don Seiler wrote:

Here's the query, obfuscated manually by me:

SELECT
'Foo' as system_function,
stores.name <http://stores.name> as store,
lt.owner,
lt.minute_of_day,
lt.records
FROM
foo.stores
LEFT OUTER JOIN
(SELECT
lts.store_pkey,
lts.owner,
date_trunc('minute', lts.date_gifted) as minute_of_day,
count(*) as records
FROM foo.gifts lts
WHERE
lts.date_added > '2017-07-14 11:13:05'
AND lts.date_added < '2017-08-13 14:14:21'
AND lts.date_gifted >= '2017-08-13 11:13:05'
AND lts.date_gifted < '2017-08-13 14:14:21'
GROUP BY 1,2,3
ORDER BY 1
) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's
big and very active. There is an index on date_added but not yet on
date_gifted.

I'm working to re-write the query while the dev sees if we even need
this query anymore.

I agree the issue seems to be in the index/filter of the dates. That
leads me to another question:

Why in:

WHERE
lts.date_added > '2017-07-14 11:13:05'
AND
lts.date_added < '2017-08-13 14:14:21'
AND
lts.date_gifted >= '2017-08-13 11:13:05'
AND
lts.date_gifted < '2017-08-13 14:14:21'

lts.date_added > '2017-07-14 11:13:05'

and

lts.date_gifted >= '2017-08-13 11:13:05'
?

In other words one '>' and the other '>=' ?

--
Adrian Klaver
adrian.klaver@aklaver.com

Don Seiler

don@seiler.us

almost 8 years ago

In reply to: Adrian Klaver (#4)

Re: Odd Row Estimates in Query Plan (rows=75)

On Wed, Aug 15, 2018 at 3:31 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

lts.date_added > '2017-07-14 11:13:05'

and

lts.date_gifted >= '2017-08-13 11:13:05'
?

In other words one '>' and the other '>=' ?

The date_added filters were added just to use that index and with a broad
range, since there isn't a filter on date_gifted. You'll notice the
date_added range is 30 days but the date_gifted range is 3 hours. We really
only care about date_gifted but at this time there isn't an index on that
field.

Even as I experiment with some query rewrites, the EXPLAIN ANALYZE always
says rows=75. I'm *very* curious to see why it is using that value.

Don.

--
Don Seiler
www.seiler.us

Laurenz Albe

laurenz.albe@cybertec.at

almost 8 years ago

In reply to: Don Seiler (#1)

Re: Odd Row Estimates in Query Plan (rows=75)

Don Seiler wrote:

We have a report query that has gone from maybe a few seconds to run to a few minutes to run since mid-July.
Looking at the output of EXPLAIN ANALYZE, the row count estimates are way off, even though this table was
just analyzed a day or so ago. What's more bizarre to me is that the row count esimate is *always* 75 for
every node of the plan, where the actual rows is in the hundreds or thousands. This table is one of the
busiest tables in our production database (many inserts and updates). It is autovacuumed and autoanalyzed
a few times per week, although I'm looking to change it to a nightly manual schedule to avoid daytime autovacuums.

Hash Join (cost=1869142.34..1869146.15 rows=75 width=88) (actual time=179877.869..179878.011 rows=759 loops=1)
Hash Cond: (stores.pkey = lt.store_pkey)
Buffers: shared hit=1654593 read=331897 dirtied=249
-> Seq Scan on stores (cost=0.00..2.77 rows=77 width=22) (actual time=0.007..0.023 rows=78 loops=1)
Buffers: shared hit=2
-> Hash (cost=1869141.40..1869141.40 rows=75 width=50) (actual time=179877.847..179877.847 rows=759 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 73kB
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Subquery Scan on lt (cost=1869138.59..1869141.40 rows=75 width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> GroupAggregate (cost=1869138.59..1869140.65 rows=75 width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
Group Key: lts.store_pkey, lts.owner, (date_trunc('minute'::text, lts.date_gifted))
Filter: (count(*) IS NOT NULL)
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Sort (cost=1869138.59..1869138.78 rows=75 width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
Sort Key: lts.store_pkey, lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
Sort Method: quicksort Memory: 757kB
Buffers: shared hit=1654591 read=331897 dirtied=249
-> Index Scan using gifts_date_added on gifts lts (cost=0.56..1869136.25 rows=75 width=42) (actual time=190.657..179870.165 rows=6731 loops=1)
Index Cond: ((date_added > '2018-07-14 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13 14:14:21'::timestamp without time zone))
Filter: ((date_gifted >= '2018-08-13 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13 14:14:21'::timestamp without time zone))
Rows Removed by Filter: 938197
Buffers: shared hit=1654591 read=331897 dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but getting explain plan for older data from
older sandboxes show a similar plan.

Sidenote: I am suggesting that an index be added on the date_gifted field as that is far more selective and avoids
throwing rows away. However I'm very interested in why every node dealing with the gifts table thinks rows=75
when the actual is much, much higher. And 75 seems like too round of a number to be random?

Yes, I would say that adding an index on "date_gifted" would help. You may end
up with two bitmap index scans that get combined.
Make sure "work_mem" is big enough to avoid lossy bitmaps (indicated in the plan).

About the misestimate:

You could try running ANALYZE with an increased "default_statistics_target" and see
if that changes the estimate.
If yes, then maybe you should increase statistics for that table or (seing that you are
querying current values) you should collect statistics more often.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com