Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Started by Mark, about 7 years ago, 14 messages
#1Mark
mwchambers@gmail.com

Hi All,

I'm seeing some very slow queries, and it looks like the query planner is
deciding to do a 'Nested Loop Left Join', which is slow. When I SET
enable_nestloop=OFF for testing, it does a 'Hash Left Join', which is much
faster.
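
(For anyone who wants to reproduce the comparison, the test was along these
lines; the actual query is omitted here:)

db=> SET enable_nestloop = off;   -- session-local, for testing only
db=> EXPLAIN ANALYSE <the slow query with the left join>;
db=> RESET enable_nestloop;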

I think I can see the cause of the problem in the examples below, but I
can't make sense of the statistics being written by ANALYSE or of the
planner's row estimates.

Can anyone help me understand:

- Why is the row estimate I get in the query below for school_id = 36 only 1?
(I expect it to be much higher.)
- Why does '1' appear in most_common_vals when it is actually the least
common value?
- Why doesn't 36 appear in most_common_vals? (It is more common than 1.)
- Does the ANALYSE output below mean that it only scanned 51538 of the 65463
rows in the table? Is school_id 36 just being missed in the sample? (This
happens again when the ANALYSE is repeated.)

Any help with understanding what's happening here would be much
appreciated.

I hope I've provided enough information below.

Thanks,

Mark

db=> explain analyse select * from common_student where school_id = 36 ;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388 loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.087 ms
 Execution time: 2.706 ms
(4 rows)

db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct from pg_stats where attname='school_id' and tablename='common_student';
   tablename    |  attname  |                                                            most_common_vals                                                            | histogram_bounds | n_distinct
----------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
 common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  |         45
(1 row)

db=> select count(distinct(school_id)) from common_student ;
 count
-------
    55
(1 row)

db=> alter table common_student alter column school_id set statistics 100000;
WARNING:  lowering statistics target to 10000
ALTER TABLE
db=> analyse verbose common_student(school_id);
INFO:  analyzing "public.common_student"
INFO:  "common_student": scanned 7322 of 7322 pages, containing 65463 live rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
ANALYZE
db=> select tablename,attname,most_common_vals,histogram_bounds,n_distinct from pg_stats where attname='school_id' and tablename='common_student';
   tablename    |  attname  |                                                            most_common_vals                                                            | histogram_bounds | n_distinct
----------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------+------------------+------------
 common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  |         45
(1 row)

db=> explain analyse select * from common_student where school_id = 36 ;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_common_student_sid on common_student  (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388 loops=1)
   Index Cond: (school_id = 36)
 Planning time: 0.334 ms
 Execution time: 6.542 ms
(4 rows)

db=> select school_id, count(*) from common_student group by school_id order by count(*) limit 6 ;
 school_id | count
-----------+-------
         1 |    50
        88 |   161
        53 |   252
        94 |   422
        31 |   434
        68 |   454
(6 rows)

dvpjxbzc=> select school_id, count(*) from common_student where school_id = 36 group by school_id ;
 school_id | count
-----------+-------
        36 |  1388
(1 row)

db=> explain analyse select * from common_student where school_id = 1 ;
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on common_student  (cost=4.91..243.76 rows=64 width=385) (actual time=0.128..0.267 rows=50 loops=1)
   Recheck Cond: (school_id = 1)
   Heap Blocks: exact=16
   ->  Bitmap Index Scan on "unique common_student"  (cost=0.00..4.90 rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
         Index Cond: (school_id = 1)
 Planning time: 0.177 ms
 Execution time: 0.414 ms
(7 rows)

db=> select VERSION();
                                   version
-----------------------------------------------------------------------------
 PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2, 64-bit
(1 row)

#2Kumar, Virendra
Virendra.Kumar@guycarp.com
In reply to: Mark (#1)
RE: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Mark,

Can you try setting a higher statistics target on the school_id column and
then analyzing the table, and see where that takes you? Something like:
--
ALTER TABLE common_student ALTER COLUMN school_id SET STATISTICS 10000;
ANALYZE common_student;

Regards,
Virendra

#3Mark
mwchambers@gmail.com
In reply to: Kumar, Virendra (#2)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Virendra,

Thanks for the reply.

You must have missed it, but I've already done that (see my original
email). The stats target for that column is already set to 10000.

db=> SELECT attstattarget FROM pg_attribute WHERE attrelid =
'public.common_student'::regclass AND attname = 'school_id';
attstattarget
---------------
10000
(1 row)

Mark

#4Jeff Janes
jeff.janes@gmail.com
In reply to: Mark (#1)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

- Does the analyse output below mean that it only scanned 51538 of 65463
rows in the table? Is school_id 36 just being missed in the sample? (This
happens when the analyse is repeated )

Is there a transaction which had deleted all of school_id=36, and then was
just left open indefinitely without either committing or rolling back?

That would explain it, and I don't know of anything else that could. The
deleted but not committed tuples are still live, but don't get sampled.

Cheers,

Jeff

#5Maxim Boguk
maxim.boguk@gmail.com
In reply to: Mark (#1)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Mark,

It looks very weird.
Can you try something like this (check first that you have enough disk space
for a second copy of common_student):

create table test_table AS SELECT * from common_student;
Vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;
create table test_table AS SELECT * from common_student ORDER BY school_id;
Vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

And provide results of both explain analyze queries.

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
когда я так делаю ещё раз?"

#6Mark
mwchambers@gmail.com
In reply to: Jeff Janes (#4)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Jeff,

Thanks for your help. That is exactly what is happening.

I have a long-running job which deletes all of the common_student table and
then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it was committed or rolled back.

I will have to rewrite the application so it updates the existing rows
rather than deleting everything and then inserting.
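
(Sketching the rewrite: the reload would become an upsert, roughly along
these lines. The student_id and name columns here are just placeholders for
the real schema, and it assumes a suitable unique constraint to match on:)

insert into common_student (school_id, student_id, name)
values (36, 123, 'example')                 -- one row per loaded record
on conflict (school_id, student_id)         -- the assumed unique key
do update set name = excluded.name;

Rows that are no longer present in the load could then be deleted at the
end, so there is never a point where a whole school's rows sit deleted but
not yet committed.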

Thanks again for helping me understand what's happening here.

Proof:

db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.097 ms
Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36 */ ;

db=> analyse verbose common_student(school_id);
INFO: analyzing "public.common_student"
INFO: "common_student": scanned 7322 of 7322 pages, containing 65431 live
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.098 ms
Execution time: 2.583 ms
(4 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36 */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.327 ms
Execution time: 2.311 ms
(7 rows)

#7Mark
mwchambers@gmail.com
In reply to: Maxim Boguk (#5)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Maxim,

Thanks for your help.

Jeff has pointed me in the right direction here: it seems that the rows=1
estimate is due to a long-running transaction which deletes all of
common_student for school_id 36 and then repopulates it.

I was unaware that the delete inside the transaction would affect the
VACUUM ANALYSE. As I said to Jeff, I will have to rewrite the application to
update the rows in place if this is the normal behaviour.

Here are the queries:

db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.097 ms
Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36 */ ;

db=> analyse verbose common_student(school_id);
INFO: analyzing "public.common_student"
INFO: "common_student": scanned 7322 of 7322 pages, containing 65431 live
rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.098 ms
Execution time: 2.583 ms
(4 rows)

db=> create table test_table AS SELECT * from common_student;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3848.89 rows=1381 width=384) (actual
time=10.105..22.680 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.390 ms
Execution time: 23.767 ms
(5 rows)

db=> drop table test_table;
DROP TABLE
db=> create table test_table AS SELECT * from common_student ORDER BY
school_id;
SELECT 65431
db=> vacuum analyze test_table;
VACUUM
db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3850.89 rows=1341 width=382) (actual
time=5.674..27.585 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.264 ms
Execution time: 28.643 ms
(5 rows)

db=> explain analyze select * from test_table where school_id = 36;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..3850.89 rows=1341 width=382) (actual
time=20.848..43.272 rows=1388 loops=1)
Filter: (school_id = 36)
Rows Removed by Filter: 64043
Planning time: 0.068 ms
Execution time: 44.423 ms
(5 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36 */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.327 ms
Execution time: 2.311 ms
(7 rows)

#8Ron
ronljohnsonjr@gmail.com
In reply to: Mark (#6)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

But Jeff said "left open indefinitely without either committing or rolling
back".  Your process is committing the transaction.

--
Angular momentum makes the world go 'round.

#9Mark
mwchambers@gmail.com
In reply to: Ron (#8)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Ron,

Yes, my process will commit the transaction (or roll it back) eventually.
The problem is the window in which one transaction has deleted all the rows
(but not yet committed) and an ANALYSE has run.

The deleted rows won't make it into the sample even though the transaction
has NOT been committed.

During this window I get bad row estimates for that ID. You can see this in
the example below with two database connections (labelled 1 and 2).

I would have expected the DELETE to have no effect until it was committed.

connection 1=> create table test_table(school_id integer, note text);
CREATE TABLE
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT
g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> select * from test_table ;
school_id | note
-----------+----------------------------------
1 | 0e08cf3990a04f0e943584517c564d31
2 | 96bf83ae5f8eb9342e8408b1ac25cb14
1 | f8fd943012edfe42a03a421df660bc33
2 | cd5d7ff0abca61f18857df9b21d234e0
1 | 60d731f430cb68c7285ddbcd9186eaa0
2 | 635e6c9cf305147ad8684213f0a9299c
(6 rows)

connection 1=> analyse verbose test_table ;
INFO: analyzing "public.test_table"
INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2
;
QUERY PLAN

-----------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..1.07 rows=3 width=37) (actual
time=0.011..0.015 rows=3 loops=1)
Filter: (school_id = 2)
Rows Removed by Filter: 3
Planning time: 0.164 ms
Execution time: 0.043 ms
(5 rows)

connection 2=> BEGIN ;
BEGIN
connection 2=> delete from test_table where school_id = 2 ;
DELETE 3
connection 2=> /* This connection is now idle */

connection 1=> analyse verbose test_table ;
INFO: analyzing "public.test_table"
INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 3 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id = 2
;
QUERY PLAN

-----------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..1.07 rows=1 width=37) (actual
time=0.009..0.014 rows=3 loops=1)
Filter: (school_id = 2)
Rows Removed by Filter: 3
Planning time: 0.095 ms
Execution time: 0.039 ms
(5 rows)

#10Ron
ronljohnsonjr@gmail.com
In reply to: Mark (#9)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Try using SERIALIZABLE transactions instead of the default READ COMMITTED.

--
Angular momentum makes the world go 'round.

#11Mark
mwchambers@gmail.com
In reply to: Ron (#10)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Ron,

I tried my earlier test_table example, swapping 'BEGIN' for:

=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

It made no difference to the behaviour.

On Wed, 2 Jan 2019 at 15:45 Ron <ronljohnsonjr@gmail.com> wrote:

Show quoted text

Try using SERIALIZABLE transactions instead of the default READ COMMITTED.

On 1/2/19 9:28 AM, Mark wrote:

Hi Ron,

Yes, my process will commit the transaction (or roll it back) eventually.
It's the window where
one transaction has deleted all the rows (before committing) and an
ANALYSE has ran.

The deleted rows won't make it into the sample even though the transaction
has NOT been commited.

During this time I will get bad row estimates on rows for that ID. You can
see this in the example below with two
database connections (labelled 1 and 2).

I would have expected the DELETE to have no effect until it was committed.

connection 1=> create table test_table(school_id integer, note text);
CREATE TABLE
connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> insert into test_table(school_id, note) SELECT g.id,md5(random()::text)
FROM generate_series(1,2) as g(id);
INSERT 0 2
connection 1=> select * from test_table ;
school_id | note
-----------+----------------------------------
1 | 0e08cf3990a04f0e943584517c564d31
2 | 96bf83ae5f8eb9342e8408b1ac25cb14
1 | f8fd943012edfe42a03a421df660bc33
2 | cd5d7ff0abca61f18857df9b21d234e0
1 | 60d731f430cb68c7285ddbcd9186eaa0
2 | 635e6c9cf305147ad8684213f0a9299c
(6 rows)

connection 1=> analyse verbose test_table ;
INFO: analyzing "public.test_table"
INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 6 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id =
2 ;
QUERY PLAN

-----------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..1.07 rows=3 width=37) (actual
time=0.011..0.015 rows=3 loops=1)
Filter: (school_id = 2)
Rows Removed by Filter: 3
Planning time: 0.164 ms
Execution time: 0.043 ms
(5 rows)

connection 2=> BEGIN ;
BEGIN
connection 2=> delete from test_table where school_id = 2 ;
DELETE 3
connection 2=> /* This connection is now idle */

connection 1=> analyse verbose test_table ;
INFO: analyzing "public.test_table"
INFO: "test_table": scanned 1 of 1 pages, containing 6 live rows and 0
dead rows; 3 rows in sample, 6 estimated total rows
ANALYZE
connection 1=> explain analyse select * from test_table where school_id =
2 ;
QUERY PLAN

-----------------------------------------------------------------------------------------------------
Seq Scan on test_table (cost=0.00..1.07 rows=1 width=37) (actual
time=0.009..0.014 rows=3 loops=1)
Filter: (school_id = 2)
Rows Removed by Filter: 3
Planning time: 0.095 ms
Execution time: 0.039 ms
(5 rows)

On Wed, 2 Jan 2019 at 14:04 Ron <ronljohnsonjr@gmail.com> wrote:

But Jeff said "left open indefinitely without either committing or
rolling back". Your process is committing the transaction.

On 1/2/19 6:15 AM, Mark wrote:

Hi Jeff,

Thanks for your help. That is exactly what is happening.

I have a long running job which deletes all of the common_student table
and then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it is committed or rolled back.

I will have to rewrite the application so it updates the existing rows
rather than deleting all and then inserting.
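
Roughly the shape I have in mind is an upsert per incoming row instead of
the bulk delete (just a sketch -- the (school_id, student_id) key and the
column names here are hypothetical, not the real common_student schema):

INSERT INTO common_student (school_id, student_id, note)
VALUES (36, 101, 'new data')
ON CONFLICT (school_id, student_id)
DO UPDATE SET note = EXCLUDED.note;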

Thanks again for helping me understand what's happening here.

Proof:

db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=88.30..3846.49 rows=1533
width=384) (actual time=4.852..7.065 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..87.91
rows=1533 width=0) (actual time=4.817..4.817 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.097 ms
Execution time: 8.084 ms
(7 rows)

db=> /* At this point I have started a long running transaction that
deletes all of common_student for school_id 36 */ ;

db=> analyse verbose common_student(school_id);
INFO: analyzing "public.common_student"
INFO: "common_student": scanned 7322 of 7322 pages, containing 65431
live rows and 8060 dead rows; 56818 rows in sample, 65431 estimated total
rows
ANALYZE
db=> explain analyze select * from common_student where school_id = 36;
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_common_student_sid on common_student
(cost=0.41..8.43 rows=1 width=384) (actual time=0.017..1.498 rows=1388
loops=1)
Index Cond: (school_id = 36)
Planning time: 0.098 ms
Execution time: 2.583 ms
(4 rows)

db=> /* At this point I have killed the long running transaction that
deletes all of common_student for school_id 36 */ ;
db=> vacuum analyze common_student;
VACUUM
db=> explain analyze select * from common_student where school_id = 36;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on common_student (cost=79.17..3357.79 rows=1388
width=383) (actual time=0.088..1.302 rows=1388 loops=1)
Recheck Cond: (school_id = 36)
Heap Blocks: exact=67
-> Bitmap Index Scan on idx_common_student_sid (cost=0.00..78.83
rows=1388 width=0) (actual time=0.077..0.077 rows=1388 loops=1)
Index Cond: (school_id = 36)
Planning time: 0.327 ms
Execution time: 2.311 ms
(7 rows)

On Sun, 23 Dec 2018 at 02:57 Jeff Janes <jeff.janes@gmail.com> wrote:

- Does the analyse output below mean that it only scanned 51538 of
65463 rows in the table? Is school_id 36 just being missed in the sample?
(This happens when the analyse is repeated )

Is there a transaction which had deleted all of school_id=36, and then
was just left open indefinitely without either committing or rolling back?

That would explain it, and I don't know of anything else that could.
The deleted but not committed tuples are still live, but don't get sampled.

Cheers,

Jeff

--
Angular momentum makes the world go 'round.


#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark (#6)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Mark <mwchambers@gmail.com> writes:

I have a long running job which deletes all of the common_student table and
then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it is committed or rolled back.
I will have to rewrite the application so it updates the existing rows
rather than deleting all and then inserting.

Hmm ... I'm not sure if that will actually make things better. The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

* We count delete-in-progress rows as still live, using
* the same reasoning given above; but we don't bother to
* include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

* Insert-in-progress rows are not counted. We assume
* that when the inserting transaction commits or aborts,
* it will send a stats message to increment the proper
* count. This works right only if that transaction ends
* after we finish analyzing the table; if things happen
* in the other order, its stats update will be
* overwritten by ours. However, the error will be large
* only if the other transaction runs long enough to
* insert many tuples, so assuming it will finish after us
* is the safer option.

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample. So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit. This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.
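
For instance, repeating the test_table session from upthread with the
DELETE swapped for an UPDATE (sketch only, not run) should show the same
collapse in the estimate, since neither the old nor the new row versions
get sampled until connection 2 commits:

connection 2=> BEGIN;
connection 2=> update test_table set note = 'changed' where school_id = 2 ;
connection 2=> /* left idle */

connection 1=> analyse verbose test_table ;
connection 1=> explain analyse select * from test_table where school_id = 2 ;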

In order to handle the bulk-update scenario sanely, it seems like
we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included. And it looks like, for consistency with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS. This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

                    if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                        deadrows += 1;
                    else
+                   {
+                       sample_it = true;
                        liverows += 1;
+                   }

with suitable adjustment of the adjacent comment.

Thoughts?

regards, tom lane

#13Mark
mwchambers@gmail.com
In reply to: Tom Lane (#12)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Hi Tom,

Thanks for your reply.

Am I correct in my understanding that any row that has been modified (i.e.
by an UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS, so it will not be
included in the sample?

I'm going to rework the application so there is less time between the
DELETE and the COMMIT, so I will only see the problem if ANALYZE runs during
this smaller window. It looks like your patch will help if this happens.
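
The plan is roughly this shape (just a sketch -- common_student_new is a
hypothetical staging table, not something that exists today):

BEGIN;
-- do the slow part first, without touching the live table
CREATE TEMP TABLE common_student_new (LIKE common_student INCLUDING ALL);
-- ... long-running load into common_student_new ...

-- keep the DELETE-to-COMMIT window as short as possible
DELETE FROM common_student;
INSERT INTO common_student SELECT * FROM common_student_new;
COMMIT;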

Then again, it also seems no one has had a problem with its current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mark <mwchambers@gmail.com> writes:

I have a long running job which deletes all of the common_student table and
then repopulates it. It takes a long time to load all the other data and
commit the transaction. I didn't think the delete inside the transaction
would have any effect until it is committed or rolled back.
I will have to rewrite the application so it updates the existing rows
rather than deleting all and then inserting.

Hmm ... I'm not sure if that will actually make things better. The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:

* We count delete-in-progress rows as still live, using
* the same reasoning given above; but we don't bother to
* include them in the sample.

The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:

* Insert-in-progress rows are not counted. We assume
* that when the inserting transaction commits or aborts,
* it will send a stats message to increment the proper
* count. This works right only if that transaction ends
* after we finish analyzing the table; if things happen
* in the other order, its stats update will be
* overwritten by ours. However, the error will be large
* only if the other transaction runs long enough to
* insert many tuples, so assuming it will finish after us
* is the safer option.

Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample. So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit. This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.

In order to handle the bulk-update scenario sanely, it seems like
we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included. And it looks like, for consistency with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS. This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.

In short, I think we want to do this to the DELETE_IN_PROGRESS logic:

                    if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
                        deadrows += 1;
                    else
+                   {
+                       sample_it = true;
                        liverows += 1;
+                   }

with suitable adjustment of the adjacent comment.

Thoughts?

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark (#13)
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

Mark <mwchambers@gmail.com> writes:

Am I correct in my understanding that any row that has been modified (i.e.
UPDATE) is in state HEAPTUPLE_INSERT_IN_PROGRESS so it will not be included
in the sample?

An update will mark the existing tuple as delete-in-progress and then
insert a new tuple (row version) that's insert-in-progress.

A concurrent ANALYZE scan will definitely see the old tuple (modulo
sampling considerations) but it's timing-dependent which state it sees it
in --- it could still be "live" when we see it, or already
delete-in-progress. ANALYZE might or might not see the new tuple at all,
depending on timing and where the new tuple gets placed. So "count/sample
delete-in-progress but not insert-in-progress" seems like a good rule to
minimize the timing sensitivity of the results. It's not completely
bulletproof, but I think it's better than what we're doing now.

I'm going to rework the application so there is less time between the
DELETE and the COMMIT so I will only see the problem if ANALYZE runs during
this smaller time window.

Yeah, that's about the best you can do from the application side.

regards, tom lane