Explain returns different number of rows

Started by Vince McMahonover 3 years ago5 messagesgeneral
Jump to latest
#1Vince McMahon
sippingonesandzeros@gmail.com
Show quoted text

I executed the following statements 3 times
explain(analyze, buffet) select * from table1

The number of rows are different. Is the table corrupted? How to confirm
and how to fix it?

#2Vince McMahon
sippingonesandzeros@gmail.com
In reply to: Vince McMahon (#1)

I did get reply so I am trying again.

I executed the following statements 3 times

Show quoted text

explain(analyze, buffet) select * from table1

The number of rows are different. Is the table corrupted? How to confirm
and how to fix it?

#3Christophe Pettus
xof@thebuild.com
In reply to: Vince McMahon (#2)
Re: Explain returns different number of rows

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com> wrote:
The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and determine how many rows are returned; it calculates an estimate based on the current system statistics, which vary constantly depending on activity in the database.

#4Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Christophe Pettus (#3)
Re: Explain returns different number of rows

On 2022-10-20 09:56:23 -0700, Christophe Pettus wrote:

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com> wrote:

The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and
determine how many rows are returned; it calculates an estimate based
on the current system statistics, which vary constantly depending on
activity in the database.

EXPLAIN ANALYZE (which is what he did) does run the query and return the
actual number of rows:

#v+
wdsah=> explain (analyze, buffers) select * from facttable_eurostat_comext_cpa2_1 ;
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╟──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Seq Scan on facttable_eurostat_comext_cpa2_1 (cost=0.00..1005741.32 rows=39633432 width=85) (actual time=0.396..6541.701 rows=39633591 loops=1) ║
║ Buffers: shared read=609407 ║
║ Planning Time: 1.650 ms ║
║ Execution Time: 7913.027 ms ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(4 rows)
#v-

The first tuple (cost=0.00..1005741.32 rows=39633432 width=85) is an
estimate used to plan the query. But the second one
(actual time=0.396..6541.701 rows=39633591 loops=1)
contains measurements from actually running the query.

I think it's possible that the rows estimate in the first tuple changes
without any actual data change (although the only reason I can think of
right now would be an ANALYZE (in another session or by autovacuum)).
But the actual rows definitely shouldn't change.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#5Vince McMahon
sippingonesandzeros@gmail.com
In reply to: Peter J. Holzer (#4)
Re: Explain returns different number of rows

Thanks for the clarification, Peter.

On Sat, Oct 22, 2022, 05:32 Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

Show quoted text

On 2022-10-20 09:56:23 -0700, Christophe Pettus wrote:

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com>

wrote:

The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and
determine how many rows are returned; it calculates an estimate based
on the current system statistics, which vary constantly depending on
activity in the database.

EXPLAIN ANALYZE (which is what he did) does run the query and return the
actual number of rows:

#v+
wdsah=> explain (analyze, buffers) select * from
facttable_eurostat_comext_cpa2_1 ;

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ QUERY
PLAN ║

╟──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Seq Scan on facttable_eurostat_comext_cpa2_1 (cost=0.00..1005741.32
rows=39633432 width=85) (actual time=0.396..6541.701 rows=39633591 loops=1)

║ Buffers: shared read=609407

║ Planning Time: 1.650 ms

║ Execution Time: 7913.027 ms

╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(4 rows)
#v-

The first tuple (cost=0.00..1005741.32 rows=39633432 width=85) is an
estimate used to plan the query. But the second one
(actual time=0.396..6541.701 rows=39633591 loops=1)
contains measurements from actually running the query.

I think it's possible that the rows estimate in the first tuple changes
without any actual data change (although the only reason I can think of
right now would be an ANALYZE (in another session or by autovacuum)).
But the actual rows definitely shouldn't change.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"