Explain returns different number of rows

Started by Vince McMahonover 3 years ago5 messagesgeneral

sippingonesandzeros@gmail.com

over 3 years ago

Show quoted text

I executed the following statements 3 times
explain(analyze, buffet) select * from table1

The number of rows are different. Is the table corrupted? How to confirm
and how to fix it?

Import Notes

Reply to msg id not found: CAKS+vVb5CsvV_ae_=TVu6cRZTTWbGv1mhoyaBDPGLFQbQ9J=JQ@mail.gmail.comReference msg id not found: CAKS+vVb5CsvV_ae_=TVu6cRZTTWbGv1mhoyaBDPGLFQbQ9J=JQ@mail.gmail.com

Vince McMahon

sippingonesandzeros@gmail.com

over 3 years ago

In reply to: Vince McMahon (#1)

I did get reply so I am trying again.

I executed the following statements 3 times

Show quoted text

explain(analyze, buffet) select * from table1

The number of rows are different. Is the table corrupted? How to confirm
and how to fix it?

Christophe Pettus

xof@thebuild.com

over 3 years ago

In reply to: Vince McMahon (#2)

Re: Explain returns different number of rows

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com> wrote:
The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and determine how many rows are returned; it calculates an estimate based on the current system statistics, which vary constantly depending on activity in the database.

Peter J. Holzer

hjp-pgsql@hjp.at

over 3 years ago

In reply to: Christophe Pettus (#3)

Re: Explain returns different number of rows

On 2022-10-20 09:56:23 -0700, Christophe Pettus wrote:

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com> wrote:

The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and
determine how many rows are returned; it calculates an estimate based
on the current system statistics, which vary constantly depending on
activity in the database.

EXPLAIN ANALYZE (which is what he did) does run the query and return the
actual number of rows:

#v+
wdsah=> explain (analyze, buffers) select * from facttable_eurostat_comext_cpa2_1 ;
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╟──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Seq Scan on facttable_eurostat_comext_cpa2_1 (cost=0.00..1005741.32 rows=39633432 width=85) (actual time=0.396..6541.701 rows=39633591 loops=1) ║
║ Buffers: shared read=609407 ║
║ Planning Time: 1.650 ms ║
║ Execution Time: 7913.027 ms ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(4 rows)
#v-

The first tuple (cost=0.00..1005741.32 rows=39633432 width=85) is an
estimate used to plan the query. But the second one
(actual time=0.396..6541.701 rows=39633591 loops=1)
contains measurements from actually running the query.

I think it's possible that the rows estimate in the first tuple changes
without any actual data change (although the only reason I can think of
right now would be an ANALYZE (in another session or by autovacuum)).
But the actual rows definitely shouldn't change.

Vince McMahon

sippingonesandzeros@gmail.com

over 3 years ago

In reply to: Peter J. Holzer (#4)

Re: Explain returns different number of rows

Thanks for the clarification, Peter.

On Sat, Oct 22, 2022, 05:32 Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

Show quoted text

On 2022-10-20 09:56:23 -0700, Christophe Pettus wrote:

On Oct 20, 2022, at 09:52, Vince McMahon <sippingonesandzeros@gmail.com>

wrote:

The number of rows are different.

This isn't unexpected. EXPLAIN does not actually run the query and
determine how many rows are returned; it calculates an estimate based
on the current system statistics, which vary constantly depending on
activity in the database.

EXPLAIN ANALYZE (which is what he did) does run the query and return the
actual number of rows:

#v+
wdsah=> explain (analyze, buffers) select * from
facttable_eurostat_comext_cpa2_1 ;

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ QUERY
PLAN ║

╟──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Seq Scan on facttable_eurostat_comext_cpa2_1 (cost=0.00..1005741.32
rows=39633432 width=85) (actual time=0.396..6541.701 rows=39633591 loops=1)
║
║ Buffers: shared read=609407
║
║ Planning Time: 1.650 ms
║
║ Execution Time: 7913.027 ms
║

╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(4 rows)
#v-

The first tuple (cost=0.00..1005741.32 rows=39633432 width=85) is an
estimate used to plan the query. But the second one
(actual time=0.396..6541.701 rows=39633591 loops=1)
contains measurements from actually running the query.

I think it's possible that the rows estimate in the first tuple changes
without any actual data change (although the only reason I can think of
right now would be an ANALYZE (in another session or by autovacuum)).
But the actual rows definitely shouldn't change.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"