Why analyze reports 30000 pages and rows scanned. Why not just rows?

Started by David Mullineux, 8 months ago, 2 messages, general
#1 David Mullineux
dmullx@gmail.com

According to the docs, analyze by default will try to sample 30000 rows from
a table.
(I've read the note in analyze.c about Haas and Stokes of IBM Research.)

But my question is, why does 'analyze verbose' report that it has scanned
'30000 of NNNN pages, containing NNNN live rows and 0 dead rows; 30000 rows
in sample,....'

As most tables would store more than 1 row per page, I expected that 30000
rows would require a lot fewer than 30000 *pages* to be scanned. Why is it
saying it's scanned 30000 pages instead of only 30000 rows?

Confused. Thanks.
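For context on the two things the question refers to, here is a small Python sketch. It assumes two things about PostgreSQL: first, that the per-column sample target is 300 times the statistics target (default_statistics_target defaults to 100, which gives the 30000 rows), and second, that the Haas and Stokes note in analyze.c describes the distinct-value estimator n*d / (n - f1 + f1*n/N). The function names are hypothetical, and the sketch leaves out the special cases PostgreSQL applies around that formula, such as treating a sample with no repeated values as a unique column.

```python
from collections import Counter
from typing import Hashable, Sequence


def sample_target_rows(statistics_target: int = 100) -> int:
    """Rows analyze aims to sample: 300 * statistics target (assumed rule)."""
    return 300 * statistics_target


def haas_stokes_ndistinct(sample: Sequence[Hashable], total_rows: int) -> float:
    """Haas-Stokes (Duj1) estimate of distinct values: n*d / (n - f1 + f1*n/N).

    n  = rows in the sample, N = estimated rows in the whole table,
    d  = distinct values seen in the sample,
    f1 = values that occur exactly once in the sample.
    """
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


print(sample_target_rows())                                  # 30000
print(haas_stokes_ndistinct([1, 1, 2, 2, 3, 3], 1_000_000))  # 3.0
```

When every value in the sample repeats (f1 = 0), the formula reduces to d, the number of distinct values actually seen.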

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Mullineux (#1)
Re: Why analyze reports 30000 pages and rows scanned. Why not just rows?

David Mullineux <dmullx@gmail.com> writes:

> But my question is, why does 'analyze verbose' report that it has scanned
> '30000 of NNNN pages, containing NNNN live rows and 0 dead rows; 30000 rows
> in sample,....'
>
> As most tables would store more than 1 row per page, I expected that 30000
> rows would require a lot fewer than 30000 *pages* to be scanned. Why is it
> saying it's scanned 30000 pages instead of only 30000 rows?

If the table is sufficiently large, taking a sample of a single row
from each of 30000 different pages is the correct behavior. Taking
more than one row from each of a smaller set of pages would give a
nonrandom (because clumped) sample.

regards, tom lane
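The two-stage sampling behind Tom's answer can be sketched in Python as follows. The table is modeled as a hypothetical list of pages, where each page is a list of live rows, and all function names are made up for illustration; none of them are PostgreSQL functions. In stage one, the sketch picks min(targrows, number of pages) pages at random using Knuth's Algorithm S. In stage two, it runs a reservoir sample over every row on those pages. PostgreSQL's reservoir step uses Vitter's Algorithm Z, but this sketch substitutes the simpler Algorithm R, which draws from the same distribution.

In a large table with r rows per page, 30000 pages yield 30000*r candidate rows. Each candidate survives with probability 1/r, so each scanned page contributes about one row to the sample on average. That is why the verbose output reports 30000 pages for 30000 sample rows. The "live rows" count is every row on the scanned pages, and the sketch scales it up to estimate the table's total.

```python
import math
import random


def block_sample(nblocks: int, targblocks: int, rng: random.Random) -> list[int]:
    """Knuth's Algorithm S: pick min(targblocks, nblocks) distinct page
    numbers uniformly at random, returned in increasing (physical) order."""
    needed = min(targblocks, nblocks)
    chosen: list[int] = []
    for b in range(nblocks):
        if len(chosen) == needed:
            break
        # Select page b with probability (pages still needed) / (pages left).
        if rng.random() * (nblocks - b) < needed - len(chosen):
            chosen.append(b)
    return chosen


def acquire_sample(pages: list[list], targrows: int, seed: int = 0):
    """Two-stage sample: random pages first, then a reservoir over their rows.

    Returns (pages scanned, sample rows, live rows seen, estimated total rows).
    """
    rng = random.Random(seed)
    scanned = block_sample(len(pages), targrows, rng)
    sample: list = []
    seen = 0
    for b in scanned:
        for row in pages[b]:
            if len(sample) < targrows:
                sample.append(row)
            else:
                # Algorithm R: keep this row with probability targrows / (seen + 1).
                j = rng.randrange(seen + 1)
                if j < targrows:
                    sample[j] = row
            seen += 1
    # Extrapolate live rows per scanned page to the whole table.
    estimated = math.floor(seen / len(scanned) * len(pages) + 0.5) if scanned else 0
    return scanned, sample, seen, estimated


# Hypothetical table: 10000 pages of 20 rows each, rows tagged (page, slot).
table = [[(p, s) for s in range(20)] for p in range(10_000)]
scanned, sample, seen, est = acquire_sample(table, 3000)
print(len(scanned), seen, len(sample), est)  # 3000 60000 3000 200000
```

Reading every row on a sampled page, instead of just one, costs no extra I/O. The reservoir then thins those rows back down, so the final sample is not clumped by page.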