vacuumdb --analyze-only scans all pages?

Started by Gerhard Wiesinger · over 9 years ago · 4 messages · pgsql-general
#1 Gerhard Wiesinger
lists@wiesinger.com

Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure, ANALYZE scans all pages
(at least for some of the tables; the --analyze-only switch is specified).

I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing
334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing
3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
estimated total rows

Any ideas why?

Thnx.

Ciao,

Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Gerhard Wiesinger (#1)
Re: vacuumdb --analyze-only scans all pages?

On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

Hello,

PostgreSQL 9.6.1: after a pg_dump/restore procedure, ANALYZE scans all pages
(at least for some of the tables; the --analyze-only switch is specified).

I would expect that only the sample rows are scanned.

"log_details": scanned 2133350 of 2133350 pages

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing
334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing
3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over
all 2133350 pages.
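A quick sanity check on that point, assuming (as a rough model, not the exact
algorithm) that ANALYZE's block sampler reads about one page per requested
sample row, capped at the table size, so pages scanned comes out near
min(rows in sample, total pages). The figures are taken from the INFO lines above:

```python
# Rough model (an assumption, not the exact algorithm): ANALYZE's block
# sampler selects about one page per requested sample row, capped at the
# table size, so pages scanned is roughly min(sample rows, total pages).
tables = {
    "log":             (30_000,    30_851),     # (rows in sample, total pages)
    "log_details":     (3_000_000, 2_133_350),
    "log_details_str": (30_000,    521_126),
}
for name, (sample_rows, total_pages) in tables.items():
    print(f"{name}: ~{min(sample_rows, total_pages)} pages scanned")
```

With a 3,000,000-row sample and only 2,133,350 pages, every page of
log_details ends up being read, which matches the log output.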

Thnx.

Ciao,

Gerhard

--
Adrian Klaver
adrian.klaver@aklaver.com


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adrian Klaver (#2)
Re: vacuumdb --analyze-only scans all pages?

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing
334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing
3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over
all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target
set on log_details to make it want a sample so much larger than the
default. If you feel ANALYZE is taking too long, you should reconsider
whether you need such a large target.

regards, tom lane


#4 Gerhard Wiesinger
lists@wiesinger.com
In reply to: Tom Lane (#3)
Re: vacuumdb --analyze-only scans all pages?

On 29.12.2016 16:10, Tom Lane wrote:

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 12/28/2016 11:54 PM, Gerhard Wiesinger wrote:

vacuumdb --analyze-only --all --verbose
INFO: analyzing "public.log"
INFO: "log": scanned 30000 of 30851 pages, containing 3599899 live rows
and 0 dead rows; 30000 rows in sample, 3702016 estimated total rows
INFO: analyzing "public.log_details"
INFO: "log_details": scanned 2133350 of 2133350 pages, containing
334935843 live rows and 0 dead rows; 3000000 rows in sample, 334935843
estimated total rows
INFO: analyzing "public.log_details_str"
INFO: "log_details_str": scanned 30000 of 521126 pages, containing
3601451 live rows and 0 dead rows; 30000 rows in sample, 62560215
estimated total rows

Any ideas why?

I would say because the '3000000 rows in sample' were spread out over
all 2133350 pages.

Worth pointing out here is that you must have a custom statistics target
set on log_details to make it want a sample so much larger than the
default. If you feel ANALYZE is taking too long, you should reconsider
whether you need such a large target.

Thanx Tom and Adrian

Yes, there is a custom statistics target of 10000 set; I guess it was set
some time ago to overcome a performance problem after the upgrade from
8.3 to 8.4.
Thanx Tom for pointing that out.

A good query to find such columns:
SELECT
n.nspname AS schemaname,
CASE
WHEN cl.relkind = 'r' THEN CAST('TABLE' AS VARCHAR(20))
WHEN cl.relkind = 'i' THEN CAST('INDEX' AS VARCHAR(20))
WHEN cl.relkind = 'S' THEN CAST('SEQUENCE' AS VARCHAR(20))
WHEN cl.relkind = 's' THEN CAST('SPECIAL' AS VARCHAR(20))
WHEN cl.relkind = 'v' THEN CAST('VIEW' AS VARCHAR(20))
WHEN cl.relkind = 't' THEN CAST('TOAST TABLE' AS VARCHAR(18))
ELSE null
END AS object_type,
cl.relname,
attname,
attstattarget
FROM
pg_attribute a
LEFT OUTER JOIN pg_class cl ON a.attrelid = cl.oid
LEFT OUTER JOIN pg_namespace n ON n.oid = cl.relnamespace
WHERE
cl.relkind = 'r' -- TABLE only
AND attnum > 0 -- only regular columns
AND n.nspname = 'public' -- public schema only
AND relname NOT ILIKE 'pgstatspack_%'
AND attstattarget <> -1 -- non-default values only
ORDER BY
n.nspname,
cl.relname,
attnum
;

BTW: It looks like the statistics target is multiplied by 300 to get the
number of sample rows; is that true? (I didn't find any documentation
about that.)
https://www.postgresql.org/docs/current/static/planner-stats.html
-- ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- Default is 100, which means 300*100=30000 sample rows (30k)
-- Max is 10000, which means 300*10000=3000000 sample rows (3 Mio)
ALTER TABLE public.log_details ALTER COLUMN fk_id SET STATISTICS -1;
ALTER TABLE public.log_details ALTER COLUMN fk_keyid SET STATISTICS -1;
https://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
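As far as I can tell, the factor of 300 comes from std_typanalyze() in
src/backend/commands/analyze.c, which sets the sample size to 300 times the
statistics target (a heuristic attributed to Chaudhuri et al.). A minimal
sketch of the arithmetic:

```python
# Sketch of ANALYZE's sample-size rule: 300 rows per unit of statistics
# target (see std_typanalyze() in src/backend/commands/analyze.c).
def analyze_sample_rows(stats_target: int) -> int:
    return 300 * stats_target

print(analyze_sample_rows(100))    # default target 100 -> 30000 rows
print(analyze_sample_rows(10000))  # max target 10000 -> 3000000 rows
```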

Thnx.

Ciao,
Gerhard
