Support --include-analyze in pg_dump, pg_dumpall, pg_restore

Started by jian heover 1 year ago3 messageshackers
Jump to latest
#1jian he
jian.universality@gmail.com

hi.
after watching https://www.youtube.com/live/k4A9-WZET_4?si=vz3lTud735s2vcCO
then trying to hack it too.

the implemented feature is as the $subject description, making pg_dump
also includes the ANALYZE command in its output.

if option --include-analyze not specified, then pg_dump will not dump
ANALYZE, that's the status quo.
option --include-analyze support table relkind: RELKIND_PARTITIONED_TABLE,
RELKIND_MATVIEW, RELKIND_RELATION.

this option cannot be used with option --schema-only.
for materialized view, it will dump after the refresh materialized
view command.
(tested in several cases, ANALYZE will really be at the end of the dump).

By default, pg_restore does not restore the ANALYZE commands.
This means that if the archive contains ANALYZE commands,
you still need to explicitly specify the --include-analyze option to
restore them.

doc added.
demo for dump:
create materialized view mvw as select a from generate_series(1, 3) a;

---<<<<<<<<<<<<<<portion of pg_dump output starts>>>>>>>>>>>>>>>>>>>>>>
--
-- Name: mvw; Type: MATERIALIZED VIEW DATA; Schema: public; Owner: jian
--
REFRESH MATERIALIZED VIEW public.mvw;

--
-- Name: mvw; Type: ANALYZE; Schema: public; Owner: jian
--

ANALYZE public.mvw;

---<<<<<<<<<<<<<<portion of pg_dump output ends>>>>>>>>>>>>>>>>>>>>>>
demo for dump partitioned table:
CREATE TABLE prt1 (a int, b int, c text) PARTITION BY RANGE(a) ;
CREATE TABLE prt1_p1 PARTITION OF prt1 FOR VALUES FROM (0) TO (5);
CREATE TABLE prt1_p2 PARTITION OF prt1 FOR VALUES FROM (8) TO (11);
INSERT INTO prt1 SELECT i, i+11, to_char(i, 'FM0000') FROM
generate_series(0, 2) i;

pg_dump --table=*prt1* --include-analyze
---<<<<<<<<<<<<<<portion of output ends>>>>>>>>>>>>>>>>>>>>>>
--
-- Data for Name: prt1_p1; Type: TABLE DATA; Schema: public; Owner: jian
--
COPY public.prt1_p1 (a, b, c) FROM stdin;
0 11 0000
1 12 0001
2 13 0002
\.

--
-- Data for Name: prt1_p2; Type: TABLE DATA; Schema: public; Owner: jian
--
COPY public.prt1_p2 (a, b, c) FROM stdin;
\.
--
-- Name: prt1; Type: ANALYZE; Schema: public; Owner: jian
--
ANALYZE ONLY public.prt1;
--
-- Name: prt1_p1; Type: ANALYZE; Schema: public; Owner: jian
--
ANALYZE public.prt1_p1;
--
-- Name: prt1_p2; Type: ANALYZE; Schema: public; Owner: jian
--

ANALYZE public.prt1_p2;

---<<<<<<<<<<<<<<portion of output starts>>>>>>>>>>>>>>>>>>>>>>
TODO item: writing tests.

idea credits to Andrey Borodin, Nikolay Samokhvalov, Kirk Wolak

what do you think?

Attachments:

v1-0001-Support-include-analyze-in-pg_dump-pg_dumpall-pg_.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Support-include-analyze-in-pg_dump-pg_dumpall-pg_.patchDownload+203-24
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#1)
Re: Support --include-analyze in pg_dump, pg_dumpall, pg_restore

jian he <jian.universality@gmail.com> writes:

the implemented feature is as the $subject description, making pg_dump
also includes the ANALYZE command in its output.

Isn't this pretty much obsoleted by the ongoing work to dump and
restore statistics?

regards, tom lane

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#2)
Re: Support --include-analyze in pg_dump, pg_dumpall, pg_restore

On Tue, Jan 14, 2025 at 09:32:19AM -0500, Tom Lane wrote:

jian he <jian.universality@gmail.com> writes:

the implemented feature is as the $subject description, making pg_dump
also includes the ANALYZE command in its output.

Isn't this pretty much obsoleted by the ongoing work to dump and
restore statistics?

Yeah, and I believe that work is still on track for v18. That's likely
going to be much faster than analyzing everything, and it'll be usable in
pg_upgrade, too.

/messages/by-id/flat/CADkLM=cB0rF3p_FuWRTMSV0983ihTRpsH+OCpNyiqE7Wk0vUWA@mail.gmail.com

--
nathan