Statistics Import and Export commit related issue.

Started by jian he11 months ago2 messages
#1jian he
jian.universality@gmail.com
1 attachment(s)

hi.
the thread "Statistics Import and Export" is quite hot.
so a new thread for some minor issue i found.

static int
_tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
{

if (strcmp(te->desc, "STATISTICS DATA") == 0)
{
if (!ropt->dumpStatistics)
return 0;
else
res = REQ_STATS;
}

/* If it's statistics and we don't want statistics, maybe ignore it */
if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
return 0;
}
As you can see, there is some duplicate code there.
in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
DATA") == 0)`` after switch (curSection).
so pg_dump --section option does not influence statistics dump.
attached is the proposed change.

in dumpTableData_copy, we have:
pg_log_info("dumping contents of table \"%s.%s\"",
tbinfo->dobj.namespace->dobj.name, classname);
dumpRelationStats add a pg_log_info would be a good thing.
commit: https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
didn't have any pg_log_info.

https://www.postgresql.org/docs/devel/app-pgrestore.html
"""
--jobs=number-of-jobs
Run the most time-consuming steps of pg_restore — those that load
data, create indexes, or create constraints — concurrently, using up
to number-of-jobs concurrent sessions.
"""
we may need to change the above sentence?
pg_restore restore STATISTICS DATA also doing it concurrently if
--jobs is specified.

Attachments:

v1-0001-refactor-_tocEntryRequired-for-STATISTICS-DATA.patchtext/x-patch; charset=US-ASCII; name=v1-0001-refactor-_tocEntryRequired-for-STATISTICS-DATA.patchDownload
From f1ce9c90493d300c0c0d1169c483a18661416c3f Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 25 Feb 2025 14:13:24 +0800
Subject: [PATCH v1 1/1] refactor _tocEntryRequired for STATISTICS DATA.

we don't dump STATISTICS DATA in --section=pre-data.
so in _tocEntryRequired, we evaulate "(strcmp(te->desc, "STATISTICS DATA") == 0)"
after ``switch (curSection)``.
---
 src/bin/pg_dump/pg_backup_archiver.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 632077113a4..0fc4ba04ba4 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2932,14 +2932,6 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 		strcmp(te->desc, "SEARCHPATH") == 0)
 		return REQ_SPECIAL;
 
-	if (strcmp(te->desc, "STATISTICS DATA") == 0)
-	{
-		if (!ropt->dumpStatistics)
-			return 0;
-		else
-			res = REQ_STATS;
-	}
-
 	/*
 	 * DATABASE and DATABASE PROPERTIES also have a special rule: they are
 	 * restored in createDB mode, and not restored otherwise, independently of
@@ -2984,10 +2976,6 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 	if (ropt->no_subscriptions && strcmp(te->desc, "SUBSCRIPTION") == 0)
 		return 0;
 
-	/* If it's statistics and we don't want statistics, maybe ignore it */
-	if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
-		return 0;
-
 	/* Ignore it if section is not to be dumped/restored */
 	switch (curSection)
 	{
@@ -3008,6 +2996,15 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 			return 0;
 	}
 
+	/* If it's statistics and we don't want statistics, ignore it */
+	if (strcmp(te->desc, "STATISTICS DATA") == 0)
+	{
+		if (!ropt->dumpStatistics)
+			return 0;
+		else
+			return REQ_STATS;
+	}
+
 	/* Ignore it if rejected by idWanted[] (cf. SortTocFromFile) */
 	if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
 		return 0;
-- 
2.34.1

#2Corey Huinker
corey.huinker@gmail.com
In reply to: jian he (#1)
Re: Statistics Import and Export commit related issue.

On Tue, Feb 25, 2025 at 1:31 AM jian he <jian.universality@gmail.com> wrote:

hi.
the thread "Statistics Import and Export" is quite hot.
so a new thread for some minor issue i found.

static int
_tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
{

if (strcmp(te->desc, "STATISTICS DATA") == 0)
{
if (!ropt->dumpStatistics)
return 0;
else
res = REQ_STATS;
}

/* If it's statistics and we don't want statistics, maybe ignore it */
if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
return 0;
}
As you can see, there is some duplicate code there.
in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
DATA") == 0)`` after switch (curSection).
so pg_dump --section option does not influence statistics dump.
attached is the proposed change.

+1, and I think the patch provided is good as-is.

in dumpTableData_copy, we have:
pg_log_info("dumping contents of table \"%s.%s\"",
tbinfo->dobj.namespace->dobj.name, classname);
dumpRelationStats add a pg_log_info would be a good thing.
commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
didn't have any pg_log_info.

I don't have a strong opinion on this one, but I lean against doing it. I
think table data got a log message because dumping table data is an
operation that takes up a major % of the dump's runtime, whereas statistics
are just some catalog operations. There are log messages for the global
catalog operations (ex. read in all user defined functions, read in all
inheritance relationships), but not per-table catalog operations. If we end
up going to a global export of attribute statistics then I'd definitely
want a log message for that operation.

https://www.postgresql.org/docs/devel/app-pgrestore.html
"""
--jobs=number-of-jobs
Run the most time-consuming steps of pg_restore — those that load
data, create indexes, or create constraints — concurrently, using up
to number-of-jobs concurrent sessions.
"""
we may need to change the above sentence?
pg_restore restore STATISTICS DATA also doing it concurrently if
--jobs is specified.

I also have no strong opinion here, but I think the sentence is fine as is
because it specifies "most time-consuming" and a single catalog row
operation is much smaller than a COPY load, or the sequential scans needed
for index builds and constraint verification.