progress report for ANALYZE
Hello
Here's a patch that implements progress reporting for ANALYZE.
Thanks
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0001-Report-progress-for-ANALYZE.patchtext/x-diff; charset=us-asciiDownload+237-5
On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Here's a patch that implements progress reporting for ANALYZE.
Patch applies, code and doc and compiles cleanly. I have few comments:
@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
if (numrows > 0)
{
MemoryContext col_context,
- old_context;
+ old_context;
+ const int index[] = {
+ PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_BLOCKS_DONE
+ };
+ const int64 val[] = {
+ PROGRESS_ANALYZE_PHASE_ANALYSIS,
+ 0, 0
+ };
+
+ pgstat_progress_update_multi_param(3, index, val);
[...]
}
+ pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned. I'm not sure if we should stay
consistent here.
diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
/* Translate command name into command type code. */
if (pg_strcasecmp(cmd, "VACUUM") == 0)
cmdtype = PROGRESS_COMMAND_VACUUM;
+ if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+ cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
it should be an "else if" here.
Everything else LGTM.
Hi,
In monitoring.sgml, "a" is missing in "row for ech backend that is
currently running that command[...]".
Anthony
On Tuesday, July 2, 2019, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Show quoted text
Here's a patch that implements progress reporting for ANALYZE.
Patch applies, code and doc and compiles cleanly. I have few comments:
@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params, if (numrows > 0) { MemoryContext col_context, - old_context; + old_context; + const int index[] = { + PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_TOTAL_BLOCKS, + PROGRESS_ANALYZE_BLOCKS_DONE + }; + const int64 val[] = { + PROGRESS_ANALYZE_PHASE_ANALYSIS, + 0, 0 + }; + + pgstat_progress_update_multi_param(3, index, val); [...] } + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPLETE); + If there wasn't any row but multiple blocks were scanned, the PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about the blocks that were scanned. I'm not sure if we should stay consistent here.diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 05240bfd14..98b01e54fa 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) /* Translate command name into command type code. */ if (pg_strcasecmp(cmd, "VACUUM") == 0) cmdtype = PROGRESS_COMMAND_VACUUM; + if (pg_strcasecmp(cmd, "ANALYZE") == 0) + cmdtype = PROGRESS_COMMAND_ANALYZE; else if (pg_strcasecmp(cmd, "CLUSTER") == 0) cmdtype = PROGRESS_COMMAND_CLUSTER; else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)it should be an "else if" here.
Everything else LGTM.
Hi Alvaro!
On 2019/06/22 3:52, Alvaro Herrera wrote:
Hello
Here's a patch that implements progress reporting for ANALYZE.
Sorry for the late reply.
My email address was changed to tatsuro.yamada.tf@nttcom.co.jp.
I have a question about your patch.
My ex-colleague Vinayak created same patch in 2017 [1]ANALYZE command progress checker /messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com, and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.
However, actually, I think it's okay because the feature is useful
for DBAs, even if your patch can't handle Foreign table.
I'll review your patch in this week. :)
[1]: ANALYZE command progress checker /messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com
/messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com
Regards,
Tatsuro Yamada
Hi Alvaro,
I'll review your patch in this week. :)
I tested your patch on 6b854896.
Here is a result. See below:
---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;
[Session #2]
\a
\t
select * from pg_stat_progress_analyze; \watch 0.001
17520|13599|postgres|16387|f|16387|scanning table|4425|14
17520|13599|postgres|16387|f|16387|scanning table|4425|64
17520|13599|postgres|16387|f|16387|scanning table|4425|111
...
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??
---------------------------------------------------------
I have a question of the last line of the result.
I'm not sure it is able or not, but I guess it would be better
to keep the phase name (analyzing sample) on the view until the
end of this command. :)
Regards,
Tatsuro Yamada
On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??
Why do we zero out the block numbers when we switch phases? The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2019-Jul-08, Robert Haas wrote:
On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??Why do we zero out the block numbers when we switch phases? The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.
Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.
I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?
I propose that once a field is set, we should leave it set until the end.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?I propose that once a field is set, we should leave it set until the end.
+1
Note that this patch is already behaving like that if the table only
contains dead rows.
Hi Alvaro, Anthony, Julien and Robert,
On 2019/07/09 3:47, Julien Rouhaud wrote:
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?I propose that once a field is set, we should leave it set until the end.
+1
Note that this patch is already behaving like that if the table only
contains dead rows.
I fixed the patch including:
- Replace "if" to "else if". (Suggested by Julien)
- Fix typo s/ech/each/. (Suggested by Anthony)
- Add Phase "analyzing complete" in the pgstat view. (Suggested by Julien, Robert and me)
It was overlooked to add it in system_views.sql.
I share my re-test result, see below:
---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;
[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001
3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
---------------------------------------------------------
Thanks,
Tatsuro Yamada
Attachments:
v2-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v2-0001-Report-progress-for-ANALYZE.patchDownload+246-4
On 2019-Jul-08, Robert Haas wrote:
On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?
No commit, just discussion in the CREATE INDEX thread.
I propose that once a field is set, we should leave it set until the end.
Hmm, ok. In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm, ok. In CREATE INDEX, we use the block counters multiple times.
Why do we do that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2019-Jul-10, Robert Haas wrote:
On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm, ok. In CREATE INDEX, we use the block counters multiple times.
Why do we do that?
Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC). We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns. I think we also do that for tuple counters in one case.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 10, 2019 at 9:26 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Jul-10, Robert Haas wrote:
On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm, ok. In CREATE INDEX, we use the block counters multiple times.
Why do we do that?
Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC). We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns. I think we also do that for tuple counters in one case.
Hmm. I think I would have been inclined to use different counter
numbers for table blocks and index blocks, but perhaps we don't have
room. Anyway, leaving them set until we need them again seems like the
best we can do as things stand.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello.
At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
Hi Alvaro, Anthony, Julien and Robert,
On 2019/07/09 3:47, Julien Rouhaud wrote:
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
wrote:On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?I propose that once a field is set, we should leave it set until the
end.+1
Note that this patch is already behaving like that if the table only
contains dead rows.
+1 from me.
I fixed the patch including:
- Replace "if" to "else if". (Suggested by Julien)
- Fix typo s/ech/each/. (Suggested by Anthony)
- Add Phase "analyzing complete" in the pgstat view. (Suggested by
- Julien, Robert and me)
It was overlooked to add it in system_views.sql.I share my re-test result, see below:
---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.0013785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------
I have some comments.
+ const int index[] = {
+ PROGRESS_ANALYZE_PHASE,
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_BLOCKS_DONE
+ };
+ const int64 val[] = {
+ PROGRESS_ANALYZE_PHASE_ANALYSIS,
+ 0, 0
Do you oppose to leaving the total/done blocks alone here:p?
+ BlockNumber nblocks;
+ double blksdone = 0;
Why is it a double even though blksdone is of the same domain
with nblocks? And finally:
+ pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+ ++blksdone);
It is converted into int64.
+ WHEN 2 THEN 'analyzing sample'
+ WHEN 3 THEN 'analyzing sample (extended stats)'
I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:
+ WHEN 2 THEN 'computing stats'
+ WHEN 3 THEN 'computing extended stats'
+ WHEN 4 THEN 'analyzing complete'
And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:
+ WHEN 4 THEN 'completing analyze'
+ WHEN 4 THEN 'finalizing analyze'
+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.
+ <entry>OID of the database to which this backend is connected.</entry>
+ <entry>Name of the database to which this backend is connected.</entry>
"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)
+ <entry>Whether the current scan includes legacy inheritance children.</entry>
This apparently excludes partition tables but actually it is
included.
"Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..
+ The table being scanned (differs from <literal>relid</literal>
+ only when processing partitions or inheritance children).
Is <literal> needed? And I think the parentheses are not needed.
OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.
+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".
+ The command is currently scanning the table.
"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.
+ <command>ANALYZE</command> is currently extracting statistical data
+ from the sample obtained.
Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.
regards.
-
Kyotaro Horiguchi
NTT Open Source Software Center
Hi Horiguchi-san!
On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
Hello.
At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>
Hi Alvaro, Anthony, Julien and Robert,
On 2019/07/09 3:47, Julien Rouhaud wrote:
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
wrote:On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?I propose that once a field is set, we should leave it set until the
end.+1
Note that this patch is already behaving like that if the table only
contains dead rows.+1 from me.
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------I have some comments.
+ const int index[] = { + PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_TOTAL_BLOCKS, + PROGRESS_ANALYZE_BLOCKS_DONE + }; + const int64 val[] = { + PROGRESS_ANALYZE_PHASE_ANALYSIS, + 0, 0Do you oppose to leaving the total/done blocks alone here:p?
Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)
Ah, I revised it to remove "0, 0".
+ BlockNumber nblocks;
+ double blksdone = 0;Why is it a double even though blksdone is of the same domain
with nblocks? And finally:+ pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE, + ++blksdone);It is converted into int64.
Fixed.
But is it suitable to use BlockNumber instead int64?
+ WHEN 2 THEN 'analyzing sample' + WHEN 3 THEN 'analyzing sample (extended stats)'I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:+ WHEN 2 THEN 'computing stats' + WHEN 3 THEN 'computing extended stats'+ WHEN 4 THEN 'analyzing complete'
And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:+ WHEN 4 THEN 'completing analyze' + WHEN 4 THEN 'finalizing analyze'
I have no strong opinion, so I changed the phase-names based on
your suggestions like following:
WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'
However, I'd like to get any comments from hackers to get a consensus
about the names.
+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.+ <entry>OID of the database to which this backend is connected.</entry> + <entry>Name of the database to which this backend is connected.</entry>"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)
I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)
+ <entry>Whether the current scan includes legacy inheritance children.</entry>
This apparently excludes partition tables but actually it is
included."Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..
Hmm... I'm also not sure but I fixed as you suggested.
+ The table being scanned (differs from <literal>relid</literal> + only when processing partitions or inheritance children).Is <literal> needed? And I think the parentheses are not needed.
OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.
How about:
OID of the table currently being scanned.
It might be different from relid when analyzing tables that have child tables.
+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".
How about:
Total number of heap blocks in the scanning_table.
+ The command is currently scanning the table.
"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.
Fixed.
+ <command>ANALYZE</command> is currently extracting statistical data + from the sample obtained.Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.
Fixed.
Please find attached patch. :)
Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;
9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================
Thanks,
Tatsuro Yamada
Attachments:
v3-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v3-0001-Report-progress-for-ANALYZE.patchDownload+236-5
Hello.
# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..
At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------I have some comments. + const int index[] = { + PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_TOTAL_BLOCKS, + PROGRESS_ANALYZE_BLOCKS_DONE + }; + const int64 val[] = { + PROGRESS_ANALYZE_PHASE_ANALYSIS, + 0, 0 Do you oppose to leaving the total/done blocks alone here:p?Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)Ah, I revised it to remove "0, 0".
Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.
+ BlockNumber nblocks; + double blksdone = 0; Why is it a double even though blksdone is of the same domain with nblocks? And finally: + pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE, + ++blksdone); It is converted into int64.Fixed.
But is it suitable to use BlockNumber instead int64?
Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.
+ WHEN 2 THEN 'analyzing sample' + WHEN 3 THEN 'analyzing sample (extended stats)' I think we should avoid parenthesized phrases as far as not-necessary. That makes the column unnecessarily long. The phase is internally called as "compute stats" so *I* would prefer something like the followings: + WHEN 2 THEN 'computing stats' + WHEN 3 THEN 'computing extended stats' + WHEN 4 THEN 'analyzing complete' And if you are intending by this that (as mentioned in the documentation) "working to complete this analyze", this would be: + WHEN 4 THEN 'completing analyze' + WHEN 4 THEN 'finalizing analyze'I have no strong opinion, so I changed the phase-names based on
your suggestions like following:WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'However, I'd like to get any comments from hackers to get a consensus
about the names.
Agreed. Especially such word choosing is not suitable for me..
+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.+ <entry>OID of the database to which this backend is connected.</entry> + <entry>Name of the database to which this backend is connected.</entry> "database to which .. is connected" is phrased as "database this backend is connected to" in pg_stat_acttivity. (Just for consistency)I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)
Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.
+ <entry>Whether the current scan includes legacy inheritance
children.</entry>
This apparently excludes partition tables but actually it is
included."Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..Hmm... I'm also not sure but I fixed as you suggested.
Yeah, I also am not sure the suggestion is good enough as is..
+ The table being scanned (differs from <literal>relid</literal> + only when processing partitions or inheritance children). Is <literal> needed? And I think the parentheses are not needed. OID of the table currently being scanned. Can differ from relid when analyzing tables that have child tables.How about:
OID of the table currently being scanned.
It might be different from relid when analyzing tables that have child
tables.+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".How about:
Total number of heap blocks in the scanning_table.
(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a
+ The command is currently scanning the table.
"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.Fixed.
+ <command>ANALYZE</command> is currently extracting statistical data + from the sample obtained. Something like "The command is computing stats from the samples obtained in the previous phase" might be better.Fixed.
Please find attached patch. :)
Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================
Looks fine!
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,
On 2019/07/22 17:30, Kyotaro Horiguchi wrote:
Hello.
# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------I have some comments. + const int index[] = { + PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_TOTAL_BLOCKS, + PROGRESS_ANALYZE_BLOCKS_DONE + }; + const int64 val[] = { + PROGRESS_ANALYZE_PHASE_ANALYSIS, + 0, 0 Do you oppose to leaving the total/done blocks alone here:p?Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)Ah, I revised it to remove "0, 0".
Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.
"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:
./src/include/commands/progress.h
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE 1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING 2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE 4
Is it Okay?
+ BlockNumber nblocks; + double blksdone = 0; Why is it a double even though blksdone is of the same domain with nblocks? And finally: + pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE, + ++blksdone); It is converted into int64.Fixed.
But is it suitable to use BlockNumber instead int64?Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.
Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.
+ WHEN 2 THEN 'analyzing sample' + WHEN 3 THEN 'analyzing sample (extended stats)' I think we should avoid parenthesized phrases as far as not-necessary. That makes the column unnecessarily long. The phase is internally called as "compute stats" so *I* would prefer something like the followings: + WHEN 2 THEN 'computing stats' + WHEN 3 THEN 'computing extended stats' + WHEN 4 THEN 'analyzing complete' And if you are intending by this that (as mentioned in the documentation) "working to complete this analyze", this would be: + WHEN 4 THEN 'completing analyze' + WHEN 4 THEN 'finalizing analyze'I have no strong opinion, so I changed the phase-names based on
your suggestions like following:WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'However, I'd like to get any comments from hackers to get a consensus
about the names.Agreed. Especially such word choosing is not suitable for me..
To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)
+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.+ <entry>OID of the database to which this backend is connected.</entry> + <entry>Name of the database to which this backend is connected.</entry> "database to which .. is connected" is phrased as "database this backend is connected to" in pg_stat_acttivity. (Just for consistency)I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.
Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.
+ <entry>Whether the current scan includes legacy inheritance
children.</entry>
This apparently excludes partition tables but actually it is
included."Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..Hmm... I'm also not sure but I fixed as you suggested.
Yeah, I also am not sure the suggestion is good enough as is..
+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".How about:
Total number of heap blocks in the scanning_table.(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a
In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)
Please find attached patch. :)
Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;9067|13599|postgres|16387|f|16387|scanning table|443|14
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================Looks fine!
I shared a test result using Declarative partitioning table.
==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;
create table hoge2(
a integer
) partition by range(a);
create table hoge2_10000 partition of hoge2
for values from (0) to (10000);
create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);
create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);
create table hoge2_default partition of hoge2 default;
## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';
oid | relname | relpages | reltuples
-------+---------------+----------+-----------
16538 | hoge | 177 | 40001
16541 | hoge2 | 0 | 0
16544 | hoge2_10000 | 45 | 10000
16547 | hoge2_20000 | 45 | 10000
16550 | hoge2_30000 | 45 | 10000
16553 | hoge2_default | 45 | 10001
(6 rows)
select * from pg_stat_progress_analyze ; \watch 0.00001;
27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45
27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45
27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45
27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45
27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================
I'll share test result using partitioning table via
Inheritance tables on next email. :)
Thanks,
Tatsuro Yamada
Attachments:
v4-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v4-0001-Report-progress-for-ANALYZE.patchDownload+236-5
On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
Attached v4 patch file only includes this fix.
Hello all,
I've moved this to the September CF, where it is in "Needs review" state.
Thanks,
--
Thomas Munro
https://enterprisedb.com
On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:Attached v4 patch file only includes this fix.
I've moved this to the September CF, where it is in "Needs review" state.
/me reviews.
+ <entry><structfield>scanning_table</structfield></entry>
I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.
+ The command is computing extended stats from the samples
obtained in the previous phase.
I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."
+ Total number of heap blocks in the scanning_table.
Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table. I think that's the right thing to advertise, but
then the column should be named and documented that way.
+ {
+ const int index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }
This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel). Maybe you can also come up with better
names than 'index' and 'val'. Same comment applies to another place
where you have something similar.
Patch seems to need minor rebasing.
Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?
I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count. That creates a race during which users could see the first bit
of data set and the second not set. I don't see any reason not to set
all four fields together.
Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE 1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING 2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE 4
vs.
+ WHEN 0 THEN 'initializing'::text
+ WHEN 1 THEN 'scanning table'::text
+ WHEN 2 THEN 'computing stats'::text
+ WHEN 3 THEN 'computing extended stats'::text
+ WHEN 4 THEN 'finalizing analyze'::text
Not terrible, but it could be closer.
Similarly with the column names (include_children vs. INH).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company