monitoring CREATE INDEX [CONCURRENTLY]

Started by Alvaro Herreraover 7 years ago60 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases. One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table. Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done. See below for how to report for
index_build, which is the interesting part. I propose not to report
anything else than that for non-concurrent build. There's some
preparatory work that's identical than for CIC (see below). Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.

CREATE INDEX CONCURRENTLY does these things first, which we would not
report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel. No metrics to report.
b. other prep; includes lots of catalog access. Unlikely to lock, but
not impossible. No metrics to report.
c. create_index. CIC skips index_build here, so there's no reason to
report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1. Report how many xacts do we need to wait for, how
many are done.
2. index_build. See below.
3. WaitForLockers 2. Report how many xacts do we need to wait for, how
many are done.
4. validate_index. Scans the whole rel again. Report number of blocks
scanned.
5. wait for virtual XIDs. Like WaitForLockers: report how many xacts we
need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent. I'm just not sure it's
useful.)

index_build
-----------

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code. That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code. This avoids the need to provide a new AM method or invoke
callbacks. So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

Partitioned indexes
-------------------

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition. In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done. (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: monitoring CREATE INDEX [CONCURRENTLY]

For discussion, here's an preliminary patch. This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

create-index-progress.patchtext/x-diff; charset=us-asciiDownload+114-14
#3Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Alvaro Herrera (#2)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On Tue, Jan 1, 2019 at 6:09 AM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

For discussion, here's an preliminary patch. This is just a first
skeleton; needs to grow a lot of flesh yet, per my previous proposal.
As far as the generic CREATE INDEX stuff goes, I think this is complete;
it's missing the AM-specific bits.

Looks like it's missing the validate_index blocks-scanned report, which is
not AM-specific and your original proposal does mention that.

I thought a bit about index_build part. If most AMs follow a somewhat
standard phases while building an index, it might be simpler to define
those phases and have AM agnostic code report those phases. Or may be just
report the most significant information, instead of reporting each
sub-phase of index_build. I think the most important progress to know would
be how far the heap is scanned for to-be-indexed tuples. AFAICS all AMs
use IndexBuildHeapScan() to scan the heap. Can we simply do some reporting
from that routine? Like number of blocks scanned against the total number
of blocks requested?

Some minor comments on the patch, though I suspect you might be already
updating the patch since you marked it as WIP.

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'

Can we have more descriptive text for waiters? Such as "waiting for existing
writers", "waiting for intermediate writers" and "waiting for old readers".
Not
sure if I got those correct, something of that sort will definitely give
more
insight into what the transaction is waiting for.

Can we actually also report the list of transactions the command is waiting
on?
That could be useful to the user if CIC appears to be stuck too long.

+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+ nparts);
+

IMHO we should just use full term INDEX instead of IDX, such as
PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
of
extra characters won't make a difference. I did not see much precedence to
shortern to IDX for INDEX elsewhere in the code (though we tend to do that
for
variable names etc).

@@ -1282,6 +1318,9 @@ DefineIndex(Oid relationId,
  old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
    PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
    &n_old_snapshots);
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+ PROGRESS_CREATEIDX_PHASE_WAIT_3);
+ pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);

I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
when the wait phase is over, to avoid any confusion. For example, I noticed
that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Shouldn't PROGRESS_WAITFOR_DONE be updated when we skip a snapshot in the
code below?
if (!VirtualTransactionIdIsValid(old_snapshots[i]))
continue; /* found uninteresting in previous cycle */

@@ -2817,7 +2818,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
  * uses of the result.
  */
 VirtualTransactionId *
-GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
+GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)

Could that out variable be named something differently? "countp" or
something
like that. I did not check if there is some practice that we follow, but I
remember suffixing with "p" rather than prefixing with "o" (for out I
assume)

+
+/* Progress parameters for CREATE INDEX */
+#define PROGRESS_CREATEIDX_PHASE 0
+/* 1 and 2 reserved for "waitfor" metrics */
+#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
+#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
+

Is there a reason to leave those reserve placeholders, only to fill them a
few
lines down?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Rahila Syed
rahilasyed90@gmail.com
In reply to: Alvaro Herrera (#1)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Alvaro,

The WIP patch needs a rebase. Please see few in-line comments.

On Fri, Dec 21, 2018 at 3:30 AM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Monitoring progress of CREATE INDEX [CONCURRENTLY] is sure to be welcome,
so here's a proposal.

There are three distinct interesting cases. One is straight CREATE
INDEX of a standalone table; then we have CREATE INDEX CONCURRENTLY;
finally, CREATE INDEX on a partitioned table. Note that there's no
CONCURRENTLY for partitioned tables.

A non-concurrent build is a very straightforward: we call create_index,
which does index_build, done. See below for how to report for
index_build, which is the interesting part. I propose not to report
anything else than that for non-concurrent build. There's some
preparatory work that's identical than for CIC (see below). Like
VACUUM, it seems a bit pointless to report an initial phase that's
almost immediate, so I propose we just don't report anything until the
actual index building starts.

Aren't we reporting this initial preparatory work in the form of
'initializing' phase that you
have in current patch? IIUC, there are no metrics but the name of the phase.

CREATE INDEX CONCURRENTLY does these things first, which we would not

report (this is just like VACUUM, which only starts reporting once it
starts scanning blocks):

a. lock rel. No metrics to report.
b. other prep; includes lots of catalog access. Unlikely to lock, but
not impossible. No metrics to report.
c. create_index. CIC skips index_build here, so there's no reason to
report it either.

We would start reporting at this point, with these phases:

1. WaitForLockers 1. Report how many xacts do we need to wait for, how
many are done.
2. index_build. See below.
3. WaitForLockers 2. Report how many xacts do we need to wait for, how
many are done.
4. validate_index. Scans the whole rel again. Report number of blocks
scanned.
5. wait for virtual XIDs. Like WaitForLockers: report how many xacts we
need to wait for, how many are done.

We're done.

(Alternatively, we could have an initial "prep" phase for a/b/c for the
concurrent case and a/b for non-concurrent. I'm just not sure it's
useful.)

index_build
-----------

The actual index building is an AM-specific undertaking, and we report
its progress separately from the AM-agnostic code. That is, each AM has
freedom to define its own list of phases and counters, separate from the
generic code. This avoids the need to provide a new AM method or invoke
callbacks. So when you see that CREATE_INDEX_PHASE is either "index
build" you'll have a separate BTREE_CREATE_PHASE value set to either
"scanning heap" or "sorting" or "building upper layers"; equivalently
for other AMs.

OK.

I think the main phases in which index_build for most AMs can be divided is
as follows:
1. Scanning heap tuples for building index which is common for all AMs
2. Forming index tuples which is AM-specific
3. Writing tuples into the index which is AM-specific.
Out of these, metrics for phase 1 can be heap_tuples_scanned /
total_heap_tuples and
for phase 3, it can be index_tuples_computed/ total_index_tuples.
I am not sure about metrics for phase 2 especially for Btree which involves
reporting progress of sorting.

Partitioned indexes

-------------------

For partitioned indexes, we only have the index build phase, but we
repeat it for each partition. In addition to the index_build metrics
described above, we should report how many partitions we need to handle
in total and how many partitions are already done. (I'm avoiding
getting in the trouble of reporting *which* partition we're currently
handling and have already handled.)

Thoughts?

+CREATE VIEW pg_stat_progress_create_index AS

+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+   WHEN 1 THEN 'waiting for lockers 1'
+   WHEN 2 THEN 'building index'
+   WHEN 3 THEN 'waiting for lockers 2'
+   WHEN 4 THEN 'validating index'
+   WHEN 5 THEN 'waiting for lockers 3'
+   END as phase,
+ S.param2 AS procs_to_wait_for,
+ S.param3 AS procs_waited_for,
+ S.param4 AS partitions_to_build,
+ S.param5 AS partitions_built
+ FROM pg_stat_get_progress_info('CREATE INDEX') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+

1. In the above code, I think it will be useful if we report phases as
'Initializing phase 1 of n'
'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
phases as well.

+               holders = lappend(holders,
+                                                 GetLockConflicts(locktag,
lockmode, &count));
+               total += count;
2. IIUC, the above code in WaitForLockersMultiple can be written under
if(progress) condition like rest of the progress checking code in that
function
and pass NULL for count otherwise.

3. Will it be useful to report pid's of the backend's
for the transactions which CREATE INDEX concurrently is waiting for?
I think it can be used to debug long running transactions.

4. Should we have additional statistics update phase before
index_update_stats
like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

5. I think it may be useful to report number of parallel workers requested
and number of workers
actually running index build in case of btree.

6. Also, this can be reported as an additional validation phase for
exclusion constraint
in index_build function as it involves scanning all live tuples of heap
relation.

/*
* If it's for an exclusion constraint, make a second pass over the
heap
* to verify that the constraint is satisfied. We must not do this
until
* the index is fully valid. (Broken HOT chains shouldn't matter,
though;
* see comments for IndexCheckExclusion.)
*/
if (indexInfo->ii_ExclusionOps != NULL)
IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
*/

Thank you,
Rahila Syed

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavan Deolasee (#3)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On 2019-Jan-09, Pavan Deolasee wrote:

Looks like it's missing the validate_index blocks-scanned report, which is
not AM-specific and your original proposal does mention that.

True. That phase is actually 3 phases, which would be reported
separately:

5. index_bulk_delete() scan
6. performsort
7. validate_index_heapscan

I thought a bit about index_build part. If most AMs follow a somewhat
standard phases while building an index, it might be simpler to define
those phases and have AM agnostic code report those phases.

Well, think about btrees. We first scan the whole table putting all
tuples in a spool (two spools actually), then we tell the spools to get
sorted, then we extract tuples from the spools, and finally we read the
spool to produce the leaf pages. If we just report the table scan, the
reports gets to 100% complete in that phase and then waits for a very
long time during which nothing seems to happen. That's not cool.

I'm adding a new AM support routine which turns the subphase number into
a textual description, so that we don't have to hardcode phase names in
the agnostic code.

Can we have more descriptive text for waiters? Such as "waiting for
existing writers", "waiting for intermediate writers" and "waiting for
old readers". Not sure if I got those correct, something of that sort
will definitely give more insight into what the transaction is waiting
for.

Can do.

Can we actually also report the list of transactions the command is waiting
on?
That could be useful to the user if CIC appears to be stuck too long.

I'm afraid this is not possible, because the progress report interface
doesn't allow for something like that.

IMHO we should just use full term INDEX instead of IDX, such as
PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
of extra characters won't make a difference.

Really? I though it was clear enough and it's *three* characters saved
even.

I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
when the wait phase is over, to avoid any confusion. For example, I noticed
that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Yes, absolutely.

+GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)

Could that out variable be named something differently? "countp" or
something like that. I did not check if there is some practice that we
follow, but I remember suffixing with "p" rather than prefixing with
"o" (for out I assume)

Sure.

+/* Progress parameters for CREATE INDEX */
+#define PROGRESS_CREATEIDX_PHASE 0
+/* 1 and 2 reserved for "waitfor" metrics */
+#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
+#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
+

Is there a reason to leave those reserve placeholders, only to fill them a
few lines down?

Yes -- those are going to be used by other reports that also use similar
metrics, such as the CLUSTER report.

I'm running out of columns to put the numbers into :-( Right now I have

1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done

The "tapes total/done" bit is about reporting the performsort steps; I'm
not really sure yet it'll be tapes, but I hope it can be done with two
numbers. Anyway, in btree build I have these subphases

1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load

and I don't have columns to put the metrics for the btload thing,
assuming I figure out what those are.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#5)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi all,

Based on the latest emails exchanged, the patch got some feedback from
Pavan which has not been addressed. So I am marking the patch as
returned with feedback.
--
Michael

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#4)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Rahila, Pavan,

Thanks for the review. I incorporated some fixes per your comments.
More fixes are needed still. That said, the patch in attachment gives
good insight into how I think this should turn out.

index_build
-----------

OK.
I think the main phases in which index_build for most AMs can be divided is
as follows:
[...]

I ended up defining phases for the index_build phase in the AM itself;
the code reports a phase number using the regular API, and then the
pgstat_progress view obtains the name of each phase using a support
method.

For btree, I ended up not reporting anything about the sort per se; we
just scan the heap (reporting block numbers, which is easy because we
know the heap size in advance) and count heap tuples while scanning;
once that's done, we know how many tuples we need to write out to the
index, so that total becomes the next stage's target total. That's
simpler. (It is wrong for partial indexes currently, though.)

So for btree we have this:

/*
* btphasename() -- Return name of phase, for index build progress report
*/
char *
btphasename(int64 phasenum)
{
switch (phasenum)
{
case PROGRESS_CREATEIDX_SUBPHASE_INITIALIZE:
return "initializing (1/5)";
case PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN:
return "table scan (2/5)";
case PROGRESS_BTREE_PHASE_PERFORMSORT_1:
return "sorting tuples, spool 1 (3/5)";
case PROGRESS_BTREE_PHASE_PERFORMSORT_2:
return "sorting tuples, spool 2 (4/5)";
case PROGRESS_BTREE_PHASE_LEAF_LOAD:
return "final btree sort & load (5/5)";
default:
return NULL;
}
}

Now this is a bit strange, because the report looks like this:

phase | building index (3 of 8): initializing (1/5)
[...]
blocks total | 442478
blocks done | 3267

So for phase 3, we always have phase and subphase counters in the phase
name. However, I don't think there's any clean way to know from the
very beginning how many subphases are there going to be, and it would be
even more confusing to have the total "of N" number vary each time
depending on the access method. So this leaves the phase counter going
from 1 to 8, and then for phase 3 you have a second part that goes from
1 to N.

Anyway, at some point it completes the heap scan, and the phase changes
to this:

phase | building index (3 of 8): heap scan(2/5)

I think I should take Rahila's suggestion to report the number of
workers running (but I'm not sure I see the point in reporting number of
workers planned).

The heap scan quickly gives way to the next one,

phase | building index (3 of 8): final btree sort & load (5/5)
[...]
tuples total | 100000000
tuples done | 58103713

Finally,
phase | validating index scan (phase 5 of 8)
and
phase | validate index heapscan (phase 7 of 8)

and then it's over.

Now, it's clear that I haven't yet nailed all the progress updates
correctly, because there are some stalls where nothing seems to be
happening. The final index_validate phases have been ignored so far.

1. In the above code, I think it will be useful if we report phases as
'Initializing phase 1 of n'
'Waiting for lockers phase 2 of n' etc. i.e reporting total number of
phases as well.

Great idea, done.

2. IIUC, the above code in WaitForLockersMultiple can be written under
if(progress) condition like rest of the progress checking code in that
function
and pass NULL for count otherwise.

Yep, good point.

3. Will it be useful to report pid's of the backend's
for the transactions which CREATE INDEX concurrently is waiting for?
I think it can be used to debug long running transactions.

Done.

4. Should we have additional statistics update phase before
index_update_stats
like PROGRESS_VACUUM_PHASE_FINAL_CLEANUP?

Not sure about this one ... it's supposed to be a fairly quick
operation.

5. I think it may be useful to report number of parallel workers requested
and number of workers
actually running index build in case of btree.

True, I haven't done this one. I'll add it next.

6. Also, this can be reported as an additional validation phase for
exclusion constraint
in index_build function as it involves scanning all live tuples of heap
relation.

/*
* If it's for an exclusion constraint, make a second pass over the
heap
* to verify that the constraint is satisfied. We must not do this
until
* the index is fully valid. (Broken HOT chains shouldn't matter,
though;
* see comments for IndexCheckExclusion.)
*/
if (indexInfo->ii_ExclusionOps != NULL)
IndexCheckExclusion(heapRelation, indexRelation, indexInfo);
*/

Hmm, I haven't looked into exclusion constraints thus far ... I suppose
this is a critical point to keep in mind.

Thanks for the review!

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-create-index-progress.patchtext/x-diff; charset=us-asciiDownload+343-34
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
Re: monitoring CREATE INDEX [CONCURRENTLY]

I added metrics for the validate_index phases; patch attached. This is
still a bit raw, but it looks much better now. Here's a sample
concurrent index build on a 100M tuple table. There are no concurrent
transactions, so phases that wait for lockers don't appear. I'm not
making any effort to report metrics during phase 6, note. In phase 7
I'm currently reporting only tuple counts; I think it'd be better to
report block numbers.

I don't show it here, but when an index is built on a partitioned table,
the "partitions done" number goes up all the way to "partitions total"
and the phases come and go for each partition.

One thing is clear: given the phase mechanics of varying durations, it
seems hard to use these numbers to predict total index build time.

now | phase | blocks total | blocks done | tuples total | tuples done | partitions total | partitions done
--------------+--------------------------------------------------------+--------------+-------------+--------------+-------------+------------------+-----------------
15:56:33.792 | building index (3 of 8): initializing (1/5) | 442478 | 32 | 0 | 0 | 0 | 0
15:56:33.805 | building index (3 of 8): initializing (1/5) | 442478 | 188 | 0 | 0 | 0 | 0
[snip about 500 lines]
15:56:41.345 | building index (3 of 8): initializing (1/5) | 442478 | 439855 | 0 | 0 | 0 | 0
15:56:41.356 | building index (3 of 8): initializing (1/5) | 442478 | 440288 | 0 | 0 | 0 | 0
15:56:41.367 | building index (3 of 8): initializing (1/5) | 442478 | 440778 | 0 | 0 | 0 | 0
15:56:41.378 | building index (3 of 8): initializing (1/5) | 442478 | 441706 | 0 | 0 | 0 | 0
15:56:41.389 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:41.4 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0

[snip 300 lines] ... I'm not sure what happened in this 3 seconds period. No metrics were being updated.

15:56:44.65 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.661 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.672 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.682 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.694 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 79057 | 0 | 0
15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 217018 | 0 | 0
15:56:44.75 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 353804 | 0 | 0
15:56:44.761 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 487892 | 0 | 0
15:56:44.773 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 634136 | 0 | 0
[snip 660 lines]
15:56:52.47 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99111337 | 0 | 0
15:56:52.482 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99285701 | 0 | 0
15:56:52.493 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99444763 | 0 | 0
15:56:52.505 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99612313 | 0 | 0
15:56:52.516 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99782666 | 0 | 0
15:56:52.528 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 99937524 | 0 | 0
15:56:52.54 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.551 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.561 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.572 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.583 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.594 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.604 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.615 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.626 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.637 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.648 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.658 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.669 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.68 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.691 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.701 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.712 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 100000000 | 0 | 0
15:56:52.723 | validating index scan (phase 5 of 8) | 274194 | 1049 | 0 | 0 | 0 | 0
15:56:52.734 | validating index scan (phase 5 of 8) | 274194 | 2676 | 0 | 0 | 0 | 0
15:56:52.744 | validating index scan (phase 5 of 8) | 274194 | 2876 | 0 | 0 | 0 | 0
15:56:52.755 | validating index scan (phase 5 of 8) | 274194 | 4194 | 0 | 0 | 0 | 0
[snip 400 lines]
15:56:57.031 | validating index scan (phase 5 of 8) | 274194 | 268343 | 0 | 0 | 0 | 0
15:56:57.042 | validating index scan (phase 5 of 8) | 274194 | 268479 | 0 | 0 | 0 | 0
15:56:57.053 | validating index scan (phase 5 of 8) | 274194 | 270027 | 0 | 0 | 0 | 0
15:56:57.064 | validating index scan (phase 5 of 8) | 274194 | 271580 | 0 | 0 | 0 | 0
15:56:57.075 | validating index scan (phase 5 of 8) | 274194 | 273121 | 0 | 0 | 0 | 0
15:56:57.085 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.096 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.107 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.118 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.128 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.139 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.15 | sorting index scan results (phase 6 of 8) | 274194 | 274193 | 0 | 0 | 0 | 0
15:56:57.161 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 50152 | 0 | 0
15:56:57.172 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 175602 | 0 | 0
15:56:57.182 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 305326 | 0 | 0
15:56:57.193 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 430142 | 0 | 0
[snip 730 lines]
15:57:05.003 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99125556 | 0 | 0
15:57:05.013 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99276471 | 0 | 0
15:57:05.024 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99425041 | 0 | 0
15:57:05.035 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99580174 | 0 | 0
15:57:05.045 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99720505 | 0 | 0
15:57:05.056 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 99862311 | 0 | 0
15:57:05.067 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.077 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.088 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.099 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.109 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.12 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.131 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.142 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.152 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.163 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
15:57:05.174 | validate index heapscan (phase 7 of 8) | 0 | 0 | 100000000 | 100000000 | 0 | 0
(2862 filas)

and then it's done.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-create-index-progress.patchtext/x-diff; charset=us-asciiDownload+414-35
#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#8)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Here's a sample
concurrent index build on a 100M tuple table.

Cool

There are no concurrent
transactions, so phases that wait for lockers don't appear.

Would prefer to see them, so we know they are zero.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Simon Riggs (#9)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On 2019/02/13 11:59, Simon Riggs wrote:

On Wed, 13 Feb 2019 at 00:46, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Here's a sample
concurrent index build on a 100M tuple table.

Cool

+1

Looking at the "([phase] x of x)" in the sample output, I thought
pg_stat_progress_vacuum's output should've had it too (not a concern of
Alvaro's patch though.)

Thanks,
Amit

#11Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#7)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Alvaro,

On 2019/02/12 12:18, Alvaro Herrera wrote:

I ended up defining phases for the index_build phase in the AM itself;
the code reports a phase number using the regular API, and then the
pgstat_progress view obtains the name of each phase using a support
method.

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 05102724ead..189179a5667 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -127,6 +127,7 @@ typedef struct IndexAmRoutine
     amcostestimate_function amcostestimate;
     amoptions_function amoptions;
     amproperty_function amproperty;     /* can be NULL */
+    amphasename_function amphasename;   /* can be NULL */

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild? Maybe ambuildphase?

Thanks,
Amit

#12Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Alvaro Herrera (#8)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On 2019/02/13 4:16, Alvaro Herrera wrote:

I added metrics for the validate_index phases; patch attached. This is
still a bit raw, but it looks much better now. Here's a sample
concurrent index build on a 100M tuple table. There are no concurrent

Great!

+		s.pid AS pid, S.datid AS datid, D.datname AS datname,
+		S.relid AS relid,
+		CASE s.param2 WHEN 0 THEN 'initializing (phase 1 of 8)'
+					  WHEN 1 THEN 'waiting for lockers 1 (phase 2 of 8)'
+					  WHEN 2 THEN 'building index (3 of 8): ' ||
+						pg_indexam_progress_phasename(s.param1::oid, s.param3)

It would be better to replace "s" with "S".

s/s.pid/S.pid/
s/s.param2/S.param2/
s/s.param1::oid, s.param3/S.param1::oid, S.param3/

These are not important comments but for readability. :)

Thanks,
Tatsuro Yamada

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#11)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On 2019-Feb-13, Amit Langote wrote:

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild? Maybe ambuildphase?

Hmm, yeah, maybe it does. I renamed it "ambuildphasename", since it's
not about reporting the phase itself -- it's about translating the phase
number to the string that's reported to the user.

The attached patch does it that way. Also, when an index build uses an
AM that doesn't support progress reporting, it no longer reports a NULL
phase name while building. I also changed it to report the progress of
phase 7 (heap scan validation) using block numbers rather than tuple
counts. I also tweaked the strings reported in the view. They're
clearer now IMO.

One slight annoyance is that when parallel workers are used, the last
block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
is not necessarily accurate, since the tail of the table could well be
scanned by a worker that's not the leader, and we only report in the
leader when it gets a new block.

When the AM does not support progress reporting, everything stays as
zeros during the index build phase. It's easy to notice how slow hash
indexes are to build compared to btrees this way! Maybe it'd be
better fallback on reporting block numbers in IndexBuildHeapScan when
this happens. Thoughts?

I added docs to the monitoring section -- that's the bulkiest part of
the patch.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Report-progress-of-CREATE-INDEX-operations.patchtext/x-diff; charset=us-asciiDownload+668-39
#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#13)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hmm, looks like a very bare-bones support for hash indexes does not
require a lot of code, and gives a clear picture (you can sit all night
watching the numbers go up, instead of biting your fingernails wondering
if it'll be completed by dawn.) This part isn't 100% done -- it we
would better to have ambuildphasename support.

(I'm a bit confused about phase 5 not reporting anything for hash
indexes in CIC, though. That's part is supposed to be AM agnostic.)

I think it was a mistake to define the progress constants in one header
file commands/progress.h and the associated functions in pgstat.h. I
think it would be better to move the function decls to
commands/progress.h.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

progress-hash.patchtext/x-diff; charset=us-asciiDownload+11-1
#15Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Amit Langote (#11)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Alvaro,

On 2019-Feb-13, Amit Langote wrote:

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild? Maybe ambuildphase?

Hmm, yeah, maybe it does. I renamed it "ambuildphasename", since it's
not about reporting the phase itself -- it's about translating the phase
number to the string that's reported to the user.

The attached patch does it that way. Also, when an index build uses an
AM that doesn't support progress reporting, it no longer reports a NULL
phase name while building. I also changed it to report the progress of
phase 7 (heap scan validation) using block numbers rather than tuple
counts. I also tweaked the strings reported in the view. They're
clearer now IMO.

One slight annoyance is that when parallel workers are used, the last
block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
is not necessarily accurate, since the tail of the table could well be
scanned by a worker that's not the leader, and we only report in the
leader when it gets a new block.

When the AM does not support progress reporting, everything stays as
zeros during the index build phase. It's easy to notice how slow hash
indexes are to build compared to btrees this way! Maybe it'd be
better fallback on reporting block numbers in IndexBuildHeapScan when
this happens. Thoughts?

I added docs to the monitoring section -- that's the bulkiest part of
the patch.

1. Thank you for incorporating review comments.
Can you please rebase the latest
0001-Report-progress-of-CREATE-INDEX-operations.patch on master? Currently
it does not apply on 754b90f657bd54b482524b73726dae4a9165031c

15:56:44.694 | building index (3 of 8): initializing (1/5) |
442478 | 442399 | 0 | 0 | 0
| 0
15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
442478 | 442399 | 100000000 | 0 | 0
| 0
15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |
442478 | 442399 | 100000000 | 0 | 0
| 0
15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) |
442478 | 442399 | 100000000 | 79057 | 0
| 0

2. In the above report, even though we are reporting progress in terms of
tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be confusing
as the blocks_done does not correspond to the tuples_done in the current
phase.

--
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com <http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Alvaro Herrera (#13)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Alvaro,

Resending the email as earlier one didn't get sent on pgsql-hackers.

On 2/23/19 3:24 AM, Alvaro Herrera wrote:

On 2019-Feb-13, Amit Langote wrote:

Doesn't the name amphasename sound a bit too generic, given that it can
only describe the phases of ambuild? Maybe ambuildphase?

Hmm, yeah, maybe it does. I renamed it "ambuildphasename", since it's
not about reporting the phase itself -- it's about translating the phase
number to the string that's reported to the user.

The attached patch does it that way. Also, when an index build uses an
AM that doesn't support progress reporting, it no longer reports a NULL
phase name while building. I also changed it to report the progress of
phase 7 (heap scan validation) using block numbers rather than tuple
counts. I also tweaked the strings reported in the view. They're
clearer now IMO.

One slight annoyance is that when parallel workers are used, the last
block number reported in phase 3/subphase 2 (IndexBuildHeapScan stuff)
is not necessarily accurate, since the tail of the table could well be
scanned by a worker that's not the leader, and we only report in the
leader when it gets a new block.

When the AM does not support progress reporting, everything stays as
zeros during the index build phase. It's easy to notice how slow hash
indexes are to build compared to btrees this way! Maybe it'd be
better fallback on reporting block numbers in IndexBuildHeapScan when
this happens. Thoughts?

I added docs to the monitoring section -- that's the bulkiest part of
the patch.

1. Thank you for incorporating review comments.
Can you please rebase the latest 0001-Report-progress-of-
CREATE-INDEX-operations.patch on master? Currently it does not apply on
754b90f657bd54b482524b73726dae4a9165031c

15:56:44.694 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 79057 | 0 | 0
15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 217018 | 0 | 0
15:56:44.75 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 353804 | 0 | 0

2. In the above report, even though we are reporting progress in terms
of tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be
confusing as the blocks_done does not correspond to the tuples_done in
the final btree sort & load phase.

Thank you,
Rahila Syed

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#16)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Rahila,

Thanks for looking.

On 2019-Mar-04, Rahila wrote:

1. Thank you for incorporating review comments.
Can you please rebase the latest 0001-Report-progress-of-
CREATE-INDEX-operations.patch on master? Currently it does not apply on
754b90f657bd54b482524b73726dae4a9165031c

Hmm, rebased to current master. Didn't conflict at all when rebasing,
so it's strange that it didn't apply.

15:56:44.694 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 79057 | 0 | 0
15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 217018 | 0 | 0
15:56:44.75 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 353804 | 0 | 0

2. In the above report, even though we are reporting progress in terms of
tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be confusing as
the blocks_done does not correspond to the tuples_done in the final btree
sort & load phase.

Good point. Done in the attached version, wherein I also added comments
to explain the IndexBuildHeapScan API change. I didn't change the hash
AM implementation here.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v5-0001-Report-progress-of-CREATE-INDEX-operations.patchtext/x-diff; charset=us-asciiDownload+695-39
v5-0002-report-progress-of-hash-indexes.patchtext/x-diff; charset=us-asciiDownload+11-2
#18David Fetter
david@fetter.org
In reply to: Alvaro Herrera (#17)
Re: monitoring CREATE INDEX [CONCURRENTLY]

On Mon, Mar 04, 2019 at 05:46:07PM -0300, Alvaro Herrera wrote:

Hi Rahila,

Thanks for looking.

On 2019-Mar-04, Rahila wrote:

1. Thank you for incorporating review comments.
Can you please rebase the latest 0001-Report-progress-of-
CREATE-INDEX-operations.patch on master? Currently it does not apply on
754b90f657bd54b482524b73726dae4a9165031c

Hmm, rebased to current master. Didn't conflict at all when rebasing,
so it's strange that it didn't apply.

15:56:44.694 | building index (3 of 8): initializing (1/5) | 442478 | 442399 | 0 | 0 | 0 | 0
15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) | 442478 | 442399 | 100000000 | 0 | 0 | 0
15:56:44.727 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 79057 | 0 | 0
15:56:44.738 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 217018 | 0 | 0
15:56:44.75 | building index (3 of 8): final btree sort & load (5/5) | 442478 | 442399 | 100000000 | 353804 | 0 | 0

2. In the above report, even though we are reporting progress in terms of
tuples_done for final btree sort & load phase we have not cleared
the blocks_done entry from previous phases. I think this can be confusing as
the blocks_done does not correspond to the tuples_done in the final btree
sort & load phase.

Good point. Done in the attached version, wherein I also added comments
to explain the IndexBuildHeapScan API change. I didn't change the hash
AM implementation here.

Would it be a very large lift to report progress for the rest of the
index types we support?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#19Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Alvaro Herrera (#17)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Alvaro,

On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Hi Rahila,

Thanks for looking.

On 2019-Mar-04, Rahila wrote:

1. Thank you for incorporating review comments.
Can you please rebase the latest 0001-Report-progress-of-
CREATE-INDEX-operations.patch on master? Currently it does not apply on
754b90f657bd54b482524b73726dae4a9165031c

Hmm, rebased to current master. Didn't conflict at all when rebasing,
so it's strange that it didn't apply.

Thanks for updating the patch. Sorry, I think it wasn't that the patch
needed rebasing but
I failed to apply it correctly last time. I can apply it now.

+extern char *btbuildphasename(int64 phasenum);

1. I think int64 is too large a datatype for phasenum.
Also int32 is used for phasenum in pg_indexam_progress_phasename().
Can we have it as int8?

2.

if ((previous_blkno == InvalidBlockNumber) ||

+ (scan->rs_cblock != previous_blkno))

+ {

+

pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,

+

scan->rs_cblock);

+ previous_blkno = scan->rs_cblock;

+ }

. In validate_index_heapscan, we dont calculate blocks_done similar to
IndexBuildHeapScan i.e
block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
results as we are
still processing that block.

3. There is no if(progress) check in validate_index()/
validate_index_heapscan() code. Wont it be a problem if it is called from
other
index methods which dont support reporting progress at the moment?

4. Just to clarify my understanding can you please see below comment

Quoting your following comment in cluster command progress monitor thread
while referring to progress reporting from IndexBuildHeapScan,

"One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command."

Does that mean that we can't have any other INDEX progress monitoring, use
PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
parameter numbers to report anything but the metrics they report now.

5.

15:56:44.682 | building index (3 of 8): initializing (1/5) |
442478 | 442399 | 0 | 0 | 0 |
0

15:56:44.694 | building index (3 of 8): initializing (1/5) |

442478 | 442399 | 0 | 0 | 0 |
0

15:56:44.705 | building index (3 of 8): sorting tuples, spool 1 (3/5) |

442478 | 442399 | 100000000 | 0 | 0 |
0

15:56:44.716 | building index (3 of 8): sorting tuples, spool 1 (3/5) |

442478 | 442399 | 100000000 | 0 | 0 |
0

I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
the blocks_done count is increasing. Shouldn't it have
changed to reflect PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
index(3 of 8): table scan(2/5) ?
Although I think it has been rectified in the latest patch as I now get
'table scan' phase in output as I do CREATE INDEX on table with 1000000
records

Thank you,
.--
Rahila Syed
Performance Engineer
2ndQuadrant
http://www.2ndQuadrant.com <http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#19)
Re: monitoring CREATE INDEX [CONCURRENTLY]

Hi Rahila

On 2019-Mar-11, Rahila Syed wrote:

On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

+extern char *btbuildphasename(int64 phasenum);

1. I think int64 is too large a datatype for phasenum.
Also int32 is used for phasenum in pg_indexam_progress_phasename().
Can we have it as int8?

It does look strange, I agree, and the first code I wrote had it using a
smaller type. However, I later realized that since the value comes
directly from pg_stat_get_progress_info(), which returns int8 values, it
was pointless to only accept a small fraction of the possible values for
no good reason, so I widened it to int64 as you see now.

2.
. In validate_index_heapscan, we dont calculate blocks_done similar to
IndexBuildHeapScan i.e
block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
results as we are
still processing that block.

Thanks for pointing out that there's an off-by-one bug there (should be
cblock-1). However, IndexBuildHeapScan uses more complicated code
because it needs to cover for two additional things that
validate_index_heapscan doesn't: parallel heapscans and synchronized
seqscans. We could do that, I just saw no point in it.

3. There is no if(progress) check in validate_index()/
validate_index_heapscan() code. Wont it be a problem if it is called from
other index methods which dont support reporting progress at the
moment?

Good question. I'll have a look. Most likely, I'll end up having
things so that building an index using an unsupported index AM reports
progress based on IndexBuildHeapScan / validate_index /
validate_index_heapscan ... which might mean I should remove the
'progress' parameter from there and have them report unconditionally.

4. Just to clarify my understanding can you please see below comment

Quoting your following comment in cluster command progress monitor thread
while referring to progress reporting from IndexBuildHeapScan,

"One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command."

Does that mean that we can't have any other INDEX progress monitoring, use
PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
parameter numbers to report anything but the metrics they report now.

What I mean is that the literal parameter numbers defined as
PROGRESS_SCAN_BLOCKS_DONE/TOTAL may not be used for other parameters by
commands that call IndexBuildHeapScan, if those other parameters are
used by the same commands simultaneously with IndexBuildHeapScan. So
those parameter numbers become "reserved".

5.

15:56:44.682 | building index (3 of 8): initializing (1/5) |
442478 | 442399 | 0 | 0 | 0 |

I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
the blocks_done count is increasing. Shouldn't it have
changed to reflect PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
index(3 of 8): table scan(2/5) ?
Although I think it has been rectified in the latest patch as I now get
'table scan' phase in output as I do CREATE INDEX on table with 1000000
records

Yeah, this was a bug that I fixed in v5. (It was a misunderstanding
about how parallel scanning is set up, IIRC). For v5, I tested both
parallel and non-parallel builds, with and without sync seqscans, and
everything seemed to behave correctly.

Thanks for looking! I intend to post a new version later this week.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#24)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#26)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#27)
#29Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
#30Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Rahila Syed (#29)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
#32Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#31)
#34Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#33)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#34)
#36Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#31)
#37Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#18)
#38Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#36)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#38)
#40Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#39)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#40)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#42)
#44Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#42)
#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#43)
#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#46)
#48Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#48)
#50Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#49)
#51Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#30)
#52Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Alvaro Herrera (#51)
#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#52)
#54Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Rahila Syed (#52)
#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#51)
#56Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#55)
#57Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#56)
#58Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#57)
#59Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#58)
#60Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#59)