Progress report of CREATE INDEX for nested partitioned tables

Started by Ilya Gladyshevover 3 years ago47 messageshackers
Jump to latest
#1Ilya Gladyshev
ilya.v.gladyshev@gmail.com

Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions. So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
partitions_total | partitions_done
------------------+-----------------
1 | 2
(1 row)

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

Thanks,
Ilya Gladyshev

Attachments:

0001-report-top-parent-progress-for-CREATE-INDEX.patchtext/x-patch; charset=UTF-8; name=0001-report-top-parent-progress-for-CREATE-INDEX.patchDownload+30-7
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#1)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Dec 10, 2022 at 12:18:32PM +0400, Ilya Gladyshev wrote:

Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions.�So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
partitions_total | partitions_done
------------------+-----------------
1 | 2
(1 row)

Yeah. I didn't verify, but it looks like this is a bug going to back to
v12. As you said, when called recursively, DefineIndex() clobbers the
number of completed partitions.

Maybe DefineIndex() could flatten the list of partitions. But I don't
think that can work easily with iteration rather than recursion.

Could you check what I've written as a counter-proposal ?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions. Should it alo exclude any children with matching indexes,
which will also be catalog-only changes? Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7. I'm
not sure the assertions are okay. I imagine they may break other
extensions, as with file_fdw.

--
Justin

Attachments:

0001-fix-progress-reporting-of-nested-partitioned-indexes.patchtext/x-diff; charset=us-asciiDownload+118-6
#3Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#2)
Re: Progress report of CREATE INDEX for nested partitioned tables

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity later without any surprising changes for the end user. We could use this patch for previous versions and make more granular output in the latest. What do you think?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions. Should it alo exclude any children with matching indexes,
which will also be catalog-only changes? Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

I agree with you on catalog-only partitioned indexes, but I think that in the perfect world we should exclude all the relations where the index isn’t actually created, so that means excluding attached indexes as well. However, IMO doing it this way will require too much of a code rewrite for quite a minor feature (but we could do it, ofc). I actually think that the progress view would be better off without the total number of partitions, but I’m not sure we have this option now. With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the docs are off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so we should fix one or the other.

Show quoted text

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7. I'm
not sure the assertions are okay. I imagine they may break other
extensions, as with file_fdw.

--
Justin
<0001-fix-progress-reporting-of-nested-partitioned-indexes.patch>

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#3)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity later without any surprising changes for the end user. We could use this patch for previous versions and make more granular output in the latest. What do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one... Shoot.

I actually think that the progress view would be better off without the total number of partitions,

Just curious - why ?

With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the docs are off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so we should fix one or the other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

I have another proposal: since the original patch 3.5 years ago didn't
consider or account for sub-partitions, let's not start counting them
now. It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

--
Justin

Attachments:

0001-report-top-parent-progress-for-CREATE-INDEX.patchtext/x-diff; charset=us-asciiDownload+109-7
0001-report-top-parent-progress-for-CREATE-INDEX.txttext/x-diff; charset=us-asciiDownload+94-5
#5Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#4)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Mon, 2022-12-12 at 22:43 -0600, Justin Pryzby wrote:

On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives
us the opportunity to improve the granularity later without any
surprising changes for the end user. We could use this patch for
previous versions and make more granular output in the latest. What
do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one...  Shoot.

I actually think that the progress view would be better off without
the total number of partitions,

Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

With this in mind, I think your proposal to exclude catalog-only
indexes sounds reasonable to me, but I feel like the docs are off
in this case, because the attached indexes are not created, but we
pretend like they are in this metric, so we should fix one or the
other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

Agree. I think that docs should also be explicit about the attached
indexes, if we decide to count them in as "created".

I have another proposal: since the original patch 3.5 years ago
didn't
consider or account for sub-partitions, let's not start counting them
now.  It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

Yeah, or we could add another argument to DefineIndex. I don't know if
it's ok, or which option is better here in terms of compatibility and
interface-wise, so I have tried both of them, see the attached patches.

Attachments:

0001-parts_done-via-DefineIndex-args.patch.txttext/plain; charset=UTF-8; name=0001-parts_done-via-DefineIndex-args.patch.txtDownload+53-13
0001-parts_done-via-IndexStmt.patchtext/x-patch; charset=UTF-8; name=0001-parts_done-via-IndexStmt.patchDownload+48-6
#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#5)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Dec 13, 2022 at 10:18:58PM +0400, Ilya Gladyshev wrote:

I actually think that the progress view would be better off without
the total number of partitions,

Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

Thanks. One other reason is that the partitions (and sub-partitions)
may not be equally sized. Also, I've said before that it's weird to
report macroscopic progress about the number of partitions finihed in
the same place as reporting microscopic details like the number of
blocks done of the relation currently being processed.

I have another proposal: since the original patch 3.5 years ago
didn't
consider or account for sub-partitions, let's not start counting them
now.� It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent. This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
children. This is minimal, but discourages any future plan to track
progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children. The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch). And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

--
Justin

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#6)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent. This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
children. This is minimal, but discourages any future plan to track
progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children. The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch). And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

#8Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#7)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value
being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the
progress
reporting in the parent.  This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter
will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
  children.  This is minimal, but discourages any future plan to
track
  progress involving intermediate partitions with finer
granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows
reporting
  fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children.  The first
patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch).  And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

Attachments:

v2-0001-parts_done-via-IndexStmt.patchtext/x-patch; charset=UTF-8; name=v2-0001-parts_done-via-IndexStmt.patchDownload+59-8
#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ilya Gladyshev (#8)
Re: Progress report of CREATE INDEX for nested partitioned tables

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

The primary risk is old extensions (built on older minor version)
running on new server, getting confused by new fields (and implied
shifts in the structs). But fields at the end should be safe - the
extension simply ignores the stuff at the end. The one problem would be
arrays of structs, because even a field at the end changes the array
stride. But I don't think we do that with IndexStmt ...

Of course, if the "old" extension itself allocates the struct and passes
it to core code, that might still be an issue, because it'll allocate a
smaller struct, and core might see bogus data at the end.

On the other hand, new extensions on old server may get confused too,
because it may try setting a field that does not exist.

So ultimately it's about weighing risks vs. benefits - evaluating
whether fixing the issue is actually worth it.

The question is if/how many such extensions messing with IndexStmt in
this way actually exist. That is, allocate IndexStmt (or array of it). I
haven't found any, but maybe some extensions for index or partition
management do it? Not sure.

But ...

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

I did take a look at the patch, so here are my 2c:

1) num_leaf_partitions says it's "excluding foreign tables" but then it
uses RELKIND_HAS_STORAGE() which excludes various others relkinds, e.g.
partitioned tables etc. Minor, but perhaps a bit confusing.

2) I'd probably say count_leaf_partitions() instead.

3) The new part in DefineIndex counting leaf partitions should have a
comment before

if (!OidIsValid(parentIndexId))
{ ... }

4) It's a bit confusing that one of the branches in DefineIndex just
sets stmt->parts_done without calling pgstat_progress_update_param
(while the other one does both). AFAICS the call is not needed because
we already updated it during the recursive DefineIndex call, but maybe
the comment should mention that?

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

* I don't care very much about leaf vs. non-leaf partitions. If we
exclude non-leaf ones, fine with me. But the number of non-leaf ones
should be much smaller than leaf ones, and if the partition already has
a matching index that distorts the tracking too. Furthermore the
partitions may have different size etc. so the progress is only
approximate anyway.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#9)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it. The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

That's a separate question from whether indirect partitions are counted
or not.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

Thanks for looking.

--
Justin

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Justin Pryzby (#10)
Re: Progress report of CREATE INDEX for nested partitioned tables

On 1/17/23 21:59, Justin Pryzby wrote:

On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

Ah, you mean whether it's the right place for the parameter?

I don't think it is, really. IndexStmt is meant to be a description of
the CREATE INDEX statement, not something that includes info about how
it's processed. But it's the only struct we pass to the DefineIndex for
child indexes, so the only alternatives I can think of is a global
variable and the (existing) param array field.

Nevertheless, ABI compatibility is still relevant for backbranches.

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

OK

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it. The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

You're right the issue us about overwriting the total - not sure what I
was thinking about when writing this. I guess I got distracted by the
discussion about "preliminary planning" etc. Sorry for the confusion.

That's a separate question from whether indirect partitions are counted
or not.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

+1, it was just an idea for future.

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#10)
Re: Progress report of CREATE INDEX for nested partitioned tables

TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children. That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables). Or "total_bytes" or other global progress.

--
Justin

#13Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Justin Pryzby (#12)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 18 Jan 2023 at 15:25, Justin Pryzby <pryzby@telsasoft.com> wrote:

TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children. That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables). Or "total_bytes" or other global progress.

Hmm. My expectation as a user is that partitions_total includes both
direct and indirect (leaf) child partitions, that it is set just once
at the start of the process, and that partitions_done increases from
zero to partitions_total as the index-build proceeds. I think that
should be achievable with a minimally invasive patch that doesn't
change any data structures.

I agree with all the review comments Tomas posted. In particular, this
shouldn't need any changes to IndexStmt. I think the best approach
would be to just add a new function to backend_progress.c that offsets
a specified progress parameter by a specified amount, so that you can
just increment partitions_done by one or more, at the appropriate
points.

Regards,
Dean

#14Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Tomas Vondra (#9)
Re: Progress report of CREATE INDEX for nested partitioned tables

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Attachments:

0001-create-index-progress-increment.patchapplication/octet-stream; name=0001-create-index-progress-increment.patch; x-unix-mode=0644Download+76-8
#15Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#14)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends. It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively. The caller updates the progress for "attached"
indexes, but not created ones. That allows providing fine-granularity
progress updates when using intermediate partitions, right ? (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary. I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once. If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls. I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

--
Justin

#16Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#15)
Re: Progress report of CREATE INDEX for nested partitioned tables

1 февр. 2023 г., в 08:29, Justin Pryzby <pryzby@telsasoft.com> написал(а):

On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends. It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively. The caller updates the progress for "attached"
indexes, but not created ones. That allows providing fine-granularity
progress updates when using intermediate partitions, right ? (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary. I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once. If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

Yes, you are correct about the intended behavior, I added your comments to the patch.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls. I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

--
Justin

Thank you for the review, I fixed the aforementioned issues in the v2.

Attachments:

v2-0001-create-index-progress-increment.patchapplication/octet-stream; name=v2-0001-create-index-progress-increment.patch; x-unix-mode=0644Download+77-8
#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ilya Gladyshev (#16)
Re: Progress report of CREATE INDEX for nested partitioned tables

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

I think there's probably not much point optimizing it further than that.
If there was, then we could think about creating a data representation
that we can build for the entire partitioning hierarchy in a single pass
with the count of leaf partitions that sit below each specific non-leaf;
but I think that's just over-engineering.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

#18Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Alvaro Herrera (#17)
Re: Progress report of CREATE INDEX for nested partitioned tables

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

Attachments:

v3-0001-create-index-progress-increment.patchapplication/octet-stream; name=v3-0001-create-index-progress-increment.patch; x-unix-mode=0644Download+87-8
#19Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Ilya Gladyshev (#18)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

Kind regards,

Matthias van de Meent

#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Matthias van de Meent (#19)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

But the TOTAL is constant, right ? Updating the total when being called
recursively is the problem these patches fix.

--
Justin

#21Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Justin Pryzby (#20)
#22Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Matthias van de Meent (#21)
#23Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Ilya Gladyshev (#22)
#24Justin Pryzby
pryzby@telsasoft.com
In reply to: Matthias van de Meent (#23)
#25Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#24)
#26Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#26)
#28Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#28)
#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#31)
#33Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#33)
#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#35)
#37Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#35)
#38Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#37)
#39Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#38)
#40Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#39)
#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#40)
#42Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#41)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#42)
#44Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#43)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#45)
#47Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#46)