Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

Started by Peter Geogheganover 1 year ago52 messages
Jump to latest

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]/messages/by-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.

One reason to have this patch is to differentiate between similar
cases involving simple SAOP arrays. The user will have some reasonable
way of determining how a query such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
abalance
from
pgbench_accounts
where
aid in (1, 2, 3, 4, 5);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.007..0.008 rows=5 loops=1) │
│ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[]))

│ Primitive Index Scans: 1

│ Buffers: shared hit=4

└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

...differs from a similar query, such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
abalance
from
pgbench_accounts
where
aid in (1000, 2000, 3000, 4000, 5000);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.006..0.012 rows=5 loops=1) │
│ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[]))

│ Primitive Index Scans: 5

│ Buffers: shared hit=20

└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

Another reason to have this patch is consistency. We're only showing
the user the number of times we've incremented
pg_stat_user_tables.idx_scan in each case. The fact that
pg_stat_user_tables.idx_scan counts primitive index scans like this is
nothing new. That issue was only documented quite recently, as part of
the Postgres 17 work, and it seems quite misleading. It's consistent,
but not necessarily nonsurprising. Making it readily apparent that
there is more than one primitive index scan involved here makes the
issue less surprising.

Skip scan
---------

Here's an example with this EXPLAIN ANALYZE patch applied on top of my
skip scan patch [1]/messages/by-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com, using the tenk1 table left behind when the
standard regression tests are run:

pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1);
CREATE INDEX
pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off,
summary off)
select
stringu1
from
tenk1
where
-- Omitted: the leading column on "four"
stringu1 = 'YGAAAA';
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual
time=0.011..0.017 rows=15 loops=1) │
│ Index Cond: (stringu1 = 'YGAAAA'::name)

│ Heap Fetches: 0

│ Primitive Index Scans: 5

│ Buffers: shared hit=11

└───────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

Notice that there are 5 primitive index scans here. That's what I'd
expect, given that there are exactly 4 distinct "logical subindexes"
implied by our use of a leading column on "four" as the scan's skip
column. Under the hood, an initial primitive index scan locates the
lowest "four" value. There are then 4 additional primitive index scans
to locate the next "four" value (needed when the current "four" value
gets past the value's "stringu1 = 'YGAAAA'" tuples).

Obviously, the cardinality of the leading column predicts the number
of primitive index scans at runtime. But it can be much more
complicated of a relationship than what I've shown here may suggest.
Skewness matters, too. Small clusters of index tuples with unique
leading column values will greatly increase column
cardinality/ndistinct, without a commensurate increase in the cost of
a skip scan (that skips using that column). Those small clusters of
unique values will appear on the same few leaf pages. It follows that
they cannot substantially increase the number of primitive scans
required at runtime -- they'll just be read all together at once.

An important goal of my design for skip scan is that we avoid the need
for special index paths within the optimizer. Whether or not we skip
is always a runtime decision (when a skippable index attribute exists
at all). The optimizer needs to know about skipping for costing
purposes only -- all of the required optimizer changes are in
selfuncs.c. That's why you didn't see some kind of special new index
scan node here -- you just saw the number of primitive index scans.

My motivation for working on this EXPLAIN ANALYZE patch is primarily
skip scan. I don't think that it necessarily matters, though. I think
that this patch can be treated as independent work. It would have been
weird to not bring it up skip scan even once here, though.

Presentation design choices
---------------------------

I've used the term "primitive index scan" for this. That is the
existing user-visible terminology [2]https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW -- see "Note" box -- Peter Geoghegan, though I suppose that that
could be revisited now.

Another quasi-arbitrary design choice: I don't break out primitive
index scans for scan nodes with multiple loops (e.g., the inner side
of a nested loop join). The count of primitive scans accumulates
across index_rescan calls. I did things this way because it felt
slightly more logical to follow what we show for "Buffers" --
primitive index scans are another physical cost. I'm certainly not
opposed to doing that part differently. It doesn't have to be one or
the other (could break it out both ways), if people think that the
added verbosity is worth it.

I think that we shouldn't be counting calls to _bt_first as a
primitive index scan unless they either call _bt_search or
_bt_endpoint to descend the index (in the case of nbtree scans). This
means that cases where we detect a contradictory qual in
_bt_preprocess_keys should count as having zero primitive index scans.
That is technically an independent thing, though it seems far more
logical to just do it that way.

Actually, I think that there might be existing bugs on HEAD, with
parallel index scan -- I think we might be overcounting. We're not
properly accounting for the fact that parallel workers usually don't
perform a primitive index scan when their backend calls into
_bt_first. I wonder if I should address that separately, as a bug
fix...

[1]: /messages/by-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
[2]: https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW -- see "Note" box -- Peter Geoghegan
-- see "Note" box
--
Peter Geoghegan

Attachments:

v1-0001-Show-primitive-scan-count-in-EXPLAIN-ANALYZE.patchapplication/octet-stream; name=v1-0001-Show-primitive-scan-count-in-EXPLAIN-ANALYZE.patchDownload+189-42
#2Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, 15 Aug 2024 at 21:23, Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.

Did you notice the patch over at [0]/messages/by-id/TYWPR01MB10982D24AFA7CDC273445BFF0B1DC2@TYWPR01MB10982.jpnprd01.prod.outlook.com, where additional diagnostic
EXPLAIN output for btrees is being discussed, too? I'm asking, because
I'm not very convinced that 'primitive scans' are a useful metric
across all (or even: most) index AMs (e.g. BRIN probably never will
have a 'primitive scans' metric that differs from the loop count), so
maybe this would better be implemented in that framework?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0]: /messages/by-id/TYWPR01MB10982D24AFA7CDC273445BFF0B1DC2@TYWPR01MB10982.jpnprd01.prod.outlook.com

#3Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Peter Geoghegan (#1)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

Hi! Thank you for your work on this subject!

On 15.08.2024 22:22, Peter Geoghegan wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.

I think that it is enough to pass the IndexScanDesc parameter to the
function - this saves us from having to define the planstate type twice.

For this reason, I suggest some changes that I think may improve your patch.

One reason to have this patch is to differentiate between similar
cases involving simple SAOP arrays. The user will have some reasonable
way of determining how a query such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
abalance
from
pgbench_accounts
where
aid in (1, 2, 3, 4, 5);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.007..0.008 rows=5 loops=1) │
│ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[]))

│ Primitive Index Scans: 1

│ Buffers: shared hit=4

└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

...differs from a similar query, such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
abalance
from
pgbench_accounts
where
aid in (1000, 2000, 3000, 4000, 5000);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.006..0.012 rows=5 loops=1) │
│ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[]))

│ Primitive Index Scans: 5

│ Buffers: shared hit=20

└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

Another reason to have this patch is consistency. We're only showing
the user the number of times we've incremented
pg_stat_user_tables.idx_scan in each case. The fact that
pg_stat_user_tables.idx_scan counts primitive index scans like this is
nothing new. That issue was only documented quite recently, as part of
the Postgres 17 work, and it seems quite misleading. It's consistent,
but not necessarily nonsurprising. Making it readily apparent that
there is more than one primitive index scan involved here makes the
issue less surprising.

Skip scan
---------

Here's an example with this EXPLAIN ANALYZE patch applied on top of my
skip scan patch [1], using the tenk1 table left behind when the
standard regression tests are run:

pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1);
CREATE INDEX
pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off,
summary off)
select
stringu1
from
tenk1
where
-- Omitted: the leading column on "four"
stringu1 = 'YGAAAA';
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual
time=0.011..0.017 rows=15 loops=1) │
│ Index Cond: (stringu1 = 'YGAAAA'::name)

│ Heap Fetches: 0

│ Primitive Index Scans: 5

│ Buffers: shared hit=11

└───────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

Notice that there are 5 primitive index scans here. That's what I'd
expect, given that there are exactly 4 distinct "logical subindexes"
implied by our use of a leading column on "four" as the scan's skip
column. Under the hood, an initial primitive index scan locates the
lowest "four" value. There are then 4 additional primitive index scans
to locate the next "four" value (needed when the current "four" value
gets past the value's "stringu1 = 'YGAAAA'" tuples).

Obviously, the cardinality of the leading column predicts the number
of primitive index scans at runtime. But it can be much more
complicated of a relationship than what I've shown here may suggest.
Skewness matters, too. Small clusters of index tuples with unique
leading column values will greatly increase column
cardinality/ndistinct, without a commensurate increase in the cost of
a skip scan (that skips using that column). Those small clusters of
unique values will appear on the same few leaf pages. It follows that
they cannot substantially increase the number of primitive scans
required at runtime -- they'll just be read all together at once.

An important goal of my design for skip scan is that we avoid the need
for special index paths within the optimizer. Whether or not we skip
is always a runtime decision (when a skippable index attribute exists
at all). The optimizer needs to know about skipping for costing
purposes only -- all of the required optimizer changes are in
selfuncs.c. That's why you didn't see some kind of special new index
scan node here -- you just saw the number of primitive index scans.

My motivation for working on this EXPLAIN ANALYZE patch is primarily
skip scan. I don't think that it necessarily matters, though. I think
that this patch can be treated as independent work. It would have been
weird to not bring it up skip scan even once here, though.

Presentation design choices
---------------------------

I've used the term "primitive index scan" for this. That is the
existing user-visible terminology [2], though I suppose that that
could be revisited now.

Another quasi-arbitrary design choice: I don't break out primitive
index scans for scan nodes with multiple loops (e.g., the inner side
of a nested loop join). The count of primitive scans accumulates
across index_rescan calls. I did things this way because it felt
slightly more logical to follow what we show for "Buffers" --
primitive index scans are another physical cost. I'm certainly not
opposed to doing that part differently. It doesn't have to be one or
the other (could break it out both ways), if people think that the
added verbosity is worth it.

I think that we shouldn't be counting calls to _bt_first as a
primitive index scan unless they either call _bt_search or
_bt_endpoint to descend the index (in the case of nbtree scans). This
means that cases where we detect a contradictory qual in
_bt_preprocess_keys should count as having zero primitive index scans.
That is technically an independent thing, though it seems far more
logical to just do it that way.

Actually, I think that there might be existing bugs on HEAD, with
parallel index scan -- I think we might be overcounting. We're not
properly accounting for the fact that parallel workers usually don't
perform a primitive index scan when their backend calls into
_bt_first. I wonder if I should address that separately, as a bug
fix...

[1]/messages/by-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
[2]https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW
-- see "Note" box
--
Peter Geoghegan

To be honest, I don't quite understand how information in explain
analyze about the number of used primitive indexes
will help me improve my database system as a user. Perhaps I'm missing
something.

Maybe it can tell me which columns are best to create an index on or
something like that?

Could you explain it me, please?

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachments:

diff.no-cfbottext/plain; charset=UTF-8; name=diff.no-cfbotDownload+4-22
In reply to: Matthias van de Meent (#2)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.

Did you notice the patch over at [0], where additional diagnostic
EXPLAIN output for btrees is being discussed, too?

To be clear, for those that haven't been paying attention to that
other thread: that other EXPLAIN patch (the one authored by Masahiro
Ikeda) surfaces information about a distinction that the skip scan
patch renders obsolete. That is, the skip scan patch makes all "Non
Key Filter" quals into quals that can relocate the scan to a later
leaf page by starting a new primitive index scan. Technically, skip
scan removes the concept that that patch calls "Non Key Filter"
altogether.

Note that this isn't the same thing as making that other patch
obsolete. Skip scan renders the whole concept of "Non Key Filter"
obsolete *in name only*. You might prefer to think of it as making
that whole concept squishy. Just because we can theoretically use the
leading column to skip doesn't mean we actually will. It isn't an
either/or thing. We might skip during some parts of a scan, but not
during other parts.

It's just not clear how to handle those sorts of fuzzy distinctions
right now. It does seem worth pursuing, but I see no conflict.

I'm asking, because
I'm not very convinced that 'primitive scans' are a useful metric
across all (or even: most) index AMs (e.g. BRIN probably never will
have a 'primitive scans' metric that differs from the loop count), so
maybe this would better be implemented in that framework?

What do you mean by "within that framework"? They seem orthogonal?

It's true that BRIN index scans will probably never show more than a
single primitive index scan. I don't think that the same is true of
any other index AM, though. Don't they all support SAOPs, albeit
non-natively?

The important question is: what do you want to do about cases like the
BRIN case? Our choices are all fairly obvious choices. We can be
selective, and *not* show this information when a set of heuristics
indicate that it's not relevant. This is fairly straightforward to
implement. Which do you prefer: overall consistency, or less
verbosity?

Personally I think that the consistency argument works in favor of
displaying this information for every kind of index scan. That's a
hopelessly subjective position, though.

--
Peter Geoghegan

In reply to: Alena Rybakina (#3)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, Aug 15, 2024 at 4:58 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:

I think that it is enough to pass the IndexScanDesc parameter to the function - this saves us from having to define the planstate type twice.

For this reason, I suggest some changes that I think may improve your patch.

Perhaps it's a little better that way. I'll consider it.

To be honest, I don't quite understand how information in explain analyze about the number of used primitive indexes
will help me improve my database system as a user. Perhaps I'm missing something.

There is probably no typical case. The patch shows implementation
details, which need to be interpreted in the context of a particular
problem.

Maybe the problem is that some of the heuristics added by one of my
nbtree patches interact relatively badly with some real world query.
It would be presumptuous of me to say that that will never happen.

Maybe it can tell me which columns are best to create an index on or something like that?

That's definitely going to be important in the case of skip scan.
Simply showing the user that the index scan skips at all will make
them aware that there are missing index columns. That could be a sign
that they'd be better off not using skip scan at all, by creating a
new index that suits the particular query (by not having the extra
skipped column).

It's almost always possible to beat skip scan by creating a new index
-- whether or not it's worth the trouble/expense of maintaining a
whole new index is the important question. Is this particular query
the most important query *to the business*, for whatever reason? Or is
having merely adequate performance acceptable?

Your OR-to-SAOP-rewrite patch effectively makes two or more bitmap
index scans into one single continuous index scan. Or...does it? The
true number of (primitive) index scans might be "the same" as it was
before (without your patch), or there might really only be one
(primitive) index scan with your patch. Or it might be anywhere in
between those two extremes. Users will benefit from knowing where on
this continuum a particular index scan falls. It's just useful to know
where time is spent.

Knowing this information might even allow the user to create a new
multicolumn index, with columns in an order better suited to an
affected query. It's not so much the cost of descending the index
multiple times that we need to worry about here, even though that's
what we're talking about counting here. Varying index column order
could make an index scan faster by increasing locality. Locality is
usually very important. Few index scans is a good proxy for greater
locality.

It's easiest to understand what I mean about locality with an example.
An index on (a, b) is good for queries with quals such as "where a =
42 and b in (1,2,3,4,5,6,7,8,9)" if it allows such a query to only
access one or two leaf pages, where all of the "b" values of interest
live side by side. Obviously that won't be true if it's the other way
around -- if the typical qual looks more like "where b = 7 and a in
(1,2,3,4,5,6,7,8,9)". This is the difference between 1 primitive
index scan and 9 primitive index scans -- quite a big difference. Note
that the main cost we need to worry about here *isn't* the cost of
descending the index. It's mostly the cost of reading the leaf pages.

--
Peter Geoghegan

#6Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#4)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, 15 Aug 2024 at 23:10, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.

Did you notice the patch over at [0], where additional diagnostic
EXPLAIN output for btrees is being discussed, too?

To be clear, for those that haven't been paying attention to that
other thread: that other EXPLAIN patch (the one authored by Masahiro
Ikeda) surfaces information about a distinction that the skip scan
patch renders obsolete. That is, the skip scan patch makes all "Non
Key Filter" quals into quals that can relocate the scan to a later
leaf page by starting a new primitive index scan. Technically, skip
scan removes the concept that that patch calls "Non Key Filter"
altogether.

Note that this isn't the same thing as making that other patch
obsolete. Skip scan renders the whole concept of "Non Key Filter"
obsolete *in name only*. You might prefer to think of it as making
that whole concept squishy. Just because we can theoretically use the
leading column to skip doesn't mean we actually will. It isn't an
either/or thing. We might skip during some parts of a scan, but not
during other parts.

Yes.

It's just not clear how to handle those sorts of fuzzy distinctions
right now. It does seem worth pursuing, but I see no conflict.

I'm asking, because
I'm not very convinced that 'primitive scans' are a useful metric
across all (or even: most) index AMs (e.g. BRIN probably never will
have a 'primitive scans' metric that differs from the loop count), so
maybe this would better be implemented in that framework?

What do you mean by "within that framework"? They seem orthogonal?

What I meant was putting this 'primitive scans' info into the
AM-specific explain callback as seen in the latest patch version.

It's true that BRIN index scans will probably never show more than a
single primitive index scan. I don't think that the same is true of
any other index AM, though. Don't they all support SAOPs, albeit
non-natively?

Not always. For Bitmap Index Scan the node's functions can allow
non-native SAOP support (it ORs the bitmaps), but normal indexes
without SAOP support won't get SAOP-functionality from the IS/IOS
node's infrastructure, it'll need to be added as Filter.

The important question is: what do you want to do about cases like the
BRIN case? Our choices are all fairly obvious choices. We can be
selective, and *not* show this information when a set of heuristics
indicate that it's not relevant. This is fairly straightforward to
implement. Which do you prefer: overall consistency, or less
verbosity?

Consistency, I suppose. But adding explain attributes left and right
in Index Scan's explain output when and where every index type needs
them doesn't scale, so I'd put index-specific output into it's own
system (see the linked thread for more rationale). And, in this case,
the use case seems quite index-specific, at least for IS/IOS nodes.

Personally I think that the consistency argument works in favor of
displaying this information for every kind of index scan.

Agreed, assuming "this information" is indeed shared (and useful)
across all AMs.

This made me notice that you add a new metric that should generally be
exactly the same as pg_stat_all_indexes.idx_scan (you mention the
same). Can't you pull that data, instead of inventing a new place
every AMs needs to touch for it's metrics?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In reply to: Matthias van de Meent (#6)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I'm asking, because
I'm not very convinced that 'primitive scans' are a useful metric
across all (or even: most) index AMs (e.g. BRIN probably never will
have a 'primitive scans' metric that differs from the loop count), so
maybe this would better be implemented in that framework?

What do you mean by "within that framework"? They seem orthogonal?

What I meant was putting this 'primitive scans' info into the
AM-specific explain callback as seen in the latest patch version.

I don't see how that could work. This is fundamentally information
that is only known when the query has fully finished execution.

Again, this is already something that we track at the whole-table
level, within pg_stat_user_tables.idx_scan. It's already considered
index AM agnostic information, in that sense.

It's true that BRIN index scans will probably never show more than a
single primitive index scan. I don't think that the same is true of
any other index AM, though. Don't they all support SAOPs, albeit
non-natively?

Not always. For Bitmap Index Scan the node's functions can allow
non-native SAOP support (it ORs the bitmaps), but normal indexes
without SAOP support won't get SAOP-functionality from the IS/IOS
node's infrastructure, it'll need to be added as Filter.

Again, what do you want me to do about it? Almost anything is possible
in principle, and can be implemented without great difficulty. But you
have to clearly say what you want, and why you want it.

Yeah, non-native SAOP index scans are always bitmap scans. In the case
of GIN, there are only lossy/bitmap index scans, anyway -- can't see
that ever changing. In the case of GiST, we could in the future add
native SAOP support, so do we really want to be inconsistent in what
we show now? (Tom said something about that recently, in fact.)

I don't hate the idea of selectively not showing this information (for
BRIN, say). Just as I don't hate the idea of totally omitting
"loops=1" in the common case where we couldn't possibly be more than
one loop in practice. It's just that I don't think that it's worth it,
on balance. Not all redundancy is bad.

The important question is: what do you want to do about cases like the
BRIN case? Our choices are all fairly obvious choices. We can be
selective, and *not* show this information when a set of heuristics
indicate that it's not relevant. This is fairly straightforward to
implement. Which do you prefer: overall consistency, or less
verbosity?

Consistency, I suppose. But adding explain attributes left and right
in Index Scan's explain output when and where every index type needs
them doesn't scale, so I'd put index-specific output into it's own
system (see the linked thread for more rationale).

I can't argue with that. I just don't think it's directly relevant.

And, in this case,
the use case seems quite index-specific, at least for IS/IOS nodes.

I disagree. It's an existing concept, exposed in system views, and now
in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.

The fact that it tends to be much more useful in the case of nbtree
(at least for now) makes this no less true.

This made me notice that you add a new metric that should generally be
exactly the same as pg_stat_all_indexes.idx_scan (you mention the
same).

I didn't imagine that that part was subtle.

Can't you pull that data, instead of inventing a new place
every AMs needs to touch for it's metrics?

No. At least not in a way that's scoped to a particular index scan.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#1)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Thu, Aug 15, 2024 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node.

Attached is v2, which fixes bitrot.

v2 also uses new terminology. EXPLAIN ANALYZE will now show "Index
Searches: N", not "Primitive Index Scans: N". Although there is
limited precedent for using the primitive scan terminology, I think
that it's a bit unwieldy.

No other notable changes.

--
Peter Geoghegan

Attachments:

v2-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patchapplication/octet-stream; name=v2-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patchDownload+196-45
#9Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#8)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 15, 2024 at 3:22 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node.

Attached is v2, which fixes bitrot.

v2 also uses new terminology. EXPLAIN ANALYZE will now show "Index
Searches: N", not "Primitive Index Scans: N". Although there is
limited precedent for using the primitive scan terminology, I think
that it's a bit unwieldy.

I do like "Index Searches" better than "Primitive Index Scans."

But I think Matthias had some good points about this being
btree-specific. I'm not sure whether he was completely correct, but
you seemed to just dismiss his argument and say "well, that can't be
done," which doesn't seem convincing to me at all. If, for non-btree
indexes, the number of index searches will always be the same as the
loop count, then surely there is some way to avoid cluttering the
output for non-btree indexes with output that can never be of any use.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Robert Haas (#9)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 1:45 PM Robert Haas <robertmhaas@gmail.com> wrote:

I do like "Index Searches" better than "Primitive Index Scans."

But I think Matthias had some good points about this being
btree-specific.

It's not B-Tree specific -- not really. Any index scan that can at
least non-natively support ScalarArrayOps (i.e. SAOP scans that the
executor manages using ExecIndexEvalArrayKeys() + bitmap scans) will
show information that is exactly equivalent to what B-Tree will show,
given a similar ScalarArrayOps query.

There is at best one limited sense in which the information shown is
B-Tree specific: it tends to be more interesting in the case of B-Tree
index scans. You cannot trivially derive the number based on the
number of array keys for B-Tree scans, since nbtree is now clever
about not needlessly searching the index anew. It's quite possible
that other index AMs will in the future be enhanced in about the same
way as nbtree was in commit 5bf748b86b, at which point even this will
no longer apply. (Tom speculated about adding something like that to
GiST recently).

I'm not sure whether he was completely correct, but
you seemed to just dismiss his argument and say "well, that can't be
done," which doesn't seem convincing to me at all.

To be clear, any variation that you can think of *can* be done without
much difficulty. I thought that Matthias was unclear about what he
even wanted, is all.

The problem isn't that there aren't any alternatives. The problem, if
any, is that there are a huge number of slightly different
alternatives. There are hopelessly subjective questions about what the
best trade-off between redundancy and consistency is. I'm absolutely
not set on doing things in exactly the way I've laid out.

What do you think should be done? Note that the number of loops
matters here, in addition to the number of SAOP primitive
scans/searches. If you want to suppress the information shown in the
typical "nsearches == 1" case, what does that mean for the less common
"nsearches == 0" case?

If, for non-btree
indexes, the number of index searches will always be the same as the
loop count, then surely there is some way to avoid cluttering the
output for non-btree indexes with output that can never be of any use.

Even if we assume that a given index/index AM will never use SAOPs,
it's still possible to show more than one "Index Search" per executor
node execution. For example, when an index scan node is the inner side
of a nestloop join.

I see value in making it obvious to users when and how
pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
to EXPLAIN ANALYZE output is useful, independent of whether or not
SAOPs happen to be used. That's probably the single best argument in
favor of showing "Index Searches: N" unconditionally. But I'm
certainly not going to refuse to budge over that.

--
Peter Geoghegan

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#10)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

Peter Geoghegan <pg@bowt.ie> writes:

I see value in making it obvious to users when and how
pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
to EXPLAIN ANALYZE output is useful, independent of whether or not
SAOPs happen to be used. That's probably the single best argument in
favor of showing "Index Searches: N" unconditionally. But I'm
certainly not going to refuse to budge over that.

TBH, I'm afraid that this patch basically is exposing numbers that
nobody but Peter Geoghegan and maybe two or three other hackers
will understand, and even fewer people will find useful (since the
how-many-primitive-scans behavior is not something users have any
control over, IIUC). I doubt that "it lines up with
pg_stat_all_indexes.idx_scan" is enough to justify the additional
clutter in EXPLAIN. Maybe we should be going the other direction
and trying to make pg_stat_all_indexes count in a less detailed but
less surprising way, ie once per indexscan plan node invocation.

regards, tom lane

In reply to: Tom Lane (#11)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

TBH, I'm afraid that this patch basically is exposing numbers that
nobody but Peter Geoghegan and maybe two or three other hackers
will understand, and even fewer people will find useful (since the
how-many-primitive-scans behavior is not something users have any
control over, IIUC).

You can make about the same argument against showing "Buffers". It's
not really something that you can address directly, either. It's
helpful only in the context of a specific problem.

I doubt that "it lines up with
pg_stat_all_indexes.idx_scan" is enough to justify the additional
clutter in EXPLAIN.

The scheme laid out in the patch is just a starting point for
discussion. I just think that it's particularly important that we have
this for skip scan -- that's the part that I feel strongly about.

With skip scan in place, every scan of the kind we'd currently call a
"full index scan" will be eligible to skip. Whether and to what extent
we actually skip is determined at runtime. We really need some way of
determining how much skipping has taken place. (There are many
disadvantages to having a dedicated skip scan index path, which I can
go into if you want.)

Maybe we should be going the other direction
and trying to make pg_stat_all_indexes count in a less detailed but
less surprising way, ie once per indexscan plan node invocation.

Is that less surprising, though? I think that it's more surprising.

--
Peter Geoghegan

#13Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#7)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Fri, 16 Aug 2024 at 00:34, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I'm asking, because
I'm not very convinced that 'primitive scans' are a useful metric
across all (or even: most) index AMs (e.g. BRIN probably never will
have a 'primitive scans' metric that differs from the loop count), so
maybe this would better be implemented in that framework?

What do you mean by "within that framework"? They seem orthogonal?

What I meant was putting this 'primitive scans' info into the
AM-specific explain callback as seen in the latest patch version.

I don't see how that could work. This is fundamentally information
that is only known when the query has fully finished execution.

If the counter was put into the BTScanOpaque, rather than the
IndexScanDesc, then this could trivially be used in an explain AM
callback, as IndexScanDesc and ->opaque are both still available while
building the explain output. As a result, it wouldn't bloat the
IndexScanDesc for other index AMs who might not be interested in this
metric.

Again, this is already something that we track at the whole-table
level, within pg_stat_user_tables.idx_scan. It's already considered
index AM agnostic information, in that sense.

That's true, but for most indexes there is a 1:1 relationship between
loops and idx_scan counts, with ony btree behaving differently in that
regard. Not to say it isn't an important insight for btree, but just
that it seems to be only relevant for btree and no other index I can
think of right now.

It's true that BRIN index scans will probably never show more than a
single primitive index scan. I don't think that the same is true of
any other index AM, though. Don't they all support SAOPs, albeit
non-natively?

Not always. For Bitmap Index Scan the node's functions can allow
non-native SAOP support (it ORs the bitmaps), but normal indexes
without SAOP support won't get SAOP-functionality from the IS/IOS
node's infrastructure, it'll need to be added as Filter.

Again, what do you want me to do about it? Almost anything is possible
in principle, and can be implemented without great difficulty. But you
have to clearly say what you want, and why you want it.

I don't want anything, or anything done about it, but your statement
that all index AMs support SAOP (potentially non-natively) is not
true, as the non-native SAOP support is only for bitmap index scans,
and index AMs aren't guaranteed to support bitmap index scans (e.g.
pgvector's IVFFLAT and HNSW are good examples, as they only support
amgettuple).

Yeah, non-native SAOP index scans are always bitmap scans. In the case
of GIN, there are only lossy/bitmap index scans, anyway -- can't see
that ever changing.

GIN had amgettuple-based index scans until the fastinsert path was
added, and with some work (I don't think it needs to be a lot) the
feature can probably be returned to the AM. The GIN internals would
probably only need relatively few changes, as they already seem to
mostly use precise TID-based scans - the only addition would be a
filter that prohibits returning tuples that were previously returned
while scanning the fastinsert path during the normal index scan.

And, in this case,
the use case seems quite index-specific, at least for IS/IOS nodes.

I disagree. It's an existing concept, exposed in system views, and now
in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.

To be precise, it is not precisely that, because it's a different
counter that an AM must update when the pgstat data is updated if it
wants the explain output to reflect the stats counter accurately. When
an AM forgets to update one of these metrics (or fails to realize they
have to both be updated) then they'd be out-of-sync. I'd prefer if an
AM didn't have to account for it's statistics in more than one place.

This made me notice that you add a new metric that should generally be
exactly the same as pg_stat_all_indexes.idx_scan (you mention the
same).

I didn't imagine that that part was subtle.

It wasn't, but it was not present in the first two paragraphs of the
mail, which I had only skimmed when I sent my first reply (as you
maybe could see indicated by the quote). That's why it took me until
my second reply to realise these were considered to be equivalent,
especially after I noticed the headerfile changes where you added a
new metric rather than pulling data from existing stats.

Can't you pull that data, instead of inventing a new place
every AMs needs to touch for it's metrics?

No. At least not in a way that's scoped to a particular index scan.

Similar per-node counter data is pulled for the global (!) counters of
pgBufferUsage, so why would it be less possible to gather such metrics
for just one index's stats here? While I do think it won't be easy to
find a good way to integrate this into EXPLAIN's Instrumentation, I
imagine other systems (e.g. table scans) may benefit from a better
integration and explanation of pgstat statistics in EXPLAIN, too. E.g.
I'd love to be able to explain how many times which function was
called in a plans' projections, and what the relevant time expendature
for those functions is in my plans. This data is available with
track_functions enabled, and diffing in the execution nodes should
allow this to be shown in EXPLAIN output. It'd certainly be more
expensive than not doing the analysis, but I believe that's what
EXPLAIN options are for - you can show a more detailed analysis at the
cost of increased overhead in the plan execution.

Alternatively, you could update the patch so that only the field in
IndexScan would need to be updated by the index AM by making the
executor responsible to update the relation's stats at once at the end
of the query with the data from the IndexScanDesc.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In reply to: Matthias van de Meent (#13)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 5:03 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

If the counter was put into the BTScanOpaque, rather than the
IndexScanDesc, then this could trivially be used in an explain AM
callback, as IndexScanDesc and ->opaque are both still available while
building the explain output.

Right, "trivial". Except in that it requires inventing a whole new
general purpose infrastructure. Meanwhile, Tom is arguing against even
showing this very basic information in EXPLAIN ANALYZE.You see the
problem?

As a result, it wouldn't bloat the
IndexScanDesc for other index AMs who might not be interested in this
metric.

Why do you persist with the idea that it isn't useful for other index
AMs? I mean it literally works in exactly the same way! It's literally
indistinguishable to users, and works in a way that's consistent with
historical behavior/definitions.

I don't want anything, or anything done about it, but your statement
that all index AMs support SAOP (potentially non-natively) is not
true, as the non-native SAOP support is only for bitmap index scans,
and index AMs aren't guaranteed to support bitmap index scans (e.g.
pgvector's IVFFLAT and HNSW are good examples, as they only support
amgettuple).

Yes, there are some very minor exceptions -- index AMs where even
non-native SAOPs won't be used. What difference does it make?

And, in this case,
the use case seems quite index-specific, at least for IS/IOS nodes.

I disagree. It's an existing concept, exposed in system views, and now
in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.

To be precise, it is not precisely that, because it's a different
counter that an AM must update when the pgstat data is updated if it
wants the explain output to reflect the stats counter accurately.

Why does that matter? I could easily move the counter to the opaque
struct, but that would make the patch longer and more complicated, for
absolutely no benefit.

When an AM forgets to update one of these metrics (or fails to realize they
have to both be updated) then they'd be out-of-sync. I'd prefer if an
AM didn't have to account for it's statistics in more than one place.

I could easily change the pgstat_count_index_scan macro so that index
AMs were forced to do both, or neither. (Not that this is a real
problem.)

Can't you pull that data, instead of inventing a new place
every AMs needs to touch for it's metrics?

No. At least not in a way that's scoped to a particular index scan.

Similar per-node counter data is pulled for the global (!) counters of
pgBufferUsage, so why would it be less possible to gather such metrics
for just one index's stats here?

I told you why already, when we talked about this privately: there is
no guarantee that it's the index indicated by the scan
instrumentation. For example, due to syscache lookups. There's also
the question of how we maintain the count for things like nestloop
joins, where invocations of different index scan nodes may be freely
woven together. So it just won't work.

Besides, I thought that you wanted me to use some new field in
BTScanOpaque? But now you want me to use a global counter. Which is
it?

While I do think it won't be easy to
find a good way to integrate this into EXPLAIN's Instrumentation, I
imagine other systems (e.g. table scans) may benefit from a better
integration and explanation of pgstat statistics in EXPLAIN, too. E.g.
I'd love to be able to explain how many times which function was
called in a plans' projections, and what the relevant time expendature
for those functions is in my plans.

Seems completely unrelated.

Alternatively, you could update the patch so that only the field in
IndexScan would need to be updated by the index AM by making the
executor responsible to update the relation's stats at once at the end
of the query with the data from the IndexScanDesc.

I don't understand why this is an alternative to the other thing that
you said. Or even why it's desirable.

--
Peter Geoghegan

#15Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#14)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Aug 27, 2024 at 5:03 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

If the counter was put into the BTScanOpaque, rather than the
IndexScanDesc, then this could trivially be used in an explain AM
callback, as IndexScanDesc and ->opaque are both still available while
building the explain output.

Right, "trivial". Except in that it requires inventing a whole new
general purpose infrastructure.

Which seems to be in the process of being invented already elsewhere.

Meanwhile, Tom is arguing against even
showing this very basic information in EXPLAIN ANALYZE.You see the
problem?

I think Tom's main issue is additional clutter when running just plain
`explain analyze`, and he'd probably be fine with it if this was gated
behind e.g. VERBOSE or a new "get me the AM's view of this node"
-flag.

As a result, it wouldn't bloat the
IndexScanDesc for other index AMs who might not be interested in this
metric.

Why do you persist with the idea that it isn't useful for other index
AMs?

Because
- there are no other index AMs that would show a count that's
different from loops (Yes, I'm explicitly ignoring bitmapscan's synthetic SAOP)
- because there is already a place where this info is stored.

I mean it literally works in exactly the same way! It's literally
indistinguishable to users, and works in a way that's consistent with
historical behavior/definitions.

Historically, no statistics/explain-only info is stored in the
IndexScanDesc, all data inside that struct is relevant even when
EXPLAIN was removed from the codebase. The same is true for
TableScanDesc
Now, you want to add this metadata to the struct. I'm quite hesitant
to start walking on such a surface, as it might just be a slippery
slope.

I don't want anything, or anything done about it, but your statement
that all index AMs support SAOP (potentially non-natively) is not
true, as the non-native SAOP support is only for bitmap index scans,
and index AMs aren't guaranteed to support bitmap index scans (e.g.
pgvector's IVFFLAT and HNSW are good examples, as they only support
amgettuple).

Yes, there are some very minor exceptions -- index AMs where even
non-native SAOPs won't be used. What difference does it make?

That not all index types (even: most index types) have no interesting
performance numbers indicated by the count.

And, in this case,
the use case seems quite index-specific, at least for IS/IOS nodes.

I disagree. It's an existing concept, exposed in system views, and now
in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.

To be precise, it is not precisely that, because it's a different
counter that an AM must update when the pgstat data is updated if it
wants the explain output to reflect the stats counter accurately.

Why does that matter?

Because to me it seels like one more thing an existing index AM's
author needs to needlessly add to its index.

When an AM forgets to update one of these metrics (or fails to realize they
have to both be updated) then they'd be out-of-sync. I'd prefer if an
AM didn't have to account for it's statistics in more than one place.

I could easily change the pgstat_count_index_scan macro so that index
AMs were forced to do both, or neither. (Not that this is a real
problem.)

That'd be one way to reduce the chances of accidental bugs, which
seems like a good start.

Can't you pull that data, instead of inventing a new place
every AMs needs to touch for it's metrics?

No. At least not in a way that's scoped to a particular index scan.

Similar per-node counter data is pulled for the global (!) counters of
pgBufferUsage, so why would it be less possible to gather such metrics
for just one index's stats here?

I told you why already, when we talked about this privately: there is
no guarantee that it's the index indicated by the scan
instrumentation.

For the pgstat entry in rel->pgstat_info, it is _exactly_ guaranteed
to be the index of the IndexScan node. pgBufferUsage happens to be
global, but pgstat_info is gathered at the relation level.

For example, due to syscache lookups.

Sure, if we're executing a query on catalogs looking at index's
numscans might count multiple index scans if the index scan needs to
access that same catalog table's data through that same catalog index,
but in those cases I think it's an acceptable count difference.

There's also
the question of how we maintain the count for things like nestloop
joins, where invocations of different index scan nodes may be freely
woven together. So it just won't work.

Gathering usage counters on interleaving execution nodes has been done
for pgBufferUsage, so I don't see how it just won't work. To me, it
seems very realistically possible.

Besides, I thought that you wanted me to use some new field in
BTScanOpaque? But now you want me to use a global counter. Which is
it?

If you think it's important to have this info on all indexes then I'd
prefer the pgstat approach over adding a field in IndexScanDescData.
If instead you think that this is primarily important to expose for
nbtree index scans, then I'd prefer putting it in the BTSO using e.g.
the index AM analyze hook approach, as I think that's much more
elegant than this.

While I do think it won't be easy to
find a good way to integrate this into EXPLAIN's Instrumentation, I
imagine other systems (e.g. table scans) may benefit from a better
integration and explanation of pgstat statistics in EXPLAIN, too. E.g.
I'd love to be able to explain how many times which function was
called in a plans' projections, and what the relevant time expendature
for those functions is in my plans.

Seems completely unrelated.

I'd call "exposing function's pgstat data in explain" at least
somewhat related to "exposing indexes' pgstat data in explain".

Alternatively, you could update the patch so that only the field in
IndexScan would need to be updated by the index AM by making the
executor responsible to update the relation's stats at once at the end
of the query with the data from the IndexScanDesc.

I don't understand why this is an alternative to the other thing that
you said. Or even why it's desirable.

I think it would be preferred over requiring Index AMs to maintain 2
fields in 2 very different locations but in the same way with the same
update pattern. With the mentioned change, they'd only have to keep
the ISD's numscans updated with rescans (or, _bt_first/_bt_search's).
Your alternative approach of making pgstat_count_index_scan update
both would probably have the same desired effect of requiring the AM
author to only mind this one entry point for counting index scan
stats.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In reply to: Matthias van de Meent (#15)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote:

Right, "trivial". Except in that it requires inventing a whole new
general purpose infrastructure.

Which seems to be in the process of being invented already elsewhere.

None of this stuff about implementation details really matters if
there isn't agreement on what actual user-visible behavior we want.
We're very far from that right now.

Meanwhile, Tom is arguing against even
showing this very basic information in EXPLAIN ANALYZE.You see the
problem?

I think Tom's main issue is additional clutter when running just plain
`explain analyze`, and he'd probably be fine with it if this was gated
behind e.g. VERBOSE or a new "get me the AM's view of this node"
-flag.

I'm not at all confident that you're right about that.

I mean it literally works in exactly the same way! It's literally
indistinguishable to users, and works in a way that's consistent with
historical behavior/definitions.

Historically, no statistics/explain-only info is stored in the
IndexScanDesc, all data inside that struct is relevant even when
EXPLAIN was removed from the codebase. The same is true for
TableScanDesc

Please try to separate questions about user-visible behavior from
questions about the implementation. Here you're answering a point I'm
making about user visible behavior with a point about where the
counter goes. It's just not relevant. At all.

Now, you want to add this metadata to the struct. I'm quite hesitant
to start walking on such a surface, as it might just be a slippery
slope.

I don't know why you seem to assume that it's inevitable that we'll
get a huge amount of similar EXPLAIN ANALYZE instrumentation, of which
this is just the start. It isn't. It's far from clear that even
something like my patch will get in.

Seems completely unrelated.

I'd call "exposing function's pgstat data in explain" at least
somewhat related to "exposing indexes' pgstat data in explain".

Not in any practical sense.

--
Peter Geoghegan

#17Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#16)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Wed, 28 Aug 2024 at 01:42, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Tue, 27 Aug 2024 at 23:40, Peter Geoghegan <pg@bowt.ie> wrote:

Right, "trivial". Except in that it requires inventing a whole new
general purpose infrastructure.

Which seems to be in the process of being invented already elsewhere.

None of this stuff about implementation details really matters if
there isn't agreement on what actual user-visible behavior we want.
We're very far from that right now.

I'd expect the value to only be displayed for more verbose outputs
(such as under VERBOSE, or another option, or an as of yet
unimplemented unnamed "get me AM-specific info" option), and only if
it differed from nloops or if the index scan is otherwise interesting
and would benefit from showing this data, which would require AM
involvement to check if the scan is "interesting".
E.g. I think it's "interesting" to see only 1 index search /loop for
an index SAOP (with array >>1 attribute, or parameterized), but not at
all interesting to see 1 index search /loop for a scan with a single
equality scankey on the only key attribute: if it were anything else
that'd be an indication of serious issues (and we'd show it, because
it wouldn't be 1 search per loop).

and works in a way that's consistent with
historical behavior/definitions.

Historically, no statistics/explain-only info is stored in the
IndexScanDesc, all data inside that struct is relevant even when
EXPLAIN was removed from the codebase. The same is true for
TableScanDesc

Please try to separate questions about user-visible behavior from
questions about the implementation. Here you're answering a point I'm
making about user visible behavior with a point about where the
counter goes. It's just not relevant. At all.

I thought you were talking about type definitions with your
'definitions', but apparently not. What were you referring to with
"consistent with historical behavior/definitions"?

Now, you want to add this metadata to the struct. I'm quite hesitant
to start walking on such a surface, as it might just be a slippery
slope.

I don't know why you seem to assume that it's inevitable that we'll
get a huge amount of similar EXPLAIN ANALYZE instrumentation, of which
this is just the start. It isn't. It's far from clear that even
something like my patch will get in.

It doesn't have to be a huge amount, but I'd be extremely careful
setting a precedent where scandescs will have space reserved for data
that can be derived from other fields, and is also used by
approximately 0% of queries in any production workload (except when
autoanalyze is enabled, in which case there are other systems that
could probably gather this data).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

I see value in making it obvious to users when and how
pg_stat_all_indexes.idx_scan advances. Being able to easily relate it
to EXPLAIN ANALYZE output is useful, independent of whether or not
SAOPs happen to be used. That's probably the single best argument in
favor of showing "Index Searches: N" unconditionally. But I'm
certainly not going to refuse to budge over that.

TBH, I'm afraid that this patch basically is exposing numbers that
nobody but Peter Geoghegan and maybe two or three other hackers
will understand, and even fewer people will find useful (since the
how-many-primitive-scans behavior is not something users have any
control over, IIUC). I doubt that "it lines up with
pg_stat_all_indexes.idx_scan" is enough to justify the additional
clutter in EXPLAIN. Maybe we should be going the other direction
and trying to make pg_stat_all_indexes count in a less detailed but
less surprising way, ie once per indexscan plan node invocation.

I kind of had that reaction too initially, but I think that was mostly
because "Primitive Index Scans" seemed extremely unclear. I think
"Index Searches" is pretty comprehensible, honestly. Why shouldn't
someone be able to figure out what that means?

Might make sense to restrict this to VERBOSE mode, too.

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Robert Haas
robertmhaas@gmail.com
In reply to: Matthias van de Meent (#15)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Tue, Aug 27, 2024 at 7:22 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

Besides, I thought that you wanted me to use some new field in
BTScanOpaque? But now you want me to use a global counter. Which is
it?

If you think it's important to have this info on all indexes then I'd
prefer the pgstat approach over adding a field in IndexScanDescData.
If instead you think that this is primarily important to expose for
nbtree index scans, then I'd prefer putting it in the BTSO using e.g.
the index AM analyze hook approach, as I think that's much more
elegant than this.

I agree with this analysis. I don't see why IndexScanDesc would ever
be the right place for this.

--
Robert Haas
EDB: http://www.enterprisedb.com

In reply to: Robert Haas (#19)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

On Wed, Aug 28, 2024 at 9:35 AM Robert Haas <robertmhaas@gmail.com> wrote:

If you think it's important to have this info on all indexes then I'd
prefer the pgstat approach over adding a field in IndexScanDescData.
If instead you think that this is primarily important to expose for
nbtree index scans, then I'd prefer putting it in the BTSO using e.g.
the index AM analyze hook approach, as I think that's much more
elegant than this.

I agree with this analysis. I don't see why IndexScanDesc would ever
be the right place for this.

Then what do you think is the right place?

There's no simple way to get to the planstate instrumentation from
within an index scan. You could do it by passing it down as an
argument to either ambeginscan or amrescan. But, realistically, it'd
probably be better to just add a pointer to the instrumentation to the
IndexScanDesc passed to amrescan. That's very close to what I've done
already.

--
Peter Geoghegan

#21Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#20)
In reply to: Robert Haas (#21)
In reply to: Robert Haas (#18)
In reply to: Robert Haas (#18)
#25Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Peter Geoghegan (#22)
#26Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Peter Geoghegan (#8)
In reply to: Alena Rybakina (#26)
#28Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Peter Geoghegan (#27)
In reply to: Alena Rybakina (#28)
#30Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Peter Geoghegan (#29)
#31Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Alena Rybakina (#30)
#32Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Matthias van de Meent (#31)
#33Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Alena Rybakina (#32)
In reply to: Peter Geoghegan (#27)
In reply to: Matthias van de Meent (#31)
#36Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Peter Geoghegan (#35)
In reply to: Peter Geoghegan (#22)
In reply to: Peter Geoghegan (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#37)
In reply to: Robert Haas (#39)
In reply to: Peter Geoghegan (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#40)
In reply to: Robert Haas (#42)
In reply to: Peter Geoghegan (#43)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#44)
In reply to: Robert Haas (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#46)
In reply to: Robert Haas (#47)
In reply to: Robert Haas (#47)
In reply to: Peter Geoghegan (#49)
#51Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#50)
In reply to: Andres Freund (#51)