Improve output of BitmapAnd EXPLAIN ANALYZE

Started by Jim Nasbyover 9 years ago12 messageshackers
Jump to latest
#1Jim Nasby
Jim.Nasby@BlueTreble.com

A customer just pinged me wondering how it was that a BitmapAnd node was
reporting 0 tuples when the Bitmap Heap Scan above it showed it had in
fact generated tuples.

While this is mentioned in the docs, I think it would be very helpful to
have ANALYZE spit out "N/A" instead of 0 for these nodes. AFAICT that
would just require adding a special case to the "if (es->costs)" block
at line ~1204 in explain.c?

BTW, it looks like it would actually be possible to return a real
row-count if none of the TIDBitmap pages are chunks, but I'm not sure if
it's worth the extra effort.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#1)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

A customer just pinged me wondering how it was that a BitmapAnd node was
reporting 0 tuples when the Bitmap Heap Scan above it showed it had in
fact generated tuples.

While this is mentioned in the docs, I think it would be very helpful to
have ANALYZE spit out "N/A" instead of 0 for these nodes.

That would break code that tries to parse that stuff, eg depesz.com.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#2)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

A customer just pinged me wondering how it was that a BitmapAnd node was
reporting 0 tuples when the Bitmap Heap Scan above it showed it had in
fact generated tuples.

While this is mentioned in the docs, I think it would be very helpful to
have ANALYZE spit out "N/A" instead of 0 for these nodes.

That would break code that tries to parse that stuff, eg depesz.com.

I don't believe Jim was suggesting that we back-patch such a change.

Changing it in a new major release seems entirely reasonable.

Thanks!

Stephen

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#3)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

That would break code that tries to parse that stuff, eg depesz.com.

I don't believe Jim was suggesting that we back-patch such a change.

I don't either.

Changing it in a new major release seems entirely reasonable.

It's still a crock though. I wonder whether it wouldn't be better to
change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
it expends extra effort to try to produce a rowcount number.

We could certainly run through the result bitmap and count the number
of exact-TID bits. I don't see a practical way of doing something
with lossy page bits, but maybe those occur infrequently enough
that we could ignore them? Or we could arbitrarily decide that
a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
less arbitrarily, count it as the relation's average number
of tuples per page.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#4)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

Changing it in a new major release seems entirely reasonable.

It's still a crock though. I wonder whether it wouldn't be better to
change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
it expends extra effort to try to produce a rowcount number.

I'm certainly all for doing something better, just didn't think that we
should be worried about making a change to the EXPLAIN ANALYZE output in
a major release because Depesz might have to update the explain site.

We could certainly run through the result bitmap and count the number
of exact-TID bits. I don't see a practical way of doing something
with lossy page bits, but maybe those occur infrequently enough
that we could ignore them? Or we could arbitrarily decide that
a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
less arbitrarily, count it as the relation's average number
of tuples per page.

Counting each page as the relation's average number of tuples per page
seems entirely reasonable to me, for what that is trying to report.

That said, I'm a big fan of how we have more detail for things like a
HashJoin (buckets, batches, memory usage) and it might be nice to have
more information like that for a BitmapAnd (and friends). In
particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
Knowing that the bitmap has gotten to the point of being lossy might
indicate that a user could up work_mem, for example, and possibly avoid
recheck costs.

Thanks!

Stephen

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#5)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

On 10/21/16 8:21 AM, Stephen Frost wrote:

Counting each page as the relation's average number of tuples per page
seems entirely reasonable to me, for what that is trying to report.

My concern is that still leaves a lot of room for confusion when
interpreting EXPLAIN ANALYZE. Every other node will tell you exactly
what happened and it's pretty easy to reason about whether rows should
have gone up or down based on the type of node. You can't do that for
Bitmap(And|Or) unless you know the details of how TIDBitmaps work.
Reporting N/A makes it crystal clear that these nodes operate very
differently than all the others.

(On a related note, it would also be nice if we reported fractional rows
when the row count low and loops is high.)

That said, I'm a big fan of how we have more detail for things like a
HashJoin (buckets, batches, memory usage) and it might be nice to have
more information like that for a BitmapAnd (and friends). In
particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
Knowing that the bitmap has gotten to the point of being lossy might
indicate that a user could up work_mem, for example, and possibly avoid
recheck costs.

I think that's the best way to handle this: report N/A in the header and
then provide details on exact vs lossy. That provides a clear indication
to users that these kinds of nodes are special, as well as a reminder as
to why they're special. Certainly the node could report an exact
rowcount in the header if there were no lossy pages too.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Stephen Frost
sfrost@snowman.net
In reply to: Jim Nasby (#6)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:

On 10/21/16 8:21 AM, Stephen Frost wrote:

Counting each page as the relation's average number of tuples per page
seems entirely reasonable to me, for what that is trying to report.

My concern is that still leaves a lot of room for confusion when
interpreting EXPLAIN ANALYZE. Every other node will tell you exactly
what happened and it's pretty easy to reason about whether rows
should have gone up or down based on the type of node. You can't do
that for Bitmap(And|Or) unless you know the details of how
TIDBitmaps work. Reporting N/A makes it crystal clear that these
nodes operate very differently than all the others.

I don't see why you think the numbers reported by BitmapAnd based on
this approach wouldn't go up and down in a similar manner to what you
would expect to get, based on that node type. Reporting N/A is entirely
punting on it when we have perfectly useful information that can be
reported.

(On a related note, it would also be nice if we reported fractional
rows when the row count low and loops is high.)

I can certainly understand that, though I think I'd rather have an
actual 'total' value or similar instead, but that's really a different
discussion.

That said, I'm a big fan of how we have more detail for things like a
HashJoin (buckets, batches, memory usage) and it might be nice to have
more information like that for a BitmapAnd (and friends). In
particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
Knowing that the bitmap has gotten to the point of being lossy might
indicate that a user could up work_mem, for example, and possibly avoid
recheck costs.

I think that's the best way to handle this: report N/A in the header
and then provide details on exact vs lossy. That provides a clear
indication to users that these kinds of nodes are special, as well
as a reminder as to why they're special. Certainly the node could
report an exact rowcount in the header if there were no lossy pages
too.

I don't see why we would want to stick 'N/A' in for the header, even if
we are reporting the details, when we can provide a pretty reasonable
number. In particular, I certainly don't think we would want to report
N/A sometimes (lossy case) and then an actual number other times (all
exact case). That strikes me as much more likely to be confusing.

Thanks!

Stephen

#8Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#7)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

On 10/21/16 12:30 PM, Stephen Frost wrote:

I don't see why we would want to stick 'N/A' in for the header, even if
we are reporting the details, when we can provide a pretty reasonable
number.

Because then it's absolutely clear that we don't have a valid rowcount,
only a guess (and a guess that's potentially off by a lot).

No one is used to seeing "N/A" in explain, so when they do see it
they'll immediately realize they don't know what's going on and hit
google or the docs up. Otherwise they'll just think it's an accurate
rowcount like for any other node...

In particular, I certainly don't think we would want to report
N/A sometimes (lossy case) and then an actual number other times (all
exact case). That strikes me as much more likely to be confusing.

Fair enough. I'd certainly rather have a constant N/A then a guess at
the rowcount.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Emre Hasegeli
emre@hasegeli.com
In reply to: Jim Nasby (#8)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

The BRIN Bitmap Index Scan has the same problem. I have seen people
confused by this. I think N/A would clearly improve the situation.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Emre Hasegeli (#9)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

On Mon, Oct 31, 2016 at 6:56 AM, Emre Hasegeli <emre@hasegeli.com> wrote:

The BRIN Bitmap Index Scan has the same problem. I have seen people
confused by this. I think N/A would clearly improve the situation.

I agree. Or perhaps better still, leave rows=%.0f out altogether when
we don't have a meaningful value to report. If it were OK to use some
unimportant-looking value as a proxy for "undefined", the SQL standard
wouldn't include nulls.

I don't like Tom's proposal of trying to fake up a value here when
EXPLAIN ANALYZE is in use. Reporting "exact" and "lossy" values for
BitmapAnd would be a fine enhancement, but artificially trying to
flatten that back into a row count is going to be confusing, not
helpful. (Just last week I saw a case where the fact that many pages
were being lossified caused a performance problem ... so treating
lossy pages as if they don't exist would have led to a lot of
head-scratching, because under Tom's proposal the row count would have
been way off.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

Robert Haas <robertmhaas@gmail.com> writes:

I don't like Tom's proposal of trying to fake up a value here when
EXPLAIN ANALYZE is in use. Reporting "exact" and "lossy" values for
BitmapAnd would be a fine enhancement, but artificially trying to
flatten that back into a row count is going to be confusing, not
helpful. (Just last week I saw a case where the fact that many pages
were being lossified caused a performance problem ... so treating
lossy pages as if they don't exist would have led to a lot of
head-scratching, because under Tom's proposal the row count would have
been way off.)

It would very often be the case that the value I suggested would be exact,
so this complaint seems off-base to me.

If we were willing to add an additional output line, we could also report
the number of lossy pages in the result bitmap, and people would then
know not to trust the reported rowcount as gospel. But it's still useful
to have it. I'm envisioning output like

-> BitmapOr (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1)

in the no-lossy-pages case, otherwise

-> BitmapOr (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1)
Lossy Bitmap: exact entries=2469, lossy pages=123

There's nothing misleading about that, IMO. (Exercise for the reader:
what rows/page estimate did I assume?)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: Improve output of BitmapAnd EXPLAIN ANALYZE

On Tue, Nov 1, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I don't like Tom's proposal of trying to fake up a value here when
EXPLAIN ANALYZE is in use. Reporting "exact" and "lossy" values for
BitmapAnd would be a fine enhancement, but artificially trying to
flatten that back into a row count is going to be confusing, not
helpful. (Just last week I saw a case where the fact that many pages
were being lossified caused a performance problem ... so treating
lossy pages as if they don't exist would have led to a lot of
head-scratching, because under Tom's proposal the row count would have
been way off.)

It would very often be the case that the value I suggested would be exact,
so this complaint seems off-base to me.

From my point of view, something that very often gives the right
answers isn't acceptable. We certainly wouldn't accept a query
optimization that very often gives the right answers. It's gotta
always give the right answer.

If we were willing to add an additional output line, we could also report
the number of lossy pages in the result bitmap, and people would then
know not to trust the reported rowcount as gospel. But it's still useful
to have it. I'm envisioning output like

-> BitmapOr (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1)

in the no-lossy-pages case, otherwise

-> BitmapOr (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1)
Lossy Bitmap: exact entries=2469, lossy pages=123

There's nothing misleading about that, IMO. (Exercise for the reader:
what rows/page estimate did I assume?)

(4000-2469)/123 = 12.44715 ?

I think it's inherently misleading to report values that were
concocted specifically for EXPLAIN ANALYZE. Things that we report
there should have some underlying reality or relevance. People -
including me - tend to assume they do, and you don't want to spend
time chasing down something that's PURELY an EXPLAIN ANALYZE artifact
with no actual relevance to the runtime behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers