Document if width_bucket's low and high are inclusive/exclusive

Started by Ben Peachey Higdon about 1 year ago · 10 messages · docs
#1 Ben Peachey Higdon
bpeacheyhigdon@gmail.com

The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the range low to high. Returns 0 or count+1 for an input outside that range.

I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is exclusive.

For example:
SELECT width_bucket(0, 0, 1, 4)

returns 1, the first of 4 bins

SELECT width_bucket(1, 0, 1, 4)

returns 5, because the operand equals high = 1, which falls outside the exclusive upper bound
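
The behavior described above can be sketched in a few lines of Python. This is only an illustrative model of the low < high case, assuming every bucket is a half-open interval [lower, upper); it is not PostgreSQL's implementation, and the name width_bucket_model is hypothetical.

```python
import math

def width_bucket_model(operand, low, high, count):
    """Illustrative model of four-argument width_bucket for low < high.

    Assumption: each bucket is the half-open interval [lower, upper).
    """
    if count <= 0:
        raise ValueError("count must be greater than zero")
    if not low < high:
        raise ValueError("this sketch only covers low < high")
    if operand < low:
        return 0              # below the range
    if operand >= high:
        return count + 1      # high itself is already outside the range
    # Scale the position within the range to [0, count), then make it 1-based.
    # The min() guards against floating-point rounding pushing past count.
    return min(math.floor((operand - low) * count / (high - low)) + 1, count)

print(width_bucket_model(0, 0, 1, 4))  # 1
print(width_bucket_model(1, 0, 1, 4))  # 5
```

An operand sitting exactly on an interior boundary, such as 0.25 here, lands in the bucket that starts there (bucket 2), which is the same inclusive-low rule applied to each bucket.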

Thank you!

#2 Robert Treat
xzilla@users.sourceforge.net
In reply to: Ben Peachey Higdon (#1)
Re: Document if width_bucket's low and high are inclusive/exclusive

On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
<bpeacheyhigdon@gmail.com> wrote:

The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

Returns the number of the bucket in which operand falls in a histogram having count equal-width buckets spanning the range low to high. Returns 0 or count+1 for an input outside that range.

I had assumed that both the low and high were inclusive but actually the low is inclusive while the high is exclusive.

I'm not sure it's the most groundbreaking thing, but would probably
save a bunch of future people from having to gin up an example to test
it, so I'd probably update it per the following patch.

Robert Treat
https://xzilla.net

Attachments:

v1-0001-Document-width_bucket-range-as-inclusive-exclusiv.patch (application/octet-stream, +1 -2)

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Treat (#2)
Re: Document if width_bucket's low and high are inclusive/exclusive

Robert Treat <rob@xzilla.net> writes:

On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
<bpeacheyhigdon@gmail.com> wrote:

The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

I'm not sure it's the most groundbreaking thing, but would probably
save a bunch of future people from having to gin up an example to test
it, so I'd probably update it per the following patch.

Seems reasonable, but do we need to do anything with the other
version of width_bucket (the one taking an array of lower bounds)?
Perhaps this change provides enough context, but I'm unsure.

regards, tom lane

#4 Robert Treat
xzilla@users.sourceforge.net
In reply to: Tom Lane (#3)
Re: Document if width_bucket's low and high are inclusive/exclusive

On Wed, Jun 18, 2025 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Treat <rob@xzilla.net> writes:

On Fri, Feb 28, 2025 at 7:15 AM Ben Peachey Higdon
<bpeacheyhigdon@gmail.com> wrote:

The current documentation for width_bucket (https://www.postgresql.org/docs/current/functions-math.html) does not mention if the range’s low and high are inclusive or exclusive.

I'm not sure it's the most groundbreaking thing, but would probably
save a bunch of future people from having to gin up an example to test
it, so I'd probably update it per the following patch.

Seems reasonable, but do we need to do anything with the other
version of width_bucket (the one taking an array of lower bounds)?
Perhaps this change provides enough context, but I'm unsure.

Since they are all lower bounds, they all operate the same way, so it
isn't quite as clear that it needs documenting. Are you thinking
something like this?

Returns the number of the bucket in which operand falls given an array
listing the lower bounds (inclusive) of the buckets
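
To make the "inclusive lower bounds" reading concrete, here is a minimal Python sketch of the array form. It assumes the thresholds are sorted in ascending order and models the bucket number as the count of thresholds less than or equal to the operand; the function name is hypothetical and this is not the server's code.

```python
from bisect import bisect_right

def width_bucket_thresholds(operand, thresholds):
    """Illustrative model of width_bucket(operand, thresholds).

    Assumption: thresholds is sorted ascending and lists inclusive lower
    bounds, so the result is how many thresholds are <= operand.
    """
    return bisect_right(thresholds, operand)

bounds = [10, 20, 30]
print(width_bucket_thresholds(5, bounds))   # 0, below the first lower bound
print(width_bucket_thresholds(20, bounds))  # 2, a bound belongs to the bucket it opens
print(width_bucket_thresholds(99, bounds))  # 3, the last bucket has no upper bound
```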

Robert Treat
https://xzilla.net

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Treat (#4)
Re: Document if width_bucket's low and high are inclusive/exclusive

Robert Treat <rob@xzilla.net> writes:

Since they are all lower bounds, they all operate the same way, so it
isn't quite as clear that it needs documenting. Are you thinking
something like this?

Returns the number of the bucket in which operand falls given an array
listing the lower bounds (inclusive) of the buckets

Yeah, though I might write "inclusive lower bounds" rather than use
parens. What's bugging me though is the lack of any mention of the
bucket upper bounds: you have to deduce that the upper bounds must
be exclusive if the lower bounds are inclusive. If that's obvious
here, why is it non-obvious for the other case? Maybe instead of
the parenthetical form you suggested, add a sentence like

Buckets have inclusive lower bounds, and therefore exclusive
upper bounds.

and then we could either rely on the reader remembering that,
or else repeat it, for the second form of width_bucket.

Another thing I just remembered (think I knew it once) is the
behavior of the first form when low > high. It's not an error!
I think we need to document that, perhaps along the lines of

If low > high, the behavior is mirror-reversed, with bucket 1
now being the one just below low, and the inclusive bounds
now being on the upper side.

plus an example.
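
As a rough illustration of the mirror-reversed case, the following Python sketch models both directions. It is an assumption-laden model rather than the actual implementation: the function name is hypothetical, and the reversed branch simply flips the comparisons so that the inclusive bound of each bucket is its upper one.

```python
import math

def width_bucket_both_ways(operand, low, high, count):
    """Illustrative model of four-argument width_bucket, either direction."""
    if count <= 0:
        raise ValueError("count must be greater than zero")
    if low == high:
        raise ValueError("low and high must differ")
    if low < high:
        if operand < low:
            return 0
        if operand >= high:        # high is exclusive
            return count + 1
        pos = (operand - low) * count / (high - low)
    else:
        if operand > low:
            return 0
        if operand <= high:        # high is still exclusive, now at the bottom
            return count + 1
        pos = (low - operand) * count / (low - high)
    # min() guards against floating-point rounding pushing past count.
    return min(math.floor(pos) + 1, count)

print(width_bucket_both_ways(10, 10, 0, 10))  # 1, low itself is in bucket 1
print(width_bucket_both_ways(9, 10, 0, 10))   # 2, 9 is the inclusive top of bucket 2
print(width_bucket_both_ways(0, 10, 0, 10))   # 11, high is outside the range
```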

regards, tom lane

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: Document if width_bucket's low and high are inclusive/exclusive

I wrote:

Another thing I just remembered (think I knew it once) is the
behavior of the first form when low > high. It's not an error!

So concretely, how about the attached? In addition to what we
mentioned so far, I made the sentence about out-of-range cases
more explicit.

regards, tom lane

Attachments:

v2-0001-Document-width_bucket-range-as-inclusive-exclusiv.patch (text/x-diff; charset=us-ascii, +15 -4)

#7 Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#6)
Re: Document if width_bucket's low and high are inclusive/exclusive

On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So concretely, how about the attached?

LGTM (though I'm not sure it really needs the word "therefore" in the
first hunk).

There are also a couple of code comments that need fixing --
width_bucket_float8() comes with the following comment:

* 'bound1' and 'bound2' are the lower and upper bounds of the
* histogram's range, respectively. 'count' is the number of buckets
* in the histogram. width_bucket() returns an integer indicating the
* bucket number that 'operand' belongs to in an equiwidth histogram
* with the specified characteristics. An operand smaller than the
* lower bound is assigned to bucket 0. An operand greater than the
* upper bound is assigned to an additional bucket (with number
* count+1). We don't allow "NaN" for any of the float8 inputs, and we
* don't allow either of the histogram bounds to be +/- infinity.

so at the very least, that should be made to say "greater than or
equal to", instead of "greater than". Similarly for
width_bucket_numeric().

Also, since PG14, type numeric has supported infinity, so its comment
should probably include that last part about not allowing +/- infinity
in the histogram bounds.

Regards,
Dean

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#7)
Re: Document if width_bucket's low and high are inclusive/exclusive

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Fri, 20 Jun 2025 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So concretely, how about the attached?

LGTM (though I'm not sure it really needs the word "therefore" in the
first hunk).

OK, done that way.

There are also a couple of code comments that need fixing --

Good points, also done.

While looking at those comments, I also noted that there is a
strange inconsistency between width_bucket_array and
width_bucket_float8/width_bucket_numeric. Namely, the latter
two reject an "operand" that is NaN, while width_bucket_array
goes out of its way to accept it and treat it in our usual
fashion as sorting higher than all non-NaNs.

Clearly these functions must reject NaN histogram bounds, for
the same reason they reject infinite bounds. But I don't see
any reason why they couldn't treat a NaN operand as valid.
Should we change them? (I imagine this'd be a HEAD-only
change, and probably v19 material at this point.)
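
For illustration only, this Python sketch shows what "sorting higher than all non-NaNs" would mean for a thresholds-style lookup: a NaN operand ends up in the last bucket. It assumes ascending thresholds that contain no NaN themselves; the function name is hypothetical and the code is not taken from width_bucket_array.

```python
import math
from bisect import bisect_right

def width_bucket_thresholds_nan(operand, thresholds):
    """Illustrative thresholds lookup that accepts a NaN operand.

    Assumption: NaN compares greater than every non-NaN value, so it is
    at or above every lower bound and falls in the last bucket.
    """
    if math.isnan(operand):
        return len(thresholds)
    return bisect_right(thresholds, operand)

print(width_bucket_thresholds_nan(float("nan"), [1.0, 2.0, 3.0]))  # 3
print(width_bucket_thresholds_nan(2.5, [1.0, 2.0, 3.0]))           # 2
```

The explicit isnan check is needed in this sketch because ordinary Python comparisons with NaN are always false, which would otherwise give bisect an inconsistent ordering.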

regards, tom lane

#9 Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tom Lane (#8)
Re: Document if width_bucket's low and high are inclusive/exclusive

On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While looking at those comments, I also noted that there is a
strange inconsistency between width_bucket_array and
width_bucket_float8/width_bucket_numeric. Namely, the latter
two reject an "operand" that is NaN, while width_bucket_array
goes out of its way to accept it and treat it in our usual
fashion as sorting higher than all non-NaNs.

Clearly these functions must reject NaN histogram bounds, for
the same reason they reject infinite bounds. But I don't see
any reason why they couldn't treat a NaN operand as valid.
Should we change them? (I imagine this'd be a HEAD-only
change, and probably v19 material at this point.)

Yes, I think that's a good idea (for v19 I would have thought).
Allowing the operand to be NaN definitely seems preferable to throwing
an error, since the operand might well come from data in a table
containing NaNs.

Regards,
Dean

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#9)
Re: Document if width_bucket's low and high are inclusive/exclusive

Dean Rasheed <dean.a.rasheed@gmail.com> writes:

On Sat, 21 Jun 2025 at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Clearly these functions must reject NaN histogram bounds, for
the same reason they reject infinite bounds. But I don't see
any reason why they couldn't treat a NaN operand as valid.
Should we change them? (I imagine this'd be a HEAD-only
change, and probably v19 material at this point.)

Yes, I think that's a good idea (for v19 I would have thought).
Allowing the operand to be NaN definitely seems preferable to throwing
an error, since the operand might well come from data in a table
containing NaNs.

I started a new thread for that, since it's no longer docs material:

/messages/by-id/2822872.1750540911@sss.pgh.pa.us

regards, tom lane