[PATCH] expand the units that pg_size_pretty supports on output

Started by David Christensenabout 5 years ago37 messageshackers

david.christensen@crunchydata.com

about 5 years ago

Hi folks,

Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.

Best,

David

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: David Christensen (#1)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote:

Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.

Please don't forget to add this patch to the next commit fest of July
if you want to get some reviews:
https://commitfest.postgresql.org/33/

Note that the development of Postgres 14 is over, and that there was a
feature freeze last week, but this can be considered for 15.
--
Michael

David Christensen

david.christensen@crunchydata.com

about 5 years ago

In reply to: David Christensen (#1)

Re: [PATCH] expand the units that pg_size_pretty supports on output

A second patch to teach the same units to pg_size_bytes().

Best,

David

On Wed, Apr 14, 2021 at 11:13 AM David Christensen <
david.christensen@crunchydata.com> wrote:

Show quoted text

Hi folks,

Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.

Best,

David

Asif Rehman

asifr.rehman@gmail.com

almost 5 years ago

In reply to: David Christensen (#3)

Re: [PATCH] expand the units that pg_size_pretty supports on output

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hi David,

I was reviewing this patch and the compilation failed with following error on CentOS 7.

dbsize.c: In function ‘pg_size_bytes’:
dbsize.c:808:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
const int unit_count = 9; /* sizeof units table */
^
dbsize.c:809:3: error: variable length array ‘units’ is used [-Werror=vla]
const char *units[unit_count] = {
^

I believe "unit_count" ought to be a #define here.

Regards,
Asif

The new status of this patch is: Waiting on Author

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: David Christensen (#3)

Re: [PATCH] expand the units that pg_size_pretty supports on output

New versions attached to address the initial CF feedback and rebase on HEAD
as of now.

0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch

- expands the units that pg_size_pretty() can handle up to YB.

0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch

- expands the units that pg_size_bytes() can handle, up to YB.

Shinya11.Kato@nttdata.com

almost 5 years ago

In reply to: David Christensen (#5)

RE: [PATCH] expand the units that pg_size_pretty supports on output

From: David Christensen <david.christensen@crunchydata.com>
Sent: Friday, June 4, 2021 4:18 AM
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output

New versions attached to address the initial CF feedback and rebase on HEAD as of now.

0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch

- expands the units that pg_size_pretty() can handle up to YB.

0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch

- expands the units that pg_size_bytes() can handle, up to YB.

I don't see the need to extend the unit to YB.
What use case do you have in mind?

Regards,
Shinya Kato

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: Shinya11.Kato@nttdata.com (#6)

Re: [PATCH] expand the units that pg_size_pretty supports on output

I don't see the need to extend the unit to YB.
What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I assume we’ll get to a need sooner or later. :)

David

Shinya11.Kato@nttdata.com

almost 5 years ago

In reply to: David Christensen (#7)

RE: [PATCH] expand the units that pg_size_pretty supports on output

I don't see the need to extend the unit to YB.
What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I assume we’ll
get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

Best regards,
Shinya Kato

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Shinya11.Kato@nttdata.com (#8)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:

Hmmm, I didn't think YB was necessary, but what do others think?

For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte. After
that, I'd need to check.

Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?

I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB. There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.

I really think this change is more likely to upset someone than please someone.

Just my thoughts.

David

#10

Isaac Morland

isaac.morland@gmail.com

almost 5 years ago

In reply to: Shinya11.Kato@nttdata.com (#8)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:

I don't see the need to extend the unit to YB.
What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I

assume we’ll

get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

If I’m reading the code correctly, the difference between supporting YB and
not supporting it is whether there is a line for it in the list of prefixes
and their multiples. As such, I don’t see why we’re even discussing whether
or not to include all the standard prefixes. A YB is still an absurd amount
of storage, but that’s not the point; just put all the standard prefixes
and be done with it. If actual code changes were required in the new code
as they are in the old it might be worth discussing.

One question: why is there no “k” in the list of prefixes?

Also: why not have only the prefixes in the array, and use a single fixed
output format "%s %sB" all the time?

It feels like it should be possible to calculate the appropriate idx to use
(while adjusting the number to print as is done now) and then just have one
psprintf call for all cases.

A more significant question is YB vs. YiB. I know there is a long tradition
within computer-related fields of saying that k = 1024, M = 1024^2, etc.,
but we’re not special enough to override the more general principles of SI
(Système International) which provide that k = 1000, M = 1000^2 and so on
universally and provide the alternate prefixes ki, Mi, etc. which use 1024
as the multiple.

So I would suggest either display 2000000 as 2MB or as 1.907MiB.

#11

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: Isaac Morland (#10)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, Jun 15, 2021 at 8:31 AM Isaac Morland <isaac.morland@gmail.com>
wrote:

On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:

I don't see the need to extend the unit to YB.
What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I

assume we’ll

get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

If I’m reading the code correctly, the difference between supporting YB
and not supporting it is whether there is a line for it in the list of
prefixes and their multiples. As such, I don’t see why we’re even
discussing whether or not to include all the standard prefixes. A YB is
still an absurd amount of storage, but that’s not the point; just put all
the standard prefixes and be done with it. If actual code changes were
required in the new code as they are in the old it might be worth
discussing.

Agreed, this is why I went this way. One and done.

One question: why is there no “k” in the list of prefixes?

kB has a special-case code block before you get to this point. I didn't
look into the reasons, but assume there are some.

Also: why not have only the prefixes in the array, and use a single fixed
output format "%s %sB" all the time?

It feels like it should be possible to calculate the appropriate idx to
use (while adjusting the number to print as is done now) and then just have
one psprintf call for all cases.

Sure, if that seems more readable/understandable.

A more significant question is YB vs. YiB. I know there is a long
tradition within computer-related fields of saying that k = 1024, M =
1024^2, etc., but we’re not special enough to override the more general
principles of SI (Système International) which provide that k = 1000, M =
1000^2 and so on universally and provide the alternate prefixes ki, Mi,
etc. which use 1024 as the multiple.

So I would suggest either display 2000000 as 2MB or as 1.907MiB.

Heh, I was just expanding the existing logic; if others want to have this
particular battle go ahead and I'll adjust the code/prefixes, but obviously
the logic will need to change if we want to support true MB instead of MiB
as MB.

Also, this will presumably be a breaking change for anyone using the
existing units MB == 1024 * 1024, as we've had for something like 20
years. Changing these units to the *iB will be trivial with this patch,
but not looking forward to garnering the consensus to change this part.

David

#12

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: David Rowley (#9)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, Jun 15, 2021 at 8:26 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:

Hmmm, I didn't think YB was necessary, but what do others think?

For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte. After
that, I'd need to check.

Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?

I hadn't really thought about that TBH; to me it seemed like an
improvement, but I do see that others might not, and adding confusion is
definitely not helpful. That said, it seems like having the code
structured in a way that we can expand via adding an element to a table
instead of the existing way it's written with nested if blocks is still a
useful refactor, whatever we decide the cutoff units should be.

I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB. There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.

I got motivated to do this due to some (granted synthetic) work/workloads,
where I was seeing 6+digit TB numbers and thought it was ugly. Looked at
the code and thought the refactor was the way to go, and just stuck all of
the known units in.

I really think this change is more likely to upset someone than please
someone.

I'd be interested to see reactions from people; to me, it seems a +1, but
seems like -1, 0, +1 all valid opinions here; I'd expect more 0's and +1s,
but I'm probably biased since I wrote this. :-)

#13

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Christensen (#12)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Wed, 16 Jun 2021 at 02:58, David Christensen
<david.christensen@crunchydata.com> wrote:

That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be.

I had not really looked at the patch, but if there's a cleanup portion
to the same patch as you're adding the YB too, then maybe it's worth
separating those out into another patch so that the two can be
considered independently.

David

#14

Shinya11.Kato@nttdata.com

almost 5 years ago

In reply to: David Rowley (#13)

RE: [PATCH] expand the units that pg_size_pretty supports on output

I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.

I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.

Best regards,
Shinya Kato

#15

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: Shinya11.Kato@nttdata.com (#14)

Re: [PATCH] expand the units that pg_size_pretty supports on output

I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.

I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.

Sure thing, I think that makes sense. Refactor with existing units and debate the number of additions units to include. I do think Petabytes and Exabytes are at least within the realm of ones we should include; less tied to ZB and YB; just included for completeness.

#16

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: Shinya11.Kato@nttdata.com (#14)

Re: [PATCH] expand the units that pg_size_pretty supports on output

Shinya11.Kato@nttdata.com writes:

I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.

I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.

Best regards,
Shinya Kato

Hi folks,

Had some time to rework this patch from the two that had previously been
here into two separate parts:

1) A basic refactor of the existing code to easily handle expanding the
units we use into a table-based format. This also includes changing the
return value of `pg_size_bytes()` from an int64 into a numeric, and
minor test adjustments to reflect this.

2) Expanding the units that both pg_size_bytes() and pg_size_pretty()
recognize up through Yottabytes. This includes documentation and test
updates to reflect the changes made here. How many additional units we
add here is up for discussion (inevitably), but my opinion remains that
there is no harm in supporting all units available.

Best,

David

#17

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Christensen (#16)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Wed, 30 Jun 2021 at 05:11, David Christensen
<david.christensen@crunchydata.com> wrote:

1) A basic refactor of the existing code to easily handle expanding the
units we use into a table-based format. This also includes changing the
return value of `pg_size_bytes()` from an int64 into a numeric, and
minor test adjustments to reflect this.

This is not quite what I had imagined when you said about adding a
table to make it easier to add new units in the future. I expected a
single table that handles all units, not just the ones above kB and
not one for each function.

There are actually two pg_size_pretty functions, one for BIGINT and
one for NUMERIC. I see you only changed the NUMERIC version. I'd
expect them both to have the same treatment and use the same table so
there's consistency between the two functions.

The attached is more like what I had in mind. There's a very small net
reduction in lines of code with this and it also helps keep
pg_size_pretty() and pg_size_pretty_numeric() in sync.

I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions. I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift. In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value. My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.

David

#18

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: David Rowley (#17)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:

I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions. I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift. In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value. My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.

I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.

I did a few other tidy ups and I think it's a useful patch as it
reduces the amount of code a bit and makes it dead simple to add new
units in the future. Most importantly it'll help keep pg_size_pretty,
pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
what units they support.

Does anyone want to have a look over this? If not, I plan to push it
in the next day or so.

(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)

David

#19

Dean Rasheed

dean.a.rasheed@gmail.com

almost 5 years ago

In reply to: David Rowley (#18)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, 6 Jul 2021 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:

I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.

When I first read this:

+            /* half-round until we get down to unitBits */
+            while (rightshifts++ < unit->unitBits)
+                size = half_rounded(size);

it looked to me like it would be invoking half_rounded() multiple
times, which raised alarm bells because that would risk rounding the
wrong way. Eventually I realised that by the time it reaches that,
rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
it'll never do more than one half-round, which is important.

So perhaps using doHalfRound would be clearer, but it could just be a
local variable tracking whether or not it's the first time through the
loop.

Regards,
Dean

#20

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Dean Rasheed (#19)

Re: [PATCH] expand the units that pg_size_pretty supports on output

On Tue, 6 Jul 2021 at 23:39, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

When I first read this:
+            /* half-round until we get down to unitBits */
+            while (rightshifts++ < unit->unitBits)
+                size = half_rounded(size);
it looked to me like it would be invoking half_rounded() multiple
times, which raised alarm bells because that would risk rounding the
wrong way. Eventually I realised that by the time it reaches that,
rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
it'll never do more than one half-round, which is important.

It's true that based on how the units table is set up now, it'll only
ever do it once for all but the first loop.

I wrote the attached .c file just to try to see if it ever goes wrong
and I didn't manage to find any inputs where it did. I always seem to
get the half rounded value either the same as the shifted value or 1
higher towards positive infinity

$ ./half_rounded -102 3
1. half_round(-102) == -51 :: -102 >> 1 = -51
2. half_round(-51) == -25 :: -51 >> 1 = -26
3. half_round(-25) == -12 :: -26 >> 1 = -13

$ ./half_rounded 6432 3
1. half_round(6432) == 3216 :: 6432 >> 1 = 3216
2. half_round(3216) == 1608 :: 3216 >> 1 = 1608
3. half_round(1608) == 804 :: 1608 >> 1 = 804

Can you give an example where calling half_rounded too many times will
give the wrong value? Keeping in mind we call half_rounded the number
of times that the passed in value would need to be left-shifted by to
get the equivalent truncated value.

David

#21

Dean Rasheed

dean.a.rasheed@gmail.com

almost 5 years ago

In reply to: David Rowley (#20)

#22

David Christensen

david.christensen@crunchydata.com

almost 5 years ago

In reply to: David Rowley (#18)

#23