[PATCH] expand the units that pg_size_pretty supports on output
Hi folks,
Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.
There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.
Best,
David
Attachments:
0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patchapplication/octet-stream; name=0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patchDownload
From e4ffb11d747e5253ef48e751821a53c441e25956 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Fri, 2 Apr 2021 14:35:31 -0500
Subject: [PATCH] Expand the units that pg_size_pretty(numeric) knows about
Rework the logic slightly to allow a lookup table-based approach for
displaying units. Add units up to yottabytes, with easy expansion if
this ever needs to grow.
---
doc/src/sgml/func.sgml | 2 +-
src/backend/utils/adt/dbsize.c | 43 ++++++++++--------
src/test/regress/expected/dbsize.out | 67 +++++++++++++++++++---------
src/test/regress/sql/dbsize.sql | 27 ++++++++---
4 files changed, 94 insertions(+), 45 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c6a45d9e55..c48137c929 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26339,7 +26339,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
</para>
<para>
Converts a size in bytes into a more easily human-readable format with
- size units (bytes, kB, MB, GB or TB as appropriate). Note that the
+ size units (bytes, kB, MB, GB, TB, PB, EB, ZB, or YB as appropriate). Note that the
units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes,
1MB is 1024<superscript>2</superscript> = 1048576 bytes, and so on.
</para></entry>
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index da1a879f1f..31ed49f8ed 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
Numeric size = PG_GETARG_NUMERIC(0);
Numeric limit,
limit2;
- char *result;
+ char *result = NULL;
limit = int64_to_numeric(10 * 1024);
limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
@@ -660,31 +660,36 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
+ int idx, max_iter = 6; /* highest index of table_below */
+ char *output_formats[] = {
+ "%s MB",
+ "%s GB",
+ "%s TB",
+ "%s PB",
+ "%s EB",
+ "%s ZB",
+ "%s YB"
+ };
+
+ for (idx = 0; idx < max_iter; idx++) {
/* size >>= 10 */
size = numeric_shift_right(size, 10);
-
if (numeric_is_less(numeric_absolute(size), limit2))
{
size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
+ result = psprintf(output_formats[idx], numeric_to_cstring(size));
+ break;
}
}
+
+ if (!result) {
+ /* this uses the last format in the table above for anything else */
+
+ /* size >>= 10 */
+ size = numeric_shift_right(size, 10);
+ size = numeric_half_rounded(size);
+ result = psprintf(output_formats[max_iter], numeric_to_cstring(size));
+ }
}
}
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..b93d40fb02 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,27 +13,54 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(6 rows)
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
- size | pg_size_pretty | pg_size_pretty
---------------------+----------------+----------------
- 10 | 10 bytes | -10 bytes
- 1000 | 1000 bytes | -1000 bytes
- 1000000 | 977 kB | -977 kB
- 1000000000 | 954 MB | -954 MB
- 1000000000000 | 931 GB | -931 GB
- 1000000000000000 | 909 TB | -909 TB
- 10.5 | 10.5 bytes | -10.5 bytes
- 1000.5 | 1000.5 bytes | -1000.5 bytes
- 1000000.5 | 977 kB | -977 kB
- 1000000000.5 | 954 MB | -954 MB
- 1000000000000.5 | 931 GB | -931 GB
- 1000000000000000.5 | 909 TB | -909 TB
-(12 rows)
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-----------------------------------+----------------+----------------
+ 10 | 10 bytes | -10 bytes
+ 1000 | 1000 bytes | -1000 bytes
+ 1000000 | 977 kB | -977 kB
+ 1000000000 | 954 MB | -954 MB
+ 1000000000000 | 931 GB | -931 GB
+ 1000000000000000 | 909 TB | -909 TB
+ 1000000000000000000 | 888 PB | -888 PB
+ 1000000000000000000000 | 867 EB | -867 EB
+ 1000000000000000000000000 | 847 ZB | -847 ZB
+ 1000000000000000000000000000 | 827 YB | -827 YB
+ 1000000000000000000000000000000 | 827181 YB | -827181 YB
+ 10.5 | 10.5 bytes | -10.5 bytes
+ 1000.5 | 1000.5 bytes | -1000.5 bytes
+ 1000000.5 | 977 kB | -977 kB
+ 1000000000.5 | 954 MB | -954 MB
+ 1000000000000.5 | 931 GB | -931 GB
+ 1000000000000000.5 | 909 TB | -909 TB
+ 1000000000000000000.5 | 888 PB | -888 PB
+ 1000000000000000000000.5 | 867 EB | -867 EB
+ 1000000000000000000000000.5 | 847 ZB | -847 ZB
+ 1000000000000000000000000000.5 | 827 YB | -827 YB
+ 1000000000000000000000000000000.5 | 827181 YB | -827181 YB
+(22 rows)
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..a97dacbf7b 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,12 +4,29 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(1000000000000000::bigint)) x(size);
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
--
2.24.3 (Apple Git-128)
On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote:
Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.
Please don't forget to add this patch to the next commit fest of July
if you want to get some reviews:
https://commitfest.postgresql.org/33/
Note that the development of Postgres 14 is over, and that there was a
feature freeze last week, but this can be considered for 15.
--
Michael
A second patch to teach the same units to pg_size_bytes().
Best,
David
On Wed, Apr 14, 2021 at 11:13 AM David Christensen <
david.christensen@crunchydata.com> wrote:
Show quoted text
Hi folks,
Enclosed is a patch that expands the unit output for
pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
numeric output code to account for the larger number of units we're using
rather than just adding nesting levels.There are also a few other places that could benefit from expanded units,
but this is a standalone starting point.Best,
David
Attachments:
0001-Expand-the-supported-units-in-pg_size_bytes-to-cover.patchapplication/octet-stream; name=0001-Expand-the-supported-units-in-pg_size_bytes-to-cover.patchDownload
From 3bb6c7c388db49c78d1e027f86d77e7f744a1667 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 28 Apr 2021 10:38:37 -0500
Subject: [PATCH] Expand the supported units in pg_size_bytes to cover all
units
---
src/backend/utils/adt/dbsize.c | 62 +++++++++-----
src/include/catalog/pg_proc.dat | 2 +-
src/test/regress/expected/dbsize.out | 117 ++++++++++++++++-----------
src/test/regress/sql/dbsize.sql | 14 ++--
4 files changed, 121 insertions(+), 74 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 31ed49f8ed..d517a90483 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -760,8 +760,9 @@ pg_size_bytes(PG_FUNCTION_ARGS)
char *cp;
/*
- * Note we might one day support EB units, so if what follows 'E'
- * isn't a number, just treat it all as a unit to be parsed.
+ * If what follows 'e' isn't a number, we just treat it all as a unit
+ * to be parsed; this allows us to support both exponential notation
+ * and EB units.
*/
exponent = strtol(endptr + 1, &cp, 10);
(void) exponent; /* Silence -Wunused-result warnings */
@@ -791,7 +792,8 @@ pg_size_bytes(PG_FUNCTION_ARGS)
/* Handle possible unit */
if (*strptr != '\0')
{
- int64 multiplier = 0;
+ int64 multiplier = 1;
+ int i;
/* Trim any trailing whitespace */
endptr = str + VARSIZE_ANY_EXHDR(arg) - 1;
@@ -803,25 +805,39 @@ pg_size_bytes(PG_FUNCTION_ARGS)
*endptr = '\0';
/* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+ const int unit_count = 9; /* sizeof units table */
+ const char *units[unit_count] = {
+ "bytes",
+ "kb",
+ "mb",
+ "gb",
+ "tb",
+ "pb",
+ "eb",
+ "zb",
+ "yb",
+ };
+
+ for (i = 0; i < unit_count; i++) {
+ printf("strptr: %s units: %s", strptr, units[i]);
+ if (pg_strcasecmp(strptr, units[i]) == 0)
+ break;
+ /*
+ * Note: int64 isn't large enough to store the full multiplier
+ * going past ~ 9EB, but since this is a fixed value, we can apply
+ * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024
+ * on actual conversion to numeric.
+ */
+ multiplier *= 32;
+ }
- else
+ if (i == unit_count)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
errdetail("Invalid size unit: \"%s\".", strptr),
- errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+ errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+ "\"PB\", \"EB\", \"ZB\", and \"YB\".")));
if (multiplier > 1)
{
@@ -832,13 +848,19 @@ pg_size_bytes(PG_FUNCTION_ARGS)
num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
NumericGetDatum(mul_num),
NumericGetDatum(num)));
+
+ /* second application to get around int64 limitations in unit multipliers */
+ num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+ NumericGetDatum(mul_num),
+ NumericGetDatum(num)));
}
}
- result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
- NumericGetDatum(num)));
+ /* now finally truncate, since this is always in integer-like units */
+ num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+ NumericGetDatum(num)));
- PG_RETURN_INT64(result);
+ PG_RETURN_NUMERIC(num);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 91f0ea2212..8be7598a79 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7200,7 +7200,7 @@
prosrc => 'pg_size_pretty_numeric' },
{ oid => '3334',
descr => 'convert a size in human-readable format with size units into bytes',
- proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+ proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
prosrc => 'pg_size_bytes' },
{ oid => '2997',
descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index b93d40fb02..8993522904 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -64,53 +64,65 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bytes | 123
- 1kB | 1024
- 1MB | 1048576
- 1 GB | 1073741824
- 1.5 GB | 1610612736
- 1TB | 1099511627776
- 3000 TB | 3298534883328000
- 1e6 MB | 1048576000000
-(9 rows)
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bytes | 123
+ 1kB | 1024
+ 1MB | 1048576
+ 1 GB | 1073741824
+ 1.5 GB | 1610612736
+ 1TB | 1099511627776
+ 3000 TB | 3298534883328000
+ 1e6 MB | 1048576000000
+ 99 PB | 111464090777419776
+ 45 EB | 51881467707308113920
+ 5.1ZB | 6021017265658797647463
+ 1.17 YB | 1414443208949116134406226
+(13 rows)
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bYteS | 123
- 1kb | 1024
- 1mb | 1048576
- 1 Gb | 1073741824
- 1.5 gB | 1610612736
- 1tb | 1099511627776
- 3000 tb | 3298534883328000
- 1e6 mb | 1048576000000
-(9 rows)
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bYteS | 123
+ 1kb | 1024
+ 1mb | 1048576
+ 1 Gb | 1073741824
+ 1.5 gB | 1610612736
+ 1tb | 1099511627776
+ 3000 tb | 3298534883328000
+ 1e6 mb | 1048576000000
+ 99 pb | 111464090777419776
+ 45 eB | 51881467707308113920
+ 5.1Zb | 6021017265658797647463
+ 1.17 yb | 1414443208949116134406226
+(13 rows)
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
- size | pg_size_bytes
------------+-------------------
- -1 | -1
- -123bytes | -123
- -1kb | -1024
- -1mb | -1048576
- -1 Gb | -1073741824
- -1.5 gB | -1610612736
- -1tb | -1099511627776
- -3000 TB | -3298534883328000
- -10e-1 MB | -1048576
-(9 rows)
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
+ size | pg_size_bytes
+-----------+-----------------------------
+ -1 | -1
+ -123bytes | -123
+ -1kb | -1024
+ -1mb | -1048576
+ -1 Gb | -1073741824
+ -1.5 gB | -1610612736
+ -1tb | -1099511627776
+ -3000 TB | -3298534883328000
+ -10e-1 MB | -1048576
+ -19e-4eb | -2190550858753009
+ -18YB | -21760664753063325144711168
+(11 rows)
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
@@ -128,29 +140,38 @@ SELECT size, pg_size_bytes(size) FROM
-.0 gb | 0
(8 rows)
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+ pg_size_bytes
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+ pg_size_bytes
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
ERROR: invalid size: "1 AB"
DETAIL: Invalid size unit: "AB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A');
ERROR: invalid size: "1 AB A"
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR: bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR: bigint out of range
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
ERROR: invalid size: "1 byte"
DETAIL: Invalid size unit: "byte".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('');
ERROR: invalid size: ""
SELECT pg_size_bytes('kb');
@@ -168,6 +189,6 @@ ERROR: invalid size: ".+912"
SELECT pg_size_bytes('+912+ kB');
ERROR: invalid size: "+912+ kB"
DETAIL: Invalid size unit: "+ kB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('++123 kB');
ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index a97dacbf7b..2a4abcfc65 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -30,29 +30,33 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
SELECT pg_size_bytes('1 AB A');
SELECT pg_size_bytes('1 AB A ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
SELECT pg_size_bytes('1e1000000000000000000');
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
SELECT pg_size_bytes('');
--
2.30.1 (Apple Git-130)
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
Hi David,
I was reviewing this patch and the compilation failed with following error on CentOS 7.
dbsize.c: In function ‘pg_size_bytes’:
dbsize.c:808:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
const int unit_count = 9; /* sizeof units table */
^
dbsize.c:809:3: error: variable length array ‘units’ is used [-Werror=vla]
const char *units[unit_count] = {
^
I believe "unit_count" ought to be a #define here.
Regards,
Asif
The new status of this patch is: Waiting on Author
New versions attached to address the initial CF feedback and rebase on HEAD
as of now.
0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch
- expands the units that pg_size_pretty() can handle up to YB.
0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
- expands the units that pg_size_bytes() can handle, up to YB.
Attachments:
0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patchapplication/octet-stream; name=0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patchDownload
From 2efd906f8f4ae21d76dfa1c9e2e37468dec68560 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 28 Apr 2021 10:38:37 -0500
Subject: [PATCH 2/2] Expand the supported units in pg_size_bytes to cover all
units
---
src/backend/utils/adt/dbsize.c | 63 ++++++++++-----
src/include/catalog/pg_proc.dat | 2 +-
src/test/regress/expected/dbsize.out | 117 ++++++++++++++++-----------
src/test/regress/sql/dbsize.sql | 14 ++--
4 files changed, 120 insertions(+), 76 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index f25ec5c1af..ffa6488da8 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -708,7 +708,6 @@ pg_size_bytes(PG_FUNCTION_ARGS)
*endptr;
char saved_char;
Numeric num;
- int64 result;
bool have_digits = false;
str = text_to_cstring(arg);
@@ -760,8 +759,9 @@ pg_size_bytes(PG_FUNCTION_ARGS)
char *cp;
/*
- * Note we might one day support EB units, so if what follows 'E'
- * isn't a number, just treat it all as a unit to be parsed.
+ * If what follows 'e' isn't a number, we just treat it all as a unit
+ * to be parsed; this allows us to support both exponential notation
+ * and EB units.
*/
exponent = strtol(endptr + 1, &cp, 10);
(void) exponent; /* Silence -Wunused-result warnings */
@@ -791,7 +791,20 @@ pg_size_bytes(PG_FUNCTION_ARGS)
/* Handle possible unit */
if (*strptr != '\0')
{
- int64 multiplier = 0;
+ int64 multiplier = 1;
+ int i;
+ int unit_count = 9; /* sizeof units table */
+ char *units[] = {
+ "bytes",
+ "kb",
+ "mb",
+ "gb",
+ "tb",
+ "pb",
+ "eb",
+ "zb",
+ "yb",
+ };
/* Trim any trailing whitespace */
endptr = str + VARSIZE_ANY_EXHDR(arg) - 1;
@@ -802,26 +815,26 @@ pg_size_bytes(PG_FUNCTION_ARGS)
endptr++;
*endptr = '\0';
- /* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+ for (i = 0; i < unit_count; i++) {
+ printf("strptr: %s units: %s", strptr, units[i]);
+ if (pg_strcasecmp(strptr, units[i]) == 0)
+ break;
+ /*
+ * Note: int64 isn't large enough to store the full multiplier
+ * going past ~ 9EB, but since this is a fixed value, we can apply
+ * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024
+ * on actual conversion to numeric.
+ */
+ multiplier *= 32;
+ }
- else
+ if (i == unit_count)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
errdetail("Invalid size unit: \"%s\".", strptr),
- errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+ errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+ "\"PB\", \"EB\", \"ZB\", and \"YB\".")));
if (multiplier > 1)
{
@@ -832,13 +845,19 @@ pg_size_bytes(PG_FUNCTION_ARGS)
num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
NumericGetDatum(mul_num),
NumericGetDatum(num)));
+
+ /* second application to get around int64 limitations in unit multipliers */
+ num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+ NumericGetDatum(mul_num),
+ NumericGetDatum(num)));
}
}
- result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
- NumericGetDatum(num)));
+ /* now finally truncate, since this is always in integer-like units */
+ num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+ NumericGetDatum(num)));
- PG_RETURN_INT64(result);
+ PG_RETURN_NUMERIC(num);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4607..628dcd69a8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7192,7 +7192,7 @@
prosrc => 'pg_size_pretty_numeric' },
{ oid => '3334',
descr => 'convert a size in human-readable format with size units into bytes',
- proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+ proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
prosrc => 'pg_size_bytes' },
{ oid => '2997',
descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index b93d40fb02..8993522904 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -64,53 +64,65 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bytes | 123
- 1kB | 1024
- 1MB | 1048576
- 1 GB | 1073741824
- 1.5 GB | 1610612736
- 1TB | 1099511627776
- 3000 TB | 3298534883328000
- 1e6 MB | 1048576000000
-(9 rows)
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bytes | 123
+ 1kB | 1024
+ 1MB | 1048576
+ 1 GB | 1073741824
+ 1.5 GB | 1610612736
+ 1TB | 1099511627776
+ 3000 TB | 3298534883328000
+ 1e6 MB | 1048576000000
+ 99 PB | 111464090777419776
+ 45 EB | 51881467707308113920
+ 5.1ZB | 6021017265658797647463
+ 1.17 YB | 1414443208949116134406226
+(13 rows)
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bYteS | 123
- 1kb | 1024
- 1mb | 1048576
- 1 Gb | 1073741824
- 1.5 gB | 1610612736
- 1tb | 1099511627776
- 3000 tb | 3298534883328000
- 1e6 mb | 1048576000000
-(9 rows)
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bYteS | 123
+ 1kb | 1024
+ 1mb | 1048576
+ 1 Gb | 1073741824
+ 1.5 gB | 1610612736
+ 1tb | 1099511627776
+ 3000 tb | 3298534883328000
+ 1e6 mb | 1048576000000
+ 99 pb | 111464090777419776
+ 45 eB | 51881467707308113920
+ 5.1Zb | 6021017265658797647463
+ 1.17 yb | 1414443208949116134406226
+(13 rows)
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
- size | pg_size_bytes
------------+-------------------
- -1 | -1
- -123bytes | -123
- -1kb | -1024
- -1mb | -1048576
- -1 Gb | -1073741824
- -1.5 gB | -1610612736
- -1tb | -1099511627776
- -3000 TB | -3298534883328000
- -10e-1 MB | -1048576
-(9 rows)
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
+ size | pg_size_bytes
+-----------+-----------------------------
+ -1 | -1
+ -123bytes | -123
+ -1kb | -1024
+ -1mb | -1048576
+ -1 Gb | -1073741824
+ -1.5 gB | -1610612736
+ -1tb | -1099511627776
+ -3000 TB | -3298534883328000
+ -10e-1 MB | -1048576
+ -19e-4eb | -2190550858753009
+ -18YB | -21760664753063325144711168
+(11 rows)
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
@@ -128,29 +140,38 @@ SELECT size, pg_size_bytes(size) FROM
-.0 gb | 0
(8 rows)
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+ pg_size_bytes
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+ pg_size_bytes
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
ERROR: invalid size: "1 AB"
DETAIL: Invalid size unit: "AB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A');
ERROR: invalid size: "1 AB A"
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR: bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR: bigint out of range
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
ERROR: invalid size: "1 byte"
DETAIL: Invalid size unit: "byte".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('');
ERROR: invalid size: ""
SELECT pg_size_bytes('kb');
@@ -168,6 +189,6 @@ ERROR: invalid size: ".+912"
SELECT pg_size_bytes('+912+ kB');
ERROR: invalid size: "+912+ kB"
DETAIL: Invalid size unit: "+ kB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('++123 kB');
ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index a97dacbf7b..2a4abcfc65 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -30,29 +30,33 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
SELECT pg_size_bytes('1 AB A');
SELECT pg_size_bytes('1 AB A ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
SELECT pg_size_bytes('1e1000000000000000000');
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
SELECT pg_size_bytes('');
--
2.30.1 (Apple Git-130)
0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patchapplication/octet-stream; name=0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patchDownload
From e1d250ba19b13fb6858125b68adbbd6a78ed4393 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Fri, 2 Apr 2021 14:35:31 -0500
Subject: [PATCH 1/2] Expand the units that pg_size_pretty(numeric) knows about
Rework the logic slightly to allow a lookup table-based approach for
displaying units. Add units up to yottabytes, with easy expansion if
this ever needs to grow.
---
doc/src/sgml/func.sgml | 2 +-
src/backend/utils/adt/dbsize.c | 43 ++++++++++--------
src/test/regress/expected/dbsize.out | 67 +++++++++++++++++++---------
src/test/regress/sql/dbsize.sql | 27 ++++++++---
4 files changed, 94 insertions(+), 45 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 08b07f561e..f6739f150c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26371,7 +26371,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
</para>
<para>
Converts a size in bytes into a more easily human-readable format with
- size units (bytes, kB, MB, GB or TB as appropriate). Note that the
+ size units (bytes, kB, MB, GB, TB, PB, EB, ZB, or YB as appropriate). Note that the
units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes,
1MB is 1024<superscript>2</superscript> = 1048576 bytes, and so on.
</para></entry>
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 3c70bb5943..f25ec5c1af 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
Numeric size = PG_GETARG_NUMERIC(0);
Numeric limit,
limit2;
- char *result;
+ char *result = NULL;
limit = int64_to_numeric(10 * 1024);
limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
@@ -660,31 +660,36 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
+ int idx, max_iter = 6; /* highest index of table_below */
+ char *output_formats[] = {
+ "%s MB",
+ "%s GB",
+ "%s TB",
+ "%s PB",
+ "%s EB",
+ "%s ZB",
+ "%s YB"
+ };
+
+ for (idx = 0; idx < max_iter; idx++) {
/* size >>= 10 */
size = numeric_shift_right(size, 10);
-
if (numeric_is_less(numeric_absolute(size), limit2))
{
size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
+ result = psprintf(output_formats[idx], numeric_to_cstring(size));
+ break;
}
}
+
+ if (!result) {
+ /* this uses the last format in the table above for anything else */
+
+ /* size >>= 10 */
+ size = numeric_shift_right(size, 10);
+ size = numeric_half_rounded(size);
+ result = psprintf(output_formats[max_iter], numeric_to_cstring(size));
+ }
}
}
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..b93d40fb02 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,27 +13,54 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(6 rows)
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
- size | pg_size_pretty | pg_size_pretty
---------------------+----------------+----------------
- 10 | 10 bytes | -10 bytes
- 1000 | 1000 bytes | -1000 bytes
- 1000000 | 977 kB | -977 kB
- 1000000000 | 954 MB | -954 MB
- 1000000000000 | 931 GB | -931 GB
- 1000000000000000 | 909 TB | -909 TB
- 10.5 | 10.5 bytes | -10.5 bytes
- 1000.5 | 1000.5 bytes | -1000.5 bytes
- 1000000.5 | 977 kB | -977 kB
- 1000000000.5 | 954 MB | -954 MB
- 1000000000000.5 | 931 GB | -931 GB
- 1000000000000000.5 | 909 TB | -909 TB
-(12 rows)
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-----------------------------------+----------------+----------------
+ 10 | 10 bytes | -10 bytes
+ 1000 | 1000 bytes | -1000 bytes
+ 1000000 | 977 kB | -977 kB
+ 1000000000 | 954 MB | -954 MB
+ 1000000000000 | 931 GB | -931 GB
+ 1000000000000000 | 909 TB | -909 TB
+ 1000000000000000000 | 888 PB | -888 PB
+ 1000000000000000000000 | 867 EB | -867 EB
+ 1000000000000000000000000 | 847 ZB | -847 ZB
+ 1000000000000000000000000000 | 827 YB | -827 YB
+ 1000000000000000000000000000000 | 827181 YB | -827181 YB
+ 10.5 | 10.5 bytes | -10.5 bytes
+ 1000.5 | 1000.5 bytes | -1000.5 bytes
+ 1000000.5 | 977 kB | -977 kB
+ 1000000000.5 | 954 MB | -954 MB
+ 1000000000000.5 | 931 GB | -931 GB
+ 1000000000000000.5 | 909 TB | -909 TB
+ 1000000000000000000.5 | 888 PB | -888 PB
+ 1000000000000000000000.5 | 867 EB | -867 EB
+ 1000000000000000000000000.5 | 847 ZB | -847 ZB
+ 1000000000000000000000000000.5 | 827 YB | -827 YB
+ 1000000000000000000000000000000.5 | 827181 YB | -827181 YB
+(22 rows)
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..a97dacbf7b 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,12 +4,29 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(1000000000000000::bigint)) x(size);
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
--
2.30.1 (Apple Git-130)
From: David Christensen <david.christensen@crunchydata.com>
Sent: Friday, June 4, 2021 4:18 AM
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [PATCH] expand the units that pg_size_pretty supports on outputNew versions attached to address the initial CF feedback and rebase on HEAD as of now.
0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch
- expands the units that pg_size_pretty() can handle up to YB.
0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
- expands the units that pg_size_bytes() can handle, up to YB.
I don't see the need to extend the unit to YB.
What use case do you have in mind?
Regards,
Shinya Kato
I don't see the need to extend the unit to YB.
What use case do you have in mind?
Practical or no, I saw no reason not to support all defined units. I assume we’ll get to a need sooner or later. :)
David
I don't see the need to extend the unit to YB.
What use case do you have in mind?Practical or no, I saw no reason not to support all defined units. I assume we’ll
get to a need sooner or later. :)
Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?
Best regards,
Shinya Kato
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
Hmmm, I didn't think YB was necessary, but what do others think?
For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte. After
that, I'd need to check.
Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?
I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB. There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.
I really think this change is more likely to upset someone than please someone.
Just my thoughts.
David
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
I don't see the need to extend the unit to YB.
What use case do you have in mind?Practical or no, I saw no reason not to support all defined units. I
assume we’ll
get to a need sooner or later. :)
Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?
If I’m reading the code correctly, the difference between supporting YB and
not supporting it is whether there is a line for it in the list of prefixes
and their multiples. As such, I don’t see why we’re even discussing whether
or not to include all the standard prefixes. A YB is still an absurd amount
of storage, but that’s not the point; just put all the standard prefixes
and be done with it. If actual code changes were required in the new code
as they are in the old it might be worth discussing.
One question: why is there no “k” in the list of prefixes?
Also: why not have only the prefixes in the array, and use a single fixed
output format "%s %sB" all the time?
It feels like it should be possible to calculate the appropriate idx to use
(while adjusting the number to print as is done now) and then just have one
psprintf call for all cases.
A more significant question is YB vs. YiB. I know there is a long tradition
within computer-related fields of saying that k = 1024, M = 1024^2, etc.,
but we’re not special enough to override the more general principles of SI
(Système International) which provide that k = 1000, M = 1000^2 and so on
universally and provide the alternate prefixes ki, Mi, etc. which use 1024
as the multiple.
So I would suggest either display 2000000 as 2MB or as 1.907MiB.
On Tue, Jun 15, 2021 at 8:31 AM Isaac Morland <isaac.morland@gmail.com>
wrote:
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
I don't see the need to extend the unit to YB.
What use case do you have in mind?Practical or no, I saw no reason not to support all defined units. I
assume we’ll
get to a need sooner or later. :)
Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?If I’m reading the code correctly, the difference between supporting YB
and not supporting it is whether there is a line for it in the list of
prefixes and their multiples. As such, I don’t see why we’re even
discussing whether or not to include all the standard prefixes. A YB is
still an absurd amount of storage, but that’s not the point; just put all
the standard prefixes and be done with it. If actual code changes were
required in the new code as they are in the old it might be worth
discussing.
Agreed, this is why I went this way. One and done.
One question: why is there no “k” in the list of prefixes?
kB has a special-case code block before you get to this point. I didn't
look into the reasons, but assume there are some.
Also: why not have only the prefixes in the array, and use a single fixed
output format "%s %sB" all the time?It feels like it should be possible to calculate the appropriate idx to
use (while adjusting the number to print as is done now) and then just have
one psprintf call for all cases.
Sure, if that seems more readable/understandable.
A more significant question is YB vs. YiB. I know there is a long
tradition within computer-related fields of saying that k = 1024, M =
1024^2, etc., but we’re not special enough to override the more general
principles of SI (Système International) which provide that k = 1000, M =
1000^2 and so on universally and provide the alternate prefixes ki, Mi,
etc. which use 1024 as the multiple.So I would suggest either display 2000000 as 2MB or as 1.907MiB.
Heh, I was just expanding the existing logic; if others want to have this
particular battle go ahead and I'll adjust the code/prefixes, but obviously
the logic will need to change if we want to support true MB instead of MiB
as MB.
Also, this will presumably be a breaking change for anyone using the
existing units MB == 1024 * 1024, as we've had for something like 20
years. Changing these units to the *iB will be trivial with this patch,
but not looking forward to garnering the consensus to change this part.
David
On Tue, Jun 15, 2021 at 8:26 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
Hmmm, I didn't think YB was necessary, but what do others think?
For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte. After
that, I'd need to check.Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?
I hadn't really thought about that TBH; to me it seemed like an
improvement, but I do see that others might not, and adding confusion is
definitely not helpful. That said, it seems like having the code
structured in a way that we can expand via adding an element to a table
instead of the existing way it's written with nested if blocks is still a
useful refactor, whatever we decide the cutoff units should be.
I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB. There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.
I got motivated to do this due to some (granted synthetic) work/workloads,
where I was seeing 6+digit TB numbers and thought it was ugly. Looked at
the code and thought the refactor was the way to go, and just stuck all of
the known units in.
I really think this change is more likely to upset someone than please
someone.
I'd be interested to see reactions from people; to me, it seems a +1, but
seems like -1, 0, +1 all valid opinions here; I'd expect more 0's and +1s,
but I'm probably biased since I wrote this. :-)
On Wed, 16 Jun 2021 at 02:58, David Christensen
<david.christensen@crunchydata.com> wrote:
That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be.
I had not really looked at the patch, but if there's a cleanup portion
to the same patch as you're adding the YB too, then maybe it's worth
separating those out into another patch so that the two can be
considered independently.
David
I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.
I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.
Best regards,
Shinya Kato
I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.
Sure thing, I think that makes sense. Refactor with existing units and debate the number of additions units to include. I do think Petabytes and Exabytes are at least within the realm of ones we should include; less tied to ZB and YB; just included for completeness.
Shinya11.Kato@nttdata.com writes:
I had not really looked at the patch, but if there's a cleanup portion to the same
patch as you're adding the YB too, then maybe it's worth separating those out
into another patch so that the two can be considered independently.I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.Best regards,
Shinya Kato
Hi folks,
Had some time to rework this patch from the two that had previously been
here into two separate parts:
1) A basic refactor of the existing code to easily handle expanding the
units we use into a table-based format. This also includes changing the
return value of `pg_size_bytes()` from an int64 into a numeric, and
minor test adjustments to reflect this.
2) Expanding the units that both pg_size_bytes() and pg_size_pretty()
recognize up through Yottabytes. This includes documentation and test
updates to reflect the changes made here. How many additional units we
add here is up for discussion (inevitably), but my opinion remains that
there is no harm in supporting all units available.
Best,
David
Attachments:
0001-Refactor-pg_size_pretty-and-pg_size_bytes-to-allow-f.patchtext/x-patchDownload
From ac30b06e3ddcb57eebb380560c2f4a47430dfd74 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Tue, 29 Jun 2021 10:20:05 -0500
Subject: [PATCH 1/2] Refactor pg_size_pretty and pg_size_bytes to allow for
supported unit expansion
---
src/backend/utils/adt/dbsize.c | 90 ++++++++++++++++------------
src/include/catalog/pg_proc.dat | 2 +-
src/test/regress/expected/dbsize.out | 4 --
src/test/regress/sql/dbsize.sql | 2 -
4 files changed, 53 insertions(+), 45 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..df08845932 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
Numeric size = PG_GETARG_NUMERIC(0);
Numeric limit,
limit2;
- char *result;
+ char *result = NULL;
limit = int64_to_numeric(10 * 1024);
limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
@@ -660,31 +660,32 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
+ int idx, max_iter = 2; /* highest index of table_below */
+ char *output_formats[] = {
+ "%s MB",
+ "%s GB",
+ "%s TB"
+ };
+
+ for (idx = 0; idx < max_iter; idx++) {
/* size >>= 10 */
size = numeric_shift_right(size, 10);
-
if (numeric_is_less(numeric_absolute(size), limit2))
{
size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
+ result = psprintf(output_formats[idx], numeric_to_cstring(size));
+ break;
}
}
+
+ if (!result) {
+ /* this uses the last format in the table above for anything else */
+
+ /* size >>= 10 */
+ size = numeric_shift_right(size, 10);
+ size = numeric_half_rounded(size);
+ result = psprintf(output_formats[max_iter], numeric_to_cstring(size));
+ }
}
}
@@ -703,7 +704,6 @@ pg_size_bytes(PG_FUNCTION_ARGS)
*endptr;
char saved_char;
Numeric num;
- int64 result;
bool have_digits = false;
str = text_to_cstring(arg);
@@ -786,7 +786,16 @@ pg_size_bytes(PG_FUNCTION_ARGS)
/* Handle possible unit */
if (*strptr != '\0')
{
- int64 multiplier = 0;
+ int64 multiplier = 1;
+ int i;
+ int unit_count = 5; /* sizeof units table */
+ char *units[] = {
+ "bytes",
+ "kb",
+ "mb",
+ "gb",
+ "tb",
+ };
/* Trim any trailing whitespace */
endptr = str + VARSIZE_ANY_EXHDR(arg) - 1;
@@ -797,21 +806,20 @@ pg_size_bytes(PG_FUNCTION_ARGS)
endptr++;
*endptr = '\0';
- /* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+ for (i = 0; i < unit_count; i++) {
+ printf("strptr: %s units: %s", strptr, units[i]);
+ if (pg_strcasecmp(strptr, units[i]) == 0)
+ break;
+ /*
+ * Note: int64 isn't large enough to store the full multiplier
+ * going past ~ 9EB, but since this is a fixed value, we can apply
+ * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024
+ * on actual conversion to numeric.
+ */
+ multiplier *= 32;
+ }
- else
+ if (i == unit_count)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
@@ -827,13 +835,19 @@ pg_size_bytes(PG_FUNCTION_ARGS)
num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
NumericGetDatum(mul_num),
NumericGetDatum(num)));
+
+ /* second application to get around int64 limitations in unit multipliers */
+ num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+ NumericGetDatum(mul_num),
+ NumericGetDatum(num)));
}
}
- result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
- NumericGetDatum(num)));
+ /* now finally truncate, since this is always in integer-like units */
+ num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+ NumericGetDatum(num)));
- PG_RETURN_INT64(result);
+ PG_RETURN_NUMERIC(num);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fde251fa4f..73326e3618 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7187,7 +7187,7 @@
prosrc => 'pg_size_pretty_numeric' },
{ oid => '3334',
descr => 'convert a size in human-readable format with size units into bytes',
- proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+ proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
prosrc => 'pg_size_bytes' },
{ oid => '2997',
descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..624948ccd2 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -114,10 +114,6 @@ SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR: bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR: bigint out of range
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..37023a01c3 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -34,8 +34,6 @@ SELECT size, pg_size_bytes(size) FROM
SELECT pg_size_bytes('1 AB');
SELECT pg_size_bytes('1 AB A');
SELECT pg_size_bytes('1 AB A ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
SELECT pg_size_bytes('1e1000000000000000000');
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
SELECT pg_size_bytes('');
--
2.30.1 (Apple Git-130)
0002-Full-expansion-of-all-units.patchtext/x-patchDownload
From e3fc63d9be6af5725a0ed24e16513cb711631f4a Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Tue, 29 Jun 2021 10:02:50 -0500
Subject: [PATCH 2/2] Full expansion of all units
---
doc/src/sgml/func.sgml | 2 +-
src/backend/utils/adt/dbsize.c | 22 +++-
src/test/regress/expected/dbsize.out | 180 +++++++++++++++++----------
src/test/regress/sql/dbsize.sql | 39 ++++--
4 files changed, 164 insertions(+), 79 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6388385edc..687d2a7ac8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26354,7 +26354,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
</para>
<para>
Converts a size in bytes into a more easily human-readable format with
- size units (bytes, kB, MB, GB or TB as appropriate). Note that the
+ size units (bytes, kB, MB, GB, TB, PB, EB, ZB, or YB as appropriate). Note that the
units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes,
1MB is 1024<superscript>2</superscript> = 1048576 bytes, and so on.
</para></entry>
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index df08845932..5650bb9f2b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -660,11 +660,15 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- int idx, max_iter = 2; /* highest index of table_below */
+ int idx, max_iter = 6; /* highest index of table_below */
char *output_formats[] = {
"%s MB",
"%s GB",
- "%s TB"
+ "%s TB",
+ "%s PB",
+ "%s EB",
+ "%s ZB",
+ "%s YB"
};
for (idx = 0; idx < max_iter; idx++) {
@@ -755,8 +759,9 @@ pg_size_bytes(PG_FUNCTION_ARGS)
char *cp;
/*
- * Note we might one day support EB units, so if what follows 'E'
- * isn't a number, just treat it all as a unit to be parsed.
+ * If what follows 'e' isn't a number, we just treat it all as a unit
+ * to be parsed; this allows us to support both exponential notation
+ * and EB units.
*/
exponent = strtol(endptr + 1, &cp, 10);
(void) exponent; /* Silence -Wunused-result warnings */
@@ -788,13 +793,17 @@ pg_size_bytes(PG_FUNCTION_ARGS)
{
int64 multiplier = 1;
int i;
- int unit_count = 5; /* sizeof units table */
+ int unit_count = 9; /* sizeof units table */
char *units[] = {
"bytes",
"kb",
"mb",
"gb",
"tb",
+ "pb",
+ "eb",
+ "zb",
+ "yb",
};
/* Trim any trailing whitespace */
@@ -824,7 +833,8 @@ pg_size_bytes(PG_FUNCTION_ARGS)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
errdetail("Invalid size unit: \"%s\".", strptr),
- errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+ errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+ "\"PB\", \"EB\", \"ZB\", and \"YB\".")));
if (multiplier > 1)
{
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index 624948ccd2..8993522904 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,77 +13,116 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(6 rows)
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
- size | pg_size_pretty | pg_size_pretty
---------------------+----------------+----------------
- 10 | 10 bytes | -10 bytes
- 1000 | 1000 bytes | -1000 bytes
- 1000000 | 977 kB | -977 kB
- 1000000000 | 954 MB | -954 MB
- 1000000000000 | 931 GB | -931 GB
- 1000000000000000 | 909 TB | -909 TB
- 10.5 | 10.5 bytes | -10.5 bytes
- 1000.5 | 1000.5 bytes | -1000.5 bytes
- 1000000.5 | 977 kB | -977 kB
- 1000000000.5 | 954 MB | -954 MB
- 1000000000000.5 | 931 GB | -931 GB
- 1000000000000000.5 | 909 TB | -909 TB
-(12 rows)
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-----------------------------------+----------------+----------------
+ 10 | 10 bytes | -10 bytes
+ 1000 | 1000 bytes | -1000 bytes
+ 1000000 | 977 kB | -977 kB
+ 1000000000 | 954 MB | -954 MB
+ 1000000000000 | 931 GB | -931 GB
+ 1000000000000000 | 909 TB | -909 TB
+ 1000000000000000000 | 888 PB | -888 PB
+ 1000000000000000000000 | 867 EB | -867 EB
+ 1000000000000000000000000 | 847 ZB | -847 ZB
+ 1000000000000000000000000000 | 827 YB | -827 YB
+ 1000000000000000000000000000000 | 827181 YB | -827181 YB
+ 10.5 | 10.5 bytes | -10.5 bytes
+ 1000.5 | 1000.5 bytes | -1000.5 bytes
+ 1000000.5 | 977 kB | -977 kB
+ 1000000000.5 | 954 MB | -954 MB
+ 1000000000000.5 | 931 GB | -931 GB
+ 1000000000000000.5 | 909 TB | -909 TB
+ 1000000000000000000.5 | 888 PB | -888 PB
+ 1000000000000000000000.5 | 867 EB | -867 EB
+ 1000000000000000000000000.5 | 847 ZB | -847 ZB
+ 1000000000000000000000000000.5 | 827 YB | -827 YB
+ 1000000000000000000000000000000.5 | 827181 YB | -827181 YB
+(22 rows)
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bytes | 123
- 1kB | 1024
- 1MB | 1048576
- 1 GB | 1073741824
- 1.5 GB | 1610612736
- 1TB | 1099511627776
- 3000 TB | 3298534883328000
- 1e6 MB | 1048576000000
-(9 rows)
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bytes | 123
+ 1kB | 1024
+ 1MB | 1048576
+ 1 GB | 1073741824
+ 1.5 GB | 1610612736
+ 1TB | 1099511627776
+ 3000 TB | 3298534883328000
+ 1e6 MB | 1048576000000
+ 99 PB | 111464090777419776
+ 45 EB | 51881467707308113920
+ 5.1ZB | 6021017265658797647463
+ 1.17 YB | 1414443208949116134406226
+(13 rows)
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bYteS | 123
- 1kb | 1024
- 1mb | 1048576
- 1 Gb | 1073741824
- 1.5 gB | 1610612736
- 1tb | 1099511627776
- 3000 tb | 3298534883328000
- 1e6 mb | 1048576000000
-(9 rows)
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
+ size | pg_size_bytes
+----------+---------------------------
+ 1 | 1
+ 123bYteS | 123
+ 1kb | 1024
+ 1mb | 1048576
+ 1 Gb | 1073741824
+ 1.5 gB | 1610612736
+ 1tb | 1099511627776
+ 3000 tb | 3298534883328000
+ 1e6 mb | 1048576000000
+ 99 pb | 111464090777419776
+ 45 eB | 51881467707308113920
+ 5.1Zb | 6021017265658797647463
+ 1.17 yb | 1414443208949116134406226
+(13 rows)
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
- size | pg_size_bytes
------------+-------------------
- -1 | -1
- -123bytes | -123
- -1kb | -1024
- -1mb | -1048576
- -1 Gb | -1073741824
- -1.5 gB | -1610612736
- -1tb | -1099511627776
- -3000 TB | -3298534883328000
- -10e-1 MB | -1048576
-(9 rows)
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
+ size | pg_size_bytes
+-----------+-----------------------------
+ -1 | -1
+ -123bytes | -123
+ -1kb | -1024
+ -1mb | -1048576
+ -1 Gb | -1073741824
+ -1.5 gB | -1610612736
+ -1tb | -1099511627776
+ -3000 TB | -3298534883328000
+ -10e-1 MB | -1048576
+ -19e-4eb | -2190550858753009
+ -18YB | -21760664753063325144711168
+(11 rows)
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
@@ -101,25 +140,38 @@ SELECT size, pg_size_bytes(size) FROM
-.0 gb | 0
(8 rows)
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+ pg_size_bytes
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+ pg_size_bytes
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
ERROR: invalid size: "1 AB"
DETAIL: Invalid size unit: "AB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A');
ERROR: invalid size: "1 AB A"
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
ERROR: invalid size: "1 byte"
DETAIL: Invalid size unit: "byte".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('');
ERROR: invalid size: ""
SELECT pg_size_bytes('kb');
@@ -137,6 +189,6 @@ ERROR: invalid size: ".+912"
SELECT pg_size_bytes('+912+ kB');
ERROR: invalid size: "+912+ kB"
DETAIL: Invalid size unit: "+ kB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
SELECT pg_size_bytes('++123 kB');
ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index 37023a01c3..2a4abcfc65 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,32 +4,55 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(1000000000000000::bigint)) x(size);
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+ ('1.17 YB')) x(size);
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+ ('1.17 yb')) x(size);
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
SELECT pg_size_bytes('1 AB A');
--
2.30.1 (Apple Git-130)
On Wed, 30 Jun 2021 at 05:11, David Christensen
<david.christensen@crunchydata.com> wrote:
1) A basic refactor of the existing code to easily handle expanding the
units we use into a table-based format. This also includes changing the
return value of `pg_size_bytes()` from an int64 into a numeric, and
minor test adjustments to reflect this.
This is not quite what I had imagined when you said about adding a
table to make it easier to add new units in the future. I expected a
single table that handles all units, not just the ones above kB and
not one for each function.
There are actually two pg_size_pretty functions, one for BIGINT and
one for NUMERIC. I see you only changed the NUMERIC version. I'd
expect them both to have the same treatment and use the same table so
there's consistency between the two functions.
The attached is more like what I had in mind. There's a very small net
reduction in lines of code with this and it also helps keep
pg_size_pretty() and pg_size_pretty_numeric() in sync.
I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions. I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift. In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value. My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.
David
Attachments:
tidy_up_pg_size_pretty_functions.patchapplication/octet-stream; name=tidy_up_pg_size_pretty_functions.patchDownload
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..2fcfef76f9 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -34,6 +34,30 @@
/* Divide by two and round towards positive infinity. */
#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)
+/*
+ * Units used in pg_size_pretty functions
+ */
+struct size_pretty_unit {
+ char *unit; /* e.g bytes, kB, MB etc */
+ uint64 limit; /* upper limit after reducing by shiftright
+ * bits from previous units */
+ uint8 shiftRight; /* number of bits to shift right by after
+ * limit is exceeded */
+ uint8 multishift; /* how many bits to shift "1" left by to make
+ * one of these units. */
+ bool doHalfRound; /* perform halfround operation */
+};
+
+/* When adding units here also update the error message in pg_size_bytes */
+static const struct size_pretty_unit size_pretty_units[] = {
+ {"bytes", 10 * 1024, 9, 0, false}, /* keep one extra bit for rounding */
+ {"kB", 20 * 1024 - 1, 10, 10, true},
+ {"MB", 20 * 1024 - 1, 10, 20, true},
+ {"GB", 20 * 1024 - 1, 10, 30, true},
+ {"TB", 20 * 1024 - 1, 10, 40, true},
+ {NULL, 0, 0, 0, false}
+};
+
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
db_dir_size(const char *path)
@@ -535,37 +559,30 @@ pg_size_pretty(PG_FUNCTION_ARGS)
{
int64 size = PG_GETARG_INT64(0);
char buf[64];
- int64 limit = 10 * 1024;
- int64 limit2 = limit * 2 - 1;
+ uint32 i;
- if (Abs(size) < limit)
- snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size);
- else
+ for (i = 0; size_pretty_units[i].unit != NULL; i++)
{
- size >>= 9; /* keep one extra bit for rounding */
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " kB",
- half_rounded(size));
- else
+ /*
+ * Use this set of units if the number is under the limit or there
+ * is no other unit to use.
+ */
+ if (Abs(size) < size_pretty_units[i].limit ||
+ size_pretty_units[i + 1].unit == NULL)
{
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " MB",
- half_rounded(size));
- else
- {
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " GB",
- half_rounded(size));
- else
- {
- size >>= 10;
- snprintf(buf, sizeof(buf), INT64_FORMAT " TB",
- half_rounded(size));
- }
- }
+ if (size_pretty_units[i].doHalfRound)
+ size = half_rounded(size);
+
+ snprintf(buf, sizeof(buf), INT64_FORMAT " %s", size,
+ size_pretty_units[i].unit);
+ break;
}
+
+ /*
+ * limit for the current unit has been exceeded. Shift right by the
+ * specified number of bits and try the next unit
+ */
+ size >>= size_pretty_units[i].shiftRight;
}
PG_RETURN_TEXT_P(cstring_to_text(buf));
@@ -636,56 +653,34 @@ Datum
pg_size_pretty_numeric(PG_FUNCTION_ARGS)
{
Numeric size = PG_GETARG_NUMERIC(0);
- Numeric limit,
- limit2;
+ Numeric limit;
char *result;
+ uint32 i;
- limit = int64_to_numeric(10 * 1024);
- limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
-
- if (numeric_is_less(numeric_absolute(size), limit))
+ for (i = 0; size_pretty_units[i].unit != NULL; i++)
{
- result = psprintf("%s bytes", numeric_to_cstring(size));
- }
- else
- {
- /* keep one extra bit for rounding */
- /* size >>= 9 */
- size = numeric_shift_right(size, 9);
+ limit = int64_to_numeric(size_pretty_units[i].limit);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s kB", numeric_to_cstring(size));
- }
- else
+ /*
+ * Use this set of units if the number is under the limit or there
+ * is no other unit to use.
+ */
+ if (numeric_is_less(numeric_absolute(size), limit) ||
+ size_pretty_units[i + 1].unit == NULL)
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
+ if (size_pretty_units[i].doHalfRound)
size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
-
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
- }
- }
+
+ result = psprintf("%s %s", numeric_to_cstring(size),
+ size_pretty_units[i].unit);
+ break;
}
+
+ /*
+ * limit for the current unit has been exceeded. Shift right by the
+ * specified number of bits and try the next unit
+ */
+ size = numeric_shift_right(size, size_pretty_units[i].shiftRight);
}
PG_RETURN_TEXT_P(cstring_to_text(result));
@@ -787,6 +782,7 @@ pg_size_bytes(PG_FUNCTION_ARGS)
if (*strptr != '\0')
{
int64 multiplier = 0;
+ uint32 i;
/* Trim any trailing whitespace */
endptr = str + VARSIZE_ANY_EXHDR(arg) - 1;
@@ -797,21 +793,18 @@ pg_size_bytes(PG_FUNCTION_ARGS)
endptr++;
*endptr = '\0';
- /* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
-
- else
+ for (i = 0; size_pretty_units[i].unit != NULL; i++)
+ {
+ /* Parse the unit case-insensitively */
+ if (pg_strcasecmp(strptr, size_pretty_units[i].unit) == 0)
+ {
+ multiplier = ((int64) 1) << size_pretty_units[i].multishift;
+ break;
+ }
+ }
+
+ /* Verify we found a valid unit in the loop above */
+ if (size_pretty_units[i].unit == NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions. I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift. In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value. My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.
I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.
I did a few other tidy ups and I think it's a useful patch as it
reduces the amount of code a bit and makes it dead simple to add new
units in the future. Most importantly it'll help keep pg_size_pretty,
pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
what units they support.
Does anyone want to have a look over this? If not, I plan to push it
in the next day or so.
(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)
David
Attachments:
v1-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patchapplication/octet-stream; name=v1-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patchDownload
From 8b387db81b93349968be3ae9d6cd7cd77d05d2e7 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 5 Jul 2021 20:03:27 +1200
Subject: [PATCH v1] Use lookup table for units in pg_size_pretty and
pg_size_bytes
We've grown 2 versions of pg_size_pretty over the years, one for BIGINT
and one for NUMERIC. Both should output the same, but keeping them in
sync would be harder than needed due to neither function sharing a source
of truth about which units to use.
Here we add a static array that defines the units that we recognize and
have both pg_size_pretty and pg_size_pretty_numeric use it. This will
make adding any units in the future a very simple task.
The table contains all information required to allow us to also modify
pg_size_bytes to also use it there too, so adjust that too.
There are no behavioral changes here.
---
src/backend/utils/adt/dbsize.c | 153 +++++++++++++++------------------
1 file changed, 68 insertions(+), 85 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..4b7331d85c 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -34,6 +34,29 @@
/* Divide by two and round towards positive infinity. */
#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)
+/* Units used in pg_size_pretty functions. All units must be powers of 2 */
+struct size_pretty_unit
+{
+ char *text; /* bytes, kB, MB, GB etc */
+ uint32 limit; /* upper limit after reducing by shiftRight
+ * bits from previous unit */
+ uint8 shiftRight; /* number of bits to shift right by after
+ * limit is exceeded before considering the
+ * next unit. Remaining shifts will be half
+ * rounded until we get to unitBits */
+ uint8 unitBits; /* 1 << unitBits make 1 of this unit */
+};
+
+/* When adding units here also update the error message in pg_size_bytes */
+static const struct size_pretty_unit size_pretty_units[] = {
+ {"bytes", 10 * 1024, 9, 0}, /* keep one extra bit for rounding */
+ {"kB", 20 * 1024 - 1, 10, 10},
+ {"MB", 20 * 1024 - 1, 10, 20},
+ {"GB", 20 * 1024 - 1, 10, 30},
+ {"TB", 20 * 1024 - 1, 10, 40},
+ {NULL, 0, 0, 0}
+};
+
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
db_dir_size(const char *path)
@@ -535,37 +558,25 @@ pg_size_pretty(PG_FUNCTION_ARGS)
{
int64 size = PG_GETARG_INT64(0);
char buf[64];
- int64 limit = 10 * 1024;
- int64 limit2 = limit * 2 - 1;
+ uint8 rightshifts = 0;
+ const struct size_pretty_unit *unit;
- if (Abs(size) < limit)
- snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size);
- else
+ for (unit = size_pretty_units; unit->text !=NULL; unit++)
{
- size >>= 9; /* keep one extra bit for rounding */
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " kB",
- half_rounded(size));
- else
+ /* use this unit if below limit or there's no more units */
+ if (Abs(size) < unit->limit || unit[1].text == NULL)
{
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " MB",
- half_rounded(size));
- else
- {
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " GB",
- half_rounded(size));
- else
- {
- size >>= 10;
- snprintf(buf, sizeof(buf), INT64_FORMAT " TB",
- half_rounded(size));
- }
- }
+ /* half-round until we get down to unitBits */
+ while (rightshifts++ < unit->unitBits)
+ size = half_rounded(size);
+
+ snprintf(buf, sizeof(buf), INT64_FORMAT " %s", size, unit->text);
+ break;
}
+
+ /* size too big for this unit, shift right and try the next unit */
+ size >>= unit->shiftRight;
+ rightshifts += unit->shiftRight;
}
PG_RETURN_TEXT_P(cstring_to_text(buf));
@@ -636,56 +647,30 @@ Datum
pg_size_pretty_numeric(PG_FUNCTION_ARGS)
{
Numeric size = PG_GETARG_NUMERIC(0);
- Numeric limit,
- limit2;
- char *result;
-
- limit = int64_to_numeric(10 * 1024);
- limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
+ Numeric limit;
+ char *result = NULL;
+ uint8 rightshifts = 0;
+ const struct size_pretty_unit *unit;
- if (numeric_is_less(numeric_absolute(size), limit))
- {
- result = psprintf("%s bytes", numeric_to_cstring(size));
- }
- else
+ for (unit = size_pretty_units; unit->text !=NULL; unit++)
{
- /* keep one extra bit for rounding */
- /* size >>= 9 */
- size = numeric_shift_right(size, 9);
+ limit = int64_to_numeric(unit->limit);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s kB", numeric_to_cstring(size));
- }
- else
+ /* use this unit if below limit or there's no more units */
+ if (numeric_is_less(numeric_absolute(size), limit) ||
+ unit[1].text == NULL)
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
+ /* half-round until we get down to unitBits */
+ while (rightshifts++ < unit->unitBits)
size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
-
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
- }
- }
+
+ result = psprintf("%s %s", numeric_to_cstring(size), unit->text);
+ break;
}
+
+ /* size too big for this unit, shift right and try the next unit */
+ size = numeric_shift_right(size, unit->shiftRight);
+ rightshifts += unit->shiftRight;
}
PG_RETURN_TEXT_P(cstring_to_text(result));
@@ -786,6 +771,7 @@ pg_size_bytes(PG_FUNCTION_ARGS)
/* Handle possible unit */
if (*strptr != '\0')
{
+ const struct size_pretty_unit *unit;
int64 multiplier = 0;
/* Trim any trailing whitespace */
@@ -797,21 +783,18 @@ pg_size_bytes(PG_FUNCTION_ARGS)
endptr++;
*endptr = '\0';
- /* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+ for (unit = size_pretty_units; unit->text !=NULL; unit++)
+ {
+ /* Parse the unit case-insensitively */
+ if (pg_strcasecmp(strptr, unit->text) == 0)
+ {
+ multiplier = ((int64) 1) << unit->unitBits;
+ break;
+ }
+ }
- else
+ /* Verify we found a valid unit in the loop above */
+ if (unit->text == NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
--
2.30.2
On Tue, 6 Jul 2021 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:
I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.
When I first read this:
+ /* half-round until we get down to unitBits */
+ while (rightshifts++ < unit->unitBits)
+ size = half_rounded(size);
it looked to me like it would be invoking half_rounded() multiple
times, which raised alarm bells because that would risk rounding the
wrong way. Eventually I realised that by the time it reaches that,
rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
it'll never do more than one half-round, which is important.
So perhaps using doHalfRound would be clearer, but it could just be a
local variable tracking whether or not it's the first time through the
loop.
Regards,
Dean
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
When I first read this:
+ /* half-round until we get down to unitBits */ + while (rightshifts++ < unit->unitBits) + size = half_rounded(size);it looked to me like it would be invoking half_rounded() multiple
times, which raised alarm bells because that would risk rounding the
wrong way. Eventually I realised that by the time it reaches that,
rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
it'll never do more than one half-round, which is important.
It's true that based on how the units table is set up now, it'll only
ever do it once for all but the first loop.
I wrote the attached .c file just to try to see if it ever goes wrong
and I didn't manage to find any inputs where it did. I always seem to
get the half rounded value either the same as the shifted value or 1
higher towards positive infinity
$ ./half_rounded -102 3
1. half_round(-102) == -51 :: -102 >> 1 = -51
2. half_round(-51) == -25 :: -51 >> 1 = -26
3. half_round(-25) == -12 :: -26 >> 1 = -13
$ ./half_rounded 6432 3
1. half_round(6432) == 3216 :: 6432 >> 1 = 3216
2. half_round(3216) == 1608 :: 3216 >> 1 = 1608
3. half_round(1608) == 804 :: 1608 >> 1 = 804
Can you give an example where calling half_rounded too many times will
give the wrong value? Keeping in mind we call half_rounded the number
of times that the passed in value would need to be left-shifted by to
get the equivalent truncated value.
David
Attachments:
On Tue, 6 Jul 2021 at 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
Can you give an example where calling half_rounded too many times will
give the wrong value? Keeping in mind we call half_rounded the number
of times that the passed in value would need to be left-shifted by to
get the equivalent truncated value.
./half_rounded 10241 10
1. half_round(10241) == 5121 :: 10241 >> 1 = 5120
2. half_round(5121) == 2561 :: 5120 >> 1 = 2560
3. half_round(2561) == 1281 :: 2560 >> 1 = 1280
4. half_round(1281) == 641 :: 1280 >> 1 = 640
5. half_round(641) == 321 :: 640 >> 1 = 320
6. half_round(321) == 161 :: 320 >> 1 = 160
7. half_round(161) == 81 :: 160 >> 1 = 80
8. half_round(81) == 41 :: 80 >> 1 = 40
9. half_round(41) == 21 :: 40 >> 1 = 20
10. half_round(21) == 11 :: 20 >> 1 = 10
The correct result should be 10 (it would be very odd to claim that
10241 bytes should be displayed as 11kb), but the half-rounding keeps
rounding up at each stage.
That's a general property of rounding -- you need to be very careful
when rounding more than once, since otherwise errors will propagate.
C.f. 4083f445c0, which removed a double-round in numeric sqrt().
To be clear, I'm not saying that the current code half-rounds more
than once, just that it reads as if it does.
Regards,
Dean
David Rowley writes:
On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions. I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift. In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value. My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.I did a few other tidy ups and I think it's a useful patch as it
reduces the amount of code a bit and makes it dead simple to add new
units in the future. Most importantly it'll help keep pg_size_pretty,
pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
what units they support.Does anyone want to have a look over this? If not, I plan to push it
in the next day or so.(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)David
I like the approach you took here; much cleaner to have one table for all of the individual
codepaths. Testing worked as expected; if we do decide to expand the units table there will be a
few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
to `numeric`).
Thanks,
David
David Rowley <dgrowleyml@gmail.com> writes:
Does anyone want to have a look over this? If not, I plan to push it
in the next day or so.
Minor nit: use "const char *text" in the struct declaration, so
that all of the static data can be placed in fixed storage.
(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)
It did that because "text" is a typedef name, so it's a bit confused
about whether the statement is really a declaration. Personally I'd
have used "name" or something like that for that field, anyway.
regards, tom lane
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
10. half_round(21) == 11 :: 20 >> 1 = 10
The correct result should be 10 (it would be very odd to claim that
10241 bytes should be displayed as 11kb), but the half-rounding keeps
rounding up at each stage.That's a general property of rounding -- you need to be very careful
when rounding more than once, since otherwise errors will propagate.
C.f. 4083f445c0, which removed a double-round in numeric sqrt().
Thanks. I've adjusted the patch to re-add the round bool flag and get
rid of the rightShift field. I'm now calculating how many bits to
shift right by based on the difference between the unitbits of the
current and next unit then taking 1 bit less if the next unit does
half rounding and the current one does not, or adding an extra bit on
in the opposite case.
I'll post another patch shortly.
David
On Wed, 7 Jul 2021 at 02:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Minor nit: use "const char *text" in the struct declaration, so
that all of the static data can be placed in fixed storage.
Thanks for pointing that out.
David Rowley <dgrowleyml@gmail.com> writes:
(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)It did that because "text" is a typedef name, so it's a bit confused
about whether the statement is really a declaration. Personally I'd
have used "name" or something like that for that field, anyway.
I should have thought of that. Thanks for highlighting it. I've
renamed the field.
Updated patch attached.
David
Attachments:
v2-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patchapplication/octet-stream; name=v2-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patchDownload
From ec3a5513066c0d37ccda6d4ebc122ff1a4d45148 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 5 Jul 2021 20:03:27 +1200
Subject: [PATCH v2] Use lookup table for units in pg_size_pretty and
pg_size_bytes
We've grown 2 versions of pg_size_pretty over the years, one for BIGINT
and one for NUMERIC. Both should output the same, but keeping them in
sync would be harder than needed due to neither function sharing a source
of truth about which units to use.
Here we add a static array that defines the units that we recognize and
have both pg_size_pretty and pg_size_pretty_numeric use it. This will
make adding any units in the future a very simple task.
The table contains all information required to allow us to also modify
pg_size_bytes to also use it there too, so adjust that too.
There are no behavioral changes here.
---
src/backend/utils/adt/dbsize.c | 158 +++++++++++++++------------------
1 file changed, 73 insertions(+), 85 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..9a15625e89 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -34,6 +34,27 @@
/* Divide by two and round towards positive infinity. */
#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)
+/* Units used in pg_size_pretty functions. All units must be powers of 2 */
+struct size_pretty_unit
+{
+ const char *name; /* bytes, kB, MB, GB etc */
+ uint32 limit; /* upper limit, prior to half rounding after
+ * converting to this unit. */
+ bool round; /* do half rounding for this unit */
+ uint8 unitbits; /* (1 << unitbits) bytes to make 1 of this
+ * unit */
+};
+
+/* When adding units here also update the error message in pg_size_bytes */
+static const struct size_pretty_unit size_pretty_units[] = {
+ {"bytes", 10 * 1024, false, 0},
+ {"kB", 20 * 1024 - 1, true, 10},
+ {"MB", 20 * 1024 - 1, true, 20},
+ {"GB", 20 * 1024 - 1, true, 30},
+ {"TB", 20 * 1024 - 1, true, 40},
+ {NULL, 0, false, 0}
+};
+
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
db_dir_size(const char *path)
@@ -535,37 +556,28 @@ pg_size_pretty(PG_FUNCTION_ARGS)
{
int64 size = PG_GETARG_INT64(0);
char buf[64];
- int64 limit = 10 * 1024;
- int64 limit2 = limit * 2 - 1;
+ const struct size_pretty_unit *unit;
- if (Abs(size) < limit)
- snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size);
- else
+ for (unit = size_pretty_units; unit->name != NULL; unit++)
{
- size >>= 9; /* keep one extra bit for rounding */
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " kB",
- half_rounded(size));
- else
+ /* use this unit there are no more units or we're below the limit */
+ if (unit[1].name == NULL || Abs(size) < unit->limit)
{
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " MB",
- half_rounded(size));
- else
- {
- size >>= 10;
- if (Abs(size) < limit2)
- snprintf(buf, sizeof(buf), INT64_FORMAT " GB",
- half_rounded(size));
- else
- {
- size >>= 10;
- snprintf(buf, sizeof(buf), INT64_FORMAT " TB",
- half_rounded(size));
- }
- }
+ if (unit->round)
+ size = half_rounded(size);
+
+ snprintf(buf, sizeof(buf), INT64_FORMAT " %s", size, unit->name);
+ break;
}
+
+ /*
+ * We may need to shift 1 bit less than the difference between this
+ * and the next unit if the next unit has half rounding. Or we may
+ * need to shift an extra bit if this unit has half rounding and the
+ * next one does not.
+ */
+ size >>= (unit[1].unitbits - unit->unitbits - (unit[1].round == true)
+ + (unit->round == true));
}
PG_RETURN_TEXT_P(cstring_to_text(buf));
@@ -636,56 +648,34 @@ Datum
pg_size_pretty_numeric(PG_FUNCTION_ARGS)
{
Numeric size = PG_GETARG_NUMERIC(0);
- Numeric limit,
- limit2;
- char *result;
-
- limit = int64_to_numeric(10 * 1024);
- limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
+ char *result = NULL;
+ const struct size_pretty_unit *unit;
- if (numeric_is_less(numeric_absolute(size), limit))
+ for (unit = size_pretty_units; unit->name != NULL; unit++)
{
- result = psprintf("%s bytes", numeric_to_cstring(size));
- }
- else
- {
- /* keep one extra bit for rounding */
- /* size >>= 9 */
- size = numeric_shift_right(size, 9);
+ unsigned int shiftby;
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s kB", numeric_to_cstring(size));
- }
- else
+ /* use this unit there are no more units or we're below the limit */
+ if (unit[1].name == NULL ||
+ numeric_is_less(numeric_absolute(size),
+ int64_to_numeric(unit->limit)))
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
+ if (unit->round)
size = numeric_half_rounded(size);
- result = psprintf("%s MB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
-
- if (numeric_is_less(numeric_absolute(size), limit2))
- {
- size = numeric_half_rounded(size);
- result = psprintf("%s GB", numeric_to_cstring(size));
- }
- else
- {
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
- size = numeric_half_rounded(size);
- result = psprintf("%s TB", numeric_to_cstring(size));
- }
- }
+
+ result = psprintf("%s %s", numeric_to_cstring(size), unit->name);
+ break;
}
+
+ /*
+ * We may need to shift 1 bit less than the difference between this
+ * and the next unit if the next unit has half rounding. Or we may
+ * need to shift an extra bit if this unit has half rounding and the
+ * next one does not.
+ */
+ shiftby = (unit[1].unitbits - unit->unitbits - (unit[1].round == true)
+ + (unit->round == true));
+ size = numeric_shift_right(size, shiftby);
}
PG_RETURN_TEXT_P(cstring_to_text(result));
@@ -786,6 +776,7 @@ pg_size_bytes(PG_FUNCTION_ARGS)
/* Handle possible unit */
if (*strptr != '\0')
{
+ const struct size_pretty_unit *unit;
int64 multiplier = 0;
/* Trim any trailing whitespace */
@@ -797,21 +788,18 @@ pg_size_bytes(PG_FUNCTION_ARGS)
endptr++;
*endptr = '\0';
- /* Parse the unit case-insensitively */
- if (pg_strcasecmp(strptr, "bytes") == 0)
- multiplier = (int64) 1;
- else if (pg_strcasecmp(strptr, "kb") == 0)
- multiplier = (int64) 1024;
- else if (pg_strcasecmp(strptr, "mb") == 0)
- multiplier = ((int64) 1024) * 1024;
-
- else if (pg_strcasecmp(strptr, "gb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024;
-
- else if (pg_strcasecmp(strptr, "tb") == 0)
- multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+ for (unit = size_pretty_units; unit->name != NULL; unit++)
+ {
+ /* Parse the unit case-insensitively */
+ if (pg_strcasecmp(strptr, unit->name) == 0)
+ {
+ multiplier = ((int64) 1) << unit->unitbits;
+ break;
+ }
+ }
- else
+ /* Verify we found a valid unit in the loop above */
+ if (unit->name == NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
--
2.30.2
On Wed, 7 Jul 2021 at 02:46, David Christensen
<david.christensen@crunchydata.com> wrote:
if we do decide to expand the units table there will be a
few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
to `numeric`).
I wonder if it's worth changing pg_size_bytes() to return NUMERIC
regardless of if we add any additional units or not.
Would you like to create 2 patches, one to change the return type and
another to add the new units, both based on top of the v2 patch I sent
earlier?
David
On Wed, 7 Jul 2021 at 03:47, David Rowley <dgrowleyml@gmail.com> wrote:
Updated patch attached.
Hmm, this looked easy, but...
It occurred to me that there ought to be regression tests for the edge
cases where it steps from one unit to the next. So, in the style of
the existing regression tests, I tried the following:
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(VALUES (10239::bigint), (10240::bigint),
(10485247::bigint), (10485248::bigint),
(10736893951::bigint), (10736893952::bigint),
(10994579406847::bigint), (10994579406848::bigint),
(11258449312612351::bigint), (11258449312612352::bigint)) x(size);
size | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
10239 | 10239 bytes | -10239 bytes
10240 | 10 kB | -10 kB
10485247 | 10239 kB | -10 MB
10485248 | 10 MB | -10 MB
10736893951 | 10239 MB | -10 GB
10736893952 | 10 GB | -10 GB
10994579406847 | 10239 GB | -10 TB
10994579406848 | 10 TB | -10 TB
11258449312612351 | 10239 TB | -10239 TB
11258449312612352 | 10240 TB | -10239 TB
(10 rows)
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(VALUES (10239::numeric), (10240::numeric),
(10485247::numeric), (10485248::numeric),
(10736893951::numeric), (10736893952::numeric),
(10994579406847::numeric), (10994579406848::numeric),
(11258449312612351::numeric), (11258449312612352::numeric)) x(size);
size | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
10239 | 10239 bytes | -10239 bytes
10240 | 10 kB | -10 kB
10485247 | 10239 kB | -10239 kB
10485248 | 10 MB | -10 MB
10736893951 | 10239 MB | -10239 MB
10736893952 | 10 GB | -10 GB
10994579406847 | 10239 GB | -10239 GB
10994579406848 | 10 TB | -10 TB
11258449312612351 | 10239 TB | -10239 TB
11258449312612352 | 10240 TB | -10240 TB
(10 rows)
Under the assumption that what we're trying to achieve here is
schoolbook rounding (ties away from zero), the numeric results are
correct and the bigint results are wrong.
The reason is that bit shifting isn't the same as division for
negative numbers, since bit shifting rounds towards negative infinity
whereas division rounds towards zero (truncates), which is what I
think we really need.
Regards,
Dean
David Rowley writes:
On Wed, 7 Jul 2021 at 02:46, David Christensen
<david.christensen@crunchydata.com> wrote:if we do decide to expand the units table there will be a
few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
to `numeric`).I wonder if it's worth changing pg_size_bytes() to return NUMERIC
regardless of if we add any additional units or not.Would you like to create 2 patches, one to change the return type and
another to add the new units, both based on top of the v2 patch I sent
earlier?David
Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.
If we decide to expand further, the current implementation will need to change, as
ZB and YB have 70 and 80 bits needing to be shifted accordingly, so int64 isn't enough to hold
it. (I fixed this particular issue in the original version of this patch, so there is at least a
blueprint of how to fix.)
I figured that PB and EB are probably good enough additions at this point, so we can debate whether
to add the others.
Best,
David
Attachments:
pg_size_bytes-numeric.patchtext/x-patchDownload
From 57bd2eafff5404313426a10f63b0b7098314998a Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 7 Jul 2021 11:46:09 -0500
Subject: [PATCH] Make pg_size_bytes() return numeric instead of bigint
---
src/backend/utils/adt/dbsize.c | 7 ++++---
src/include/catalog/pg_proc.dat | 2 +-
src/test/regress/expected/dbsize.out | 17 +++++++++++++----
src/test/regress/sql/dbsize.sql | 6 ++++--
4 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4b7331d85c..a5a3ebe34e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -813,10 +813,11 @@ pg_size_bytes(PG_FUNCTION_ARGS)
}
}
- result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
- NumericGetDatum(num)));
+ /* now finally truncate, since this is always in integer-like units */
+ num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+ NumericGetDatum(num)));
- PG_RETURN_INT64(result);
+ PG_RETURN_NUMERIC(num);
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fde251fa4f..73326e3618 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7187,7 +7187,7 @@
prosrc => 'pg_size_pretty_numeric' },
{ oid => '3334',
descr => 'convert a size in human-readable format with size units into bytes',
- proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+ proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
prosrc => 'pg_size_bytes' },
{ oid => '2997',
descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..2950e017d0 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -101,6 +101,19 @@ SELECT size, pg_size_bytes(size) FROM
-.0 gb | 0
(8 rows)
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+ pg_size_bytes
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+ pg_size_bytes
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
ERROR: invalid size: "1 AB"
@@ -114,10 +127,6 @@ SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR: bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR: bigint out of range
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..e88184e9c9 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -30,12 +30,14 @@ SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
-- invalid inputs
SELECT pg_size_bytes('1 AB');
SELECT pg_size_bytes('1 AB A');
SELECT pg_size_bytes('1 AB A ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
SELECT pg_size_bytes('1e1000000000000000000');
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
SELECT pg_size_bytes('');
--
2.30.1 (Apple Git-130)
pg_size-EB.patchtext/x-patchDownload
From 8d6972abf01c01d2c8713d94bbb46ae44682195e Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 7 Jul 2021 12:37:41 -0500
Subject: [PATCH] Expand units for pg_size_*() up through EB
---
src/backend/utils/adt/dbsize.c | 5 +-
src/test/regress/expected/dbsize.out | 138 +++++++++++++++++----------
src/test/regress/sql/dbsize.sql | 33 +++++--
3 files changed, 114 insertions(+), 62 deletions(-)
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index a5a3ebe34e..cd68ff10c8 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -54,6 +54,8 @@ static const struct size_pretty_unit size_pretty_units[] = {
{"MB", 20 * 1024 - 1, 10, 20},
{"GB", 20 * 1024 - 1, 10, 30},
{"TB", 20 * 1024 - 1, 10, 40},
+ {"PB", 20 * 1024 - 1, 10, 50},
+ {"EB", 20 * 1024 - 1, 10, 60},
{NULL, 0, 0, 0}
};
@@ -799,7 +801,8 @@ pg_size_bytes(PG_FUNCTION_ARGS)
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
errdetail("Invalid size unit: \"%s\".", strptr),
- errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+ errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+ "\"PB\", and \"EB\".")));
if (multiplier > 1)
{
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index 2950e017d0..cb260f69a3 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,65 +13,96 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(6 rows)
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
- size | pg_size_pretty | pg_size_pretty
---------------------+----------------+----------------
- 10 | 10 bytes | -10 bytes
- 1000 | 1000 bytes | -1000 bytes
- 1000000 | 977 kB | -977 kB
- 1000000000 | 954 MB | -954 MB
- 1000000000000 | 931 GB | -931 GB
- 1000000000000000 | 909 TB | -909 TB
- 10.5 | 10.5 bytes | -10.5 bytes
- 1000.5 | 1000.5 bytes | -1000.5 bytes
- 1000000.5 | 977 kB | -977 kB
- 1000000000.5 | 954 MB | -954 MB
- 1000000000000.5 | 931 GB | -931 GB
- 1000000000000000.5 | 909 TB | -909 TB
-(12 rows)
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-----------------------------------+-----------------+------------------
+ 10 | 10 bytes | -10 bytes
+ 1000 | 1000 bytes | -1000 bytes
+ 1000000 | 977 kB | -977 kB
+ 1000000000 | 954 MB | -954 MB
+ 1000000000000 | 931 GB | -931 GB
+ 1000000000000000 | 909 TB | -909 TB
+ 1000000000000000000 | 888 PB | -888 PB
+ 1000000000000000000000 | 867 EB | -867 EB
+ 1000000000000000000000000 | 867362 EB | -867362 EB
+ 1000000000000000000000000000 | 867361738 EB | -867361738 EB
+ 1000000000000000000000000000000 | 867361737988 EB | -867361737988 EB
+ 10.5 | 10.5 bytes | -10.5 bytes
+ 1000.5 | 1000.5 bytes | -1000.5 bytes
+ 1000000.5 | 977 kB | -977 kB
+ 1000000000.5 | 954 MB | -954 MB
+ 1000000000000.5 | 931 GB | -931 GB
+ 1000000000000000.5 | 909 TB | -909 TB
+ 1000000000000000000.5 | 888 PB | -888 PB
+ 1000000000000000000000.5 | 867 EB | -867 EB
+ 1000000000000000000000000.5 | 867362 EB | -867362 EB
+ 1000000000000000000000000000.5 | 867361738 EB | -867361738 EB
+ 1000000000000000000000000000000.5 | 867361737988 EB | -867361737988 EB
+(22 rows)
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bytes | 123
- 1kB | 1024
- 1MB | 1048576
- 1 GB | 1073741824
- 1.5 GB | 1610612736
- 1TB | 1099511627776
- 3000 TB | 3298534883328000
- 1e6 MB | 1048576000000
-(9 rows)
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size);
+ size | pg_size_bytes
+----------+----------------------
+ 1 | 1
+ 123bytes | 123
+ 1kB | 1024
+ 1MB | 1048576
+ 1 GB | 1073741824
+ 1.5 GB | 1610612736
+ 1TB | 1099511627776
+ 3000 TB | 3298534883328000
+ 1e6 MB | 1048576000000
+ 99 PB | 111464090777419776
+ 45 EB | 51881467707308113920
+(11 rows)
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
- size | pg_size_bytes
-----------+------------------
- 1 | 1
- 123bYteS | 123
- 1kb | 1024
- 1mb | 1048576
- 1 Gb | 1073741824
- 1.5 gB | 1610612736
- 1tb | 1099511627776
- 3000 tb | 3298534883328000
- 1e6 mb | 1048576000000
-(9 rows)
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size);
+ size | pg_size_bytes
+----------+----------------------
+ 1 | 1
+ 123bYteS | 123
+ 1kb | 1024
+ 1mb | 1048576
+ 1 Gb | 1073741824
+ 1.5 gB | 1610612736
+ 1tb | 1099511627776
+ 3000 tb | 3298534883328000
+ 1e6 mb | 1048576000000
+ 99 pb | 111464090777419776
+ 45 eB | 51881467707308113920
+(11 rows)
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size);
size | pg_size_bytes
-----------+-------------------
-1 | -1
@@ -83,7 +114,8 @@ SELECT size, pg_size_bytes(size) FROM
-1tb | -1099511627776
-3000 TB | -3298534883328000
-10e-1 MB | -1048576
-(9 rows)
+ -19e-4eb | -2190550858753009
+(10 rows)
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
@@ -118,21 +150,21 @@ SELECT pg_size_bytes('1e100');
SELECT pg_size_bytes('1 AB');
ERROR: invalid size: "1 AB"
DETAIL: Invalid size unit: "AB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
SELECT pg_size_bytes('1 AB A');
ERROR: invalid size: "1 AB A"
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
SELECT pg_size_bytes('1 AB A ');
ERROR: invalid size: "1 AB A "
DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
SELECT pg_size_bytes('1e1000000000000000000');
ERROR: value overflows numeric format
SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported
ERROR: invalid size: "1 byte"
DETAIL: Invalid size unit: "byte".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
SELECT pg_size_bytes('');
ERROR: invalid size: ""
SELECT pg_size_bytes('kb');
@@ -150,6 +182,6 @@ ERROR: invalid size: ".+912"
SELECT pg_size_bytes('+912+ kB');
ERROR: invalid size: "+912+ kB"
DETAIL: Invalid size unit: "+ kB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
SELECT pg_size_bytes('++123 kB');
ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index e88184e9c9..cfa993f63f 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,26 +4,43 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(1000000000000000::bigint)) x(size);
SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
- (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
- (1000000000::numeric), (1000000000000::numeric),
+ (VALUES (10::numeric),
+ (1000::numeric),
+ (1000000::numeric),
+ (1000000000::numeric),
+ (1000000000000::numeric),
(1000000000000000::numeric),
- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
- (1000000000.5::numeric), (1000000000000.5::numeric),
- (1000000000000000.5::numeric)) x(size);
+ (1000000000000000000::numeric),
+ (1000000000000000000000::numeric),
+ (1000000000000000000000000::numeric),
+ (1000000000000000000000000000::numeric),
+ (1000000000000000000000000000000::numeric),
+ (10.5::numeric),
+ (1000.5::numeric),
+ (1000000.5::numeric),
+ (1000000000.5::numeric),
+ (1000000000000.5::numeric),
+ (1000000000000000.5::numeric),
+ (1000000000000000000.5::numeric),
+ (1000000000000000000000.5::numeric),
+ (1000000000000000000000000.5::numeric),
+ (1000000000000000000000000000.5::numeric),
+ (1000000000000000000000000000000.5::numeric)
+ ) x(size);
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
- ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+ ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size);
-- case-insensitive units are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
- ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+ ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size);
-- negative numbers are supported
SELECT size, pg_size_bytes(size) FROM
(VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
- ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+ ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size);
-- different cases with allowed points
SELECT size, pg_size_bytes(size) FROM
--
2.30.1 (Apple Git-130)
David Christensen <david.christensen@crunchydata.com> writes:
Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.
Can we really get away with changing the return type? That would
by no stretch of the imagination be free; one could expect breakage
of a few user views, for example.
Independently of that, I'm pretty much -1 on going further than PB.
Even if the additional abbreviations you mention are actually recognized
standards, I think not that many people are familiar with them, and such
input is way more likely to be a typo than intended data.
regards, tom lane
Tom Lane writes:
David Christensen <david.christensen@crunchydata.com> writes:
Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.Can we really get away with changing the return type? That would
by no stretch of the imagination be free; one could expect breakage
of a few user views, for example.
Hmm, that's a good point, and we can't really make the return type polymorphic (being as there isn't
a source type of the given return value).
Independently of that, I'm pretty much -1 on going further than PB.
Even if the additional abbreviations you mention are actually recognized
standards, I think not that many people are familiar with them, and such
input is way more likely to be a typo than intended data.
If we do go ahead and restrict the expansion to just PB, the return value of pg_size_bytes() would
still support up to 8192 PB before running into range limitations. I assume it's not worth creating
a pg_size_bytes_numeric() with the full range of supported units, but that is presumably an option
as well.
Best,
David
On Thu, 8 Jul 2021 at 01:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
Hmm, this looked easy, but...
It occurred to me that there ought to be regression tests for the edge
cases where it steps from one unit to the next. So, in the style of
the existing regression tests, I tried the following:SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
(VALUES (10239::bigint), (10240::bigint),
(10485247::bigint), (10485248::bigint),
(10736893951::bigint), (10736893952::bigint),
(10994579406847::bigint), (10994579406848::bigint),
(11258449312612351::bigint), (11258449312612352::bigint)) x(size);
11258449312612352 | 10240 TB | -10239 TB
Hmm, yeah, I noticed that pg_size_pretty(bigint) vs
pg_size_pretty(numeric) didn't always match when I was testing this
patch, but I was more focused on having my results matching the
unpatched version than I was with making sure bigint and numeric
matched.
I imagine this must date back to 8a1fab36ab. Do you feel like it's
something this patch should fix? I was mostly hoping to keep this
patch about rewriting the code to both make it easier to add new units
and also to make it easier to keep all 3 functions in sync.
It feels like if we're going to fix this negative rounding thing then
we should maybe do it and backpatch a fix then rebase this work on top
of that.
What are your thoughts?
David
On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
It feels like if we're going to fix this negative rounding thing then
we should maybe do it and backpatch a fix then rebase this work on top
of that.
Here's a patch which I believe makes pg_size_pretty() and
pg_size_pretty_numeric() match in regards to negative values.
Maybe this plus your regression test would be ok to back-patch?
David
Attachments:
adjust_pg_size_pretty_rounding_for_negative_values.patchapplication/octet-stream; name=adjust_pg_size_pretty_rounding_for_negative_values.patchDownload
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..daf9149be3 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -31,8 +31,8 @@
#include "utils/relmapper.h"
#include "utils/syscache.h"
-/* Divide by two and round towards positive infinity. */
-#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)
+/* Divide by two and round away from zero */
+#define half_rounded(x) (((x) + ((x) < 0 ? -1 : 1)) / 2)
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
@@ -542,25 +542,29 @@ pg_size_pretty(PG_FUNCTION_ARGS)
snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size);
else
{
- size >>= 9; /* keep one extra bit for rounding */
+ /*
+ * We use divide instead of bit shifting so that behavior matches for
+ * both positive and negative size values.
+ */
+ size /= (1 << 9); /* keep one extra bit for rounding */
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " kB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " MB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " GB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
snprintf(buf, sizeof(buf), INT64_FORMAT " TB",
half_rounded(size));
}
On Thu, 8 Jul 2021 at 05:30, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
It feels like if we're going to fix this negative rounding thing then
we should maybe do it and backpatch a fix then rebase this work on top
of that.
Yes, that was my thinking too.
Here's a patch which I believe makes pg_size_pretty() and
pg_size_pretty_numeric() match in regards to negative values.
LGTM, except I think it's worth also making the numeric code not refer
to bit shifting either.
Maybe this plus your regression test would be ok to back-patch?
+1
Here's an update with matching updates to the numeric code, plus the
regression tests.
Regards,
Dean
Attachments:
adjust_pg_size_pretty_rounding_for_negative_values-v2.patchtext/x-patch; charset=US-ASCII; name=adjust_pg_size_pretty_rounding_for_negative_values-v2.patchDownload
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 9c39e7d..d03e103
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -31,8 +31,8 @@
#include "utils/relmapper.h"
#include "utils/syscache.h"
-/* Divide by two and round towards positive infinity. */
-#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)
+/* Divide by two and round away from zero */
+#define half_rounded(x) (((x) + ((x) < 0 ? -1 : 1)) / 2)
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
@@ -542,25 +542,29 @@ pg_size_pretty(PG_FUNCTION_ARGS)
snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size);
else
{
- size >>= 9; /* keep one extra bit for rounding */
+ /*
+ * We use divide instead of bit shifting so that behavior matches for
+ * both positive and negative size values.
+ */
+ size /= (1 << 9); /* keep one extra bit for rounding */
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " kB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " MB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
if (Abs(size) < limit2)
snprintf(buf, sizeof(buf), INT64_FORMAT " GB",
half_rounded(size));
else
{
- size >>= 10;
+ size /= (1 << 10);
snprintf(buf, sizeof(buf), INT64_FORMAT " TB",
half_rounded(size));
}
@@ -621,13 +625,13 @@ numeric_half_rounded(Numeric n)
}
static Numeric
-numeric_shift_right(Numeric n, unsigned count)
+numeric_truncated_divide(Numeric n, int64 divisor)
{
Datum d = NumericGetDatum(n);
Datum divisor_numeric;
Datum result;
- divisor_numeric = NumericGetDatum(int64_to_numeric(((int64) 1) << count));
+ divisor_numeric = NumericGetDatum(int64_to_numeric(divisor));
result = DirectFunctionCall2(numeric_div_trunc, d, divisor_numeric);
return DatumGetNumeric(result);
}
@@ -650,8 +654,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
else
{
/* keep one extra bit for rounding */
- /* size >>= 9 */
- size = numeric_shift_right(size, 9);
+ /* size /= (1 << 9) */
+ size = numeric_truncated_divide(size, 1 << 9);
if (numeric_is_less(numeric_absolute(size), limit2))
{
@@ -660,8 +664,9 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
+ /* size /= (1 << 10) */
+ size = numeric_truncated_divide(size, 1 << 10);
+
if (numeric_is_less(numeric_absolute(size), limit2))
{
size = numeric_half_rounded(size);
@@ -669,8 +674,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
+ /* size /= (1 << 10) */
+ size = numeric_truncated_divide(size, 1 << 10);
if (numeric_is_less(numeric_absolute(size), limit2))
{
@@ -679,8 +684,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
}
else
{
- /* size >>= 10 */
- size = numeric_shift_right(size, 10);
+ /* size /= (1 << 10) */
+ size = numeric_truncated_divide(size, 1 << 10);
size = numeric_half_rounded(size);
result = psprintf("%s TB", numeric_to_cstring(size));
}
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
new file mode 100644
index e901a2c..29804ae
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -35,6 +35,48 @@ SELECT size, pg_size_pretty(size), pg_si
1000000000000000.5 | 909 TB | -909 TB
(12 rows)
+-- test where units change up
+SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
+ (VALUES (10239::bigint), (10240::bigint),
+ (10485247::bigint), (10485248::bigint),
+ (10736893951::bigint), (10736893952::bigint),
+ (10994579406847::bigint), (10994579406848::bigint),
+ (11258449312612351::bigint), (11258449312612352::bigint)) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-------------------+----------------+----------------
+ 10239 | 10239 bytes | -10239 bytes
+ 10240 | 10 kB | -10 kB
+ 10485247 | 10239 kB | -10239 kB
+ 10485248 | 10 MB | -10 MB
+ 10736893951 | 10239 MB | -10239 MB
+ 10736893952 | 10 GB | -10 GB
+ 10994579406847 | 10239 GB | -10239 GB
+ 10994579406848 | 10 TB | -10 TB
+ 11258449312612351 | 10239 TB | -10239 TB
+ 11258449312612352 | 10240 TB | -10240 TB
+(10 rows)
+
+SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
+ (VALUES (10239::numeric), (10240::numeric),
+ (10485247::numeric), (10485248::numeric),
+ (10736893951::numeric), (10736893952::numeric),
+ (10994579406847::numeric), (10994579406848::numeric),
+ (11258449312612351::numeric), (11258449312612352::numeric)) x(size);
+ size | pg_size_pretty | pg_size_pretty
+-------------------+----------------+----------------
+ 10239 | 10239 bytes | -10239 bytes
+ 10240 | 10 kB | -10 kB
+ 10485247 | 10239 kB | -10239 kB
+ 10485248 | 10 MB | -10 MB
+ 10736893951 | 10239 MB | -10239 MB
+ 10736893952 | 10 GB | -10 GB
+ 10994579406847 | 10239 GB | -10239 GB
+ 10994579406848 | 10 TB | -10 TB
+ 11258449312612351 | 10239 TB | -10239 TB
+ 11258449312612352 | 10240 TB | -10240 TB
+(10 rows)
+
+-- pg_size_bytes() tests
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
new file mode 100644
index d10a4d7..6a45c5e
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -11,6 +11,22 @@ SELECT size, pg_size_pretty(size), pg_si
(1000000000.5::numeric), (1000000000000.5::numeric),
(1000000000000000.5::numeric)) x(size);
+-- test where units change up
+SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
+ (VALUES (10239::bigint), (10240::bigint),
+ (10485247::bigint), (10485248::bigint),
+ (10736893951::bigint), (10736893952::bigint),
+ (10994579406847::bigint), (10994579406848::bigint),
+ (11258449312612351::bigint), (11258449312612352::bigint)) x(size);
+
+SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
+ (VALUES (10239::numeric), (10240::numeric),
+ (10485247::numeric), (10485248::numeric),
+ (10736893951::numeric), (10736893952::numeric),
+ (10994579406847::numeric), (10994579406848::numeric),
+ (11258449312612351::numeric), (11258449312612352::numeric)) x(size);
+
+-- pg_size_bytes() tests
SELECT size, pg_size_bytes(size) FROM
(VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
On Thu, 8 Jul 2021 at 07:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Christensen <david.christensen@crunchydata.com> writes:
Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.Can we really get away with changing the return type? That would
by no stretch of the imagination be free; one could expect breakage
of a few user views, for example.
That's a good point. We should probably leave it alone then. I had
had it in mind that it might be ok since we did this for extract() in
14. At least we have date_part() as a backup there. I'm fine to leave
the return value of pg_size_bytes as-is.
Independently of that, I'm pretty much -1 on going further than PB.
Even if the additional abbreviations you mention are actually recognized
standards, I think not that many people are familiar with them, and such
input is way more likely to be a typo than intended data.
I'm fine with that too. In [1]/messages/by-id/CAApHDvp9ym+RSQNGoSRPjH+j6TJ1tFBhfT+JoLFf_RbZq1EszQ@mail.gmail.com I mentioned my concerns with adding
all the defined units up to Yottabyte. David reduced that down to just
exabytes, but I think if we're keeping pg_size_bytes returning bigint
then drawing the line at PB seems ok to me. Anything more than
pg_size_bytes('8 EB') would overflow.
David
[1]: /messages/by-id/CAApHDvp9ym+RSQNGoSRPjH+j6TJ1tFBhfT+JoLFf_RbZq1EszQ@mail.gmail.com
tOn Thu, 8 Jul 2021 at 20:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
Here's a patch which I believe makes pg_size_pretty() and
pg_size_pretty_numeric() match in regards to negative values.LGTM, except I think it's worth also making the numeric code not refer
to bit shifting either.Maybe this plus your regression test would be ok to back-patch?
+1
Here's an update with matching updates to the numeric code, plus the
regression tests.
Looks good.
I gave it a bit of exercise by running pgbench and calling this procedure:
CREATE OR REPLACE PROCEDURE public.test_size_pretty2()
LANGUAGE plpgsql
AS $procedure$
declare b bigint;
begin
FOR i IN 1..1000 LOOP
b := 0 - (random() * 9223372036854775807)::bigint;
if pg_size_pretty(b) <> pg_size_pretty(b::numeric) then
raise notice '%. % != %', b,
pg_size_pretty(b), pg_size_pretty(b::numeric);
end if;
END LOOP;
END;
$procedure$
It ran 8526956 times, so with the loop that's 8.5 billion random
numbers. No variations between the two functions. I got the same
after removing the 0 - to test positive numbers.
If you like, I can push this in my morning, or if you'd rather do it
yourself, please go ahead.
David
On Thu, 8 Jul 2021 at 14:38, David Rowley <dgrowleyml@gmail.com> wrote:
I gave it a bit of exercise by running pgbench and calling this procedure:
It ran 8526956 times, so with the loop that's 8.5 billion random
numbers. No variations between the two functions. I got the same
after removing the 0 - to test positive numbers.
Wow, that's a lot of testing! I just tried a few hand-picked edge cases.
If you like, I can push this in my morning, or if you'd rather do it
yourself, please go ahead.
No, I didn't get as much time as I thought I would today, so please go ahead.
Regards,
Dean
On Thu, 8 Jul 2021 at 05:44, David Christensen
<david.christensen@crunchydata.com> wrote:
Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.
I ended up not changing the return type of pg_size_bytes().
I figured that PB and EB are probably good enough additions at this point, so we can debate whether
to add the others.
Per Tom's concern both with changing the return type of
pg_size_bytes() and his and my concern about going too far adding more
units, I've adjusted your patch to only add petabytes and pushed it.
The maximum range of BIGINT is only 8 exabytes, so the BIGINT version
would never show in exabytes anyway. It would still be measuring in
petabytes at the 64-bit range limit.
After a bit of searching, I found reports that the estimated entire
stored digital data on Earth as of 2020 to be 59 zettabytes, or about
0.06 yottabytes. I feel like we've gone far enough by adding
petabytes today. Maybe that's worth revisiting in a couple of storage
generations. After we're done there, we can start working on the LSN
wraparound code.
David