[PATCH] Add function to_oct

Started by Eric Radmanover 3 years ago34 messageshackers
Jump to latest
#1Eric Radman
ericshane@eradman.com

Hello!

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

--
Eric Radman

Attachments:

add-function-to_oct.patchtext/plain; charset=us-asciiDownload+59-0
#2Ian Lawrence Barwick
barwick@gmail.com
In reply to: Eric Radman (#1)
Re: [PATCH] Add function to_oct

2022年12月21日(水) 7:08 Eric Radman <ericshane@eradman.com>:>

Hello!

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

Seems like it would be convenient to have. Any reason why there's
no matching "to_oct(bigint)" version?

Patch has been added to the next commitfest [1]https://commitfest.postgresql.org/41/4071/, thanks.

[1]: https://commitfest.postgresql.org/41/4071/

Regards

Ian Barwick

#3Eric Radman
ericshane@eradman.com
In reply to: Ian Lawrence Barwick (#2)
Re: [PATCH] Add function to_oct

On Wed, Dec 21, 2022 at 08:36:40AM +0900, Ian Lawrence Barwick wrote:

2022年12月21日(水) 7:08 Eric Radman <ericshane@eradman.com>:>

Hello!

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

Seems like it would be convenient to have. Any reason why there's
no matching "to_oct(bigint)" version?

I couldn't think of a reason someone might want an octal
representation of a bigint. Certainly it would be easy to add
if there is value in supporting all of the same argument types.

#4Ian Lawrence Barwick
barwick@gmail.com
In reply to: Eric Radman (#3)
Re: [PATCH] Add function to_oct

2022年12月21日(水) 10:42 Eric Radman <ericshane@eradman.com>:

On Wed, Dec 21, 2022 at 08:36:40AM +0900, Ian Lawrence Barwick wrote:

2022年12月21日(水) 7:08 Eric Radman <ericshane@eradman.com>:>

Hello!

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

Seems like it would be convenient to have. Any reason why there's
no matching "to_oct(bigint)" version?

I couldn't think of a reason someone might want an octal
representation of a bigint. Certainly it would be easy to add
if there is value in supporting all of the same argument types.

Yeah, I am struggling to think of a practical application other than
symmetry with to_hex().

Regards

Ian Barwick

#5Dag Lem
dag@nimrod.no
In reply to: Ian Lawrence Barwick (#4)
Re: [PATCH] Add function to_oct

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This is a mini-review of to_oct :-)

The function seems useful to me, and is obviously correct.

I don't know whether optimization at such a low level is considered in PostgreSQL, but here goes.

The calculation of quotient and remainder can be replaced by less costly masking and shifting.

Defining

#define OCT_DIGIT_BITS 3
#define OCT_DIGIT_BITMASK 0x7

the content of the loop can be replaced by

*--ptr = digits[value & OCT_DIGIT_BITMASK];
value >>= OCT_DIGIT_BITS;

Also, the check for ptr > buf in the while loop can be removed. The check is superfluous, since buf cannot possibly be exhausted by a 32 bit integer (the maximum octal number being 37777777777).

Best regards

Dag Lem

The new status of this patch is: Waiting on Author

#6Eric Radman
ericshane@eradman.com
In reply to: Dag Lem (#5)
Re: [PATCH] Add function to_oct

On Thu, Dec 22, 2022 at 10:08:17AM +0000, Dag Lem wrote:

The calculation of quotient and remainder can be replaced by less costly masking and shifting.

Defining

#define OCT_DIGIT_BITS 3
#define OCT_DIGIT_BITMASK 0x7

the content of the loop can be replaced by

*--ptr = digits[value & OCT_DIGIT_BITMASK];
value >>= OCT_DIGIT_BITS;

Also, the check for ptr > buf in the while loop can be removed. The
check is superfluous, since buf cannot possibly be exhausted by a 32
bit integer (the maximum octal number being 37777777777).

I integrated these suggestions in the attached -v2 patch and tested
range of values manually.

Also picked an OID > 8000 as suggested by unused_oids.

..Eric

Attachments:

add-function-to_oct-v2.patchtext/plain; charset=us-asciiDownload+60-0
#7vignesh C
vignesh21@gmail.com
In reply to: Eric Radman (#6)
Re: [PATCH] Add function to_oct

On Thu, 22 Dec 2022 at 23:11, Eric Radman <ericshane@eradman.com> wrote:

On Thu, Dec 22, 2022 at 10:08:17AM +0000, Dag Lem wrote:

The calculation of quotient and remainder can be replaced by less costly masking and shifting.

Defining

#define OCT_DIGIT_BITS 3
#define OCT_DIGIT_BITMASK 0x7

the content of the loop can be replaced by

*--ptr = digits[value & OCT_DIGIT_BITMASK];
value >>= OCT_DIGIT_BITS;

Also, the check for ptr > buf in the while loop can be removed. The
check is superfluous, since buf cannot possibly be exhausted by a 32
bit integer (the maximum octal number being 37777777777).

I integrated these suggestions in the attached -v2 patch and tested
range of values manually.

Also picked an OID > 8000 as suggested by unused_oids.

Few suggestions
1) We could use to_oct instead of to_oct32 as we don't have multiple
implementations for to_oct
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 98d90d9338..fde0b24563 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3687,6 +3687,9 @@
 { oid => '2090', descr => 'convert int8 number to hex',
   proname => 'to_hex', prorettype => 'text', proargtypes => 'int8',
   prosrc => 'to_hex64' },
+{ oid => '8335', descr => 'convert int4 number to oct',
+  proname => 'to_oct', prorettype => 'text', proargtypes => 'int4',
+  prosrc => 'to_oct32' },
2) Similarly we could change this to "to_oct"
+/*
+ * Convert an int32 to a string containing a base 8 (oct) representation of
+ * the number.
+ */
+Datum
+to_oct32(PG_FUNCTION_ARGS)
+{
+       uint32          value = (uint32) PG_GETARG_INT32(0);
3) I'm not sure if  AS "77777777" is required, but I also noticed it
is done similarly in to_hex too:
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "77777777";
+ 77777777
+----------
+ 77777777
+(1 row)

4) You could add a commit message for the patch

Regards,
Vignesh

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: vignesh C (#7)
Re: [PATCH] Add function to_oct

vignesh C <vignesh21@gmail.com> writes:

Few suggestions
1) We could use to_oct instead of to_oct32 as we don't have multiple
implementations for to_oct

That seems (a) shortsighted and (b) inconsistent with the naming
pattern used for to_hex, so I doubt it'd be an improvement.

regards, tom lane

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Eric Radman (#1)
Re: [PATCH] Add function to_oct

On 20.12.22 23:08, Eric Radman wrote:

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

Note this subsequent discussion about the to_hex function:
/messages/by-id/CAEZATCVbkL1ynqpsKiTDpch34=SCr5nnau=nfNmiy2nM3SJHtw@mail.gmail.com

Also, I think there is no "to binary" function, so perhaps if we're
going down this road one way or the other, we should probably complete
the set.

#10Kirk Wolak
wolakk@gmail.com
In reply to: Peter Eisentraut (#9)
Re: [PATCH] Add function to_oct

On Thu, Feb 23, 2023 at 6:32 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 20.12.22 23:08, Eric Radman wrote:

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

Note this subsequent discussion about the to_hex function:

/messages/by-id/CAEZATCVbkL1ynqpsKiTDpch34=SCr5nnau=nfNmiy2nM3SJHtw@mail.gmail.com

Also, I think there is no "to binary" function, so perhaps if we're
going down this road one way or the other, we should probably complete
the set.

The code reads clearly. It works as expected (being an old PDP-11

guy!)... And the docs make sense and build as well.
Nothing larger than an int gets in. I was "missing" the bigint version,
but read through ALL of the messages to see (and agree)
That that's okay.
Marked Ready for Committer.

Thanks, Kirk

#11Nathan Bossart
nathandbossart@gmail.com
In reply to: Kirk Wolak (#10)
Re: [PATCH] Add function to_oct

On Tue, Apr 04, 2023 at 08:45:36PM -0400, Kirk Wolak wrote:

Marked Ready for Committer.

I started taking a look at this and ended up adding to_binary() and a
bigint version of to_oct() for completeness. While I was at it, I moved
the base-conversion logic out to a separate static function that
to_binary/oct/hex all use.

From the other discussion referenced upthread, it sounds like we might want
to replace to_binary/oct/hex with a more generic base-conversion function.
Maybe we should try to do that instead.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-add-to_binary-and-to_oct.patchtext/x-diff; charset=us-asciiDownload+148-26
#12Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#11)
Re: [PATCH] Add function to_oct

On Tue, Jul 25, 2023 at 04:24:26PM -0700, Nathan Bossart wrote:

I started taking a look at this and ended up adding to_binary() and a
bigint version of to_oct() for completeness. While I was at it, I moved
the base-conversion logic out to a separate static function that
to_binary/oct/hex all use.

Bleh, this patch seems to fail horribly on 32-bit builds. I'll look into
it soon.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#12)
Re: [PATCH] Add function to_oct

On Tue, Jul 25, 2023 at 05:16:56PM -0700, Nathan Bossart wrote:

On Tue, Jul 25, 2023 at 04:24:26PM -0700, Nathan Bossart wrote:

I started taking a look at this and ended up adding to_binary() and a
bigint version of to_oct() for completeness. While I was at it, I moved
the base-conversion logic out to a separate static function that
to_binary/oct/hex all use.

Bleh, this patch seems to fail horribly on 32-bit builds. I'll look into
it soon.

Here's a new version of the patch with the silly mistakes fixed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-add-to_binary-and-to_oct.patchtext/x-diff; charset=us-asciiDownload+148-26
#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#13)
Re: [PATCH] Add function to_oct

On Tue, Jul 25, 2023 at 08:29:17PM -0700, Nathan Bossart wrote:

Here's a new version of the patch with the silly mistakes fixed.

If there are no objections, I'd like to commit this patch soon.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15Vik Fearing
vik@postgresfriends.org
In reply to: Nathan Bossart (#14)
Re: [PATCH] Add function to_oct

On 8/15/23 06:11, Nathan Bossart wrote:

On Tue, Jul 25, 2023 at 08:29:17PM -0700, Nathan Bossart wrote:

Here's a new version of the patch with the silly mistakes fixed.

If there are no objections, I'd like to commit this patch soon.

I just took a look at this (and the rest of the thread). I am a little
bit disappointed that we don't have a generic base conversion function,
but even if we did I think these specialized functions would still be
useful.

No objection from me.
--
Vik Fearing

#16John Naylor
john.naylor@enterprisedb.com
In reply to: Nathan Bossart (#14)
Re: [PATCH] Add function to_oct

On Tue, Jul 25, 2023 at 05:16:56PM -0700, Nathan Bossart wrote:

[v4]

If we're not going to have a general SQL conversion function, here are some
comments on the current patch.

+static char *convert_to_base(uint64 value, int base);

Not needed if the definition is above the callers.

+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be
either
+ * 2, 8, or 16.

Why wouldn't it work with any base <= 16?

- *ptr = '\0';
+ Assert(base == 2 || base == 8 || base == 16);

+ *ptr = '\0';

Spurious whitespace change?

- char buf[32]; /* bigger than needed, but reasonable */
+ char   *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1);

Why is this no longer allocated on the stack? Maybe needs a comment about
the size calculation.

+static char *
+convert_to_base(uint64 value, int base)

On my machine this now requires a function call and a DIV instruction, even
though the patch claims not to support anything but a power of two. It's
tiny enough to declare it inline so the compiler can specialize for each
call site.

+{ oid => '5101', descr => 'convert int4 number to binary',

Needs to be over 8000.

--
John Naylor
EDB: http://www.enterprisedb.com

#17Nathan Bossart
nathandbossart@gmail.com
In reply to: Vik Fearing (#15)
Re: [PATCH] Add function to_oct

On Tue, Aug 15, 2023 at 07:58:17AM +0200, Vik Fearing wrote:

On 8/15/23 06:11, Nathan Bossart wrote:

If there are no objections, I'd like to commit this patch soon.

I just took a look at this (and the rest of the thread). I am a little bit
disappointed that we don't have a generic base conversion function, but even
if we did I think these specialized functions would still be useful.

No objection from me.

Thanks for taking a look. I don't mean for this to preclude a generic base
conversion function that would supersede the functions added by this patch.
However, I didn't want to hold up $SUBJECT because of something that may or
may not happen after lots of discussion. If it does happen within the v17
development cycle, we could probably just remove to_oct/binary.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: John Naylor (#16)
Re: [PATCH] Add function to_oct

On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote:

If we're not going to have a general SQL conversion function, here are some
comments on the current patch.

I appreciate the review.

+static char *convert_to_base(uint64 value, int base);

Not needed if the definition is above the callers.

Done.

+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be
either
+ * 2, 8, or 16.

Why wouldn't it work with any base <= 16?

You're right. I changed this in v5.

- *ptr = '\0';
+ Assert(base == 2 || base == 8 || base == 16);

+ *ptr = '\0';

Spurious whitespace change?

I think this might just be a weird artifact of the diff algorithm.

- char buf[32]; /* bigger than needed, but reasonable */
+ char   *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1);

Why is this no longer allocated on the stack? Maybe needs a comment about
the size calculation.

It really should be. IIRC I wanted to avoid passing a pre-allocated buffer
to convert_to_base(), but I don't remember why. I fixed this in v5.

+static char *
+convert_to_base(uint64 value, int base)

On my machine this now requires a function call and a DIV instruction, even
though the patch claims not to support anything but a power of two. It's
tiny enough to declare it inline so the compiler can specialize for each
call site.

This was on my list of things to check before committing. I assumed that
it would be automatically inlined, but given your analysis, I went ahead
and added the inline keyword. My compiler took the hint and inlined the
function, and it used SHR instead of DIV instructions. The machine code
for to_hex32/64 is still a couple of instructions longer than before
(probably because of the uint64 casts), but I don't know if we need to
worry about micro-optimizing these functions any further.

+{ oid => '5101', descr => 'convert int4 number to binary',

Needs to be over 8000.

Done.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-add-to_binary-and-to_oct.patchtext/x-diff; charset=us-asciiDownload+163-28
#19John Naylor
john.naylor@enterprisedb.com
In reply to: Nathan Bossart (#18)
Re: [PATCH] Add function to_oct

On Wed, Aug 16, 2023 at 12:17 AM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Tue, Aug 15, 2023 at 01:53:25PM +0700, John Naylor wrote:

- *ptr = '\0';
+ Assert(base == 2 || base == 8 || base == 16);

+ *ptr = '\0';

Spurious whitespace change?

I think this might just be a weird artifact of the diff algorithm.

Don't believe everything you think. :-)

```
*ptr = '\0';

do
```

to

```
*ptr = '\0';
do
```

- char buf[32]; /* bigger than needed, but reasonable */
+ char   *buf = palloc(sizeof(uint64) * BITS_PER_BYTE + 1);

Why is this no longer allocated on the stack? Maybe needs a comment

about

the size calculation.

It really should be. IIRC I wanted to avoid passing a pre-allocated

buffer

to convert_to_base(), but I don't remember why. I fixed this in v5.

Now I'm struggling to understand why each and every instance has its own
nominal buffer, passed down to the implementation. All we care about is the
result -- is there some reason not to confine the buffer declaration to the
general implementation?

--
John Naylor
EDB: http://www.enterprisedb.com

#20Nathan Bossart
nathandbossart@gmail.com
In reply to: John Naylor (#19)
Re: [PATCH] Add function to_oct

On Wed, Aug 16, 2023 at 10:35:27AM +0700, John Naylor wrote:

```
*ptr = '\0';

do
```

to

```
*ptr = '\0';
do
```

Oh, I misunderstood. I thought you meant that there might be a whitespace
change on that line, not the surrounding ones. This is fixed in v6.

Now I'm struggling to understand why each and every instance has its own
nominal buffer, passed down to the implementation. All we care about is the
result -- is there some reason not to confine the buffer declaration to the
general implementation?

We can do that if we use a static variable, which is what I've done in v6.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-add-to_binary-and-to_oct.patchtext/x-diff; charset=us-asciiDownload+153-25
#21John Naylor
john.naylor@enterprisedb.com
In reply to: Nathan Bossart (#20)
#22Nathan Bossart
nathandbossart@gmail.com
In reply to: John Naylor (#21)
#23John Naylor
john.naylor@enterprisedb.com
In reply to: Nathan Bossart (#22)
#24Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Nathan Bossart (#22)
#25Nathan Bossart
nathandbossart@gmail.com
In reply to: John Naylor (#23)
#26Nathan Bossart
nathandbossart@gmail.com
In reply to: Dean Rasheed (#24)
#27Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Nathan Bossart (#26)
#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Dean Rasheed (#27)
#29Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Nathan Bossart (#28)
#30John Naylor
john.naylor@enterprisedb.com
In reply to: Dean Rasheed (#29)
#31Peter Eisentraut
peter_e@gmx.net
In reply to: Nathan Bossart (#26)
#32Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#31)
#33Peter Eisentraut
peter_e@gmx.net
In reply to: Nathan Bossart (#32)
#34Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#33)