[PATCH] Add roman support for to_number function

[1]: https://commitfest.postgresql.org/50/5233/
[2]: /messages/by-id/CAGMVOduAJ9wKqJXBYnmFmEetKxapJxrG3afUwpbOZ6n_dWaUnA@mail.gmail.com

m.sakrejda@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#1)

Re: [PATCH] Add roman support for to_number function

Thanks for the contribution.

I took a look at the patch, and it works as advertised. It's too late
for the September commitfest, but I took the liberty of registering
your patch for the November CF [1]https://commitfest.postgresql.org/50/5233/. In the course of that, I found an
older thread proposing this feature seven years ago [2]/messages/by-id/CAGMVOduAJ9wKqJXBYnmFmEetKxapJxrG3afUwpbOZ6n_dWaUnA@mail.gmail.com. That patch
was returned with feedback and (as far as I can tell), was not
followed-up on by the author. You may want to review that thread for
feedback; I won't repeat it here.

On Fri, Aug 30, 2024 at 12:22 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:

While looking at formatting.c file, I noticed a TODO about "add support for roman number to standard number conversion" (https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52)

Your patch should also remove the TODO =)

- Is it okay for the function to handle Roman numerals in a case-insensitive way? (e.g., 'XIV', 'xiv', and 'Xiv' are all seen as 14).

The patch in the thread I linked also took a case-insensitive
approach. I did not see any objections to that, and it seems
reasonable to me as well.

- How should we handle Roman numerals with leading or trailing spaces, like ' XIV' or 'MC '? Should we trim the spaces, or would it be better to throw an error in such cases?

I thought we could reference existing to_number behavior here, but
after playing with it a bit, I'm not really sure what that is:

-- single leading space
maciek=# select to_number(' 1', '9');
to_number
-----------
1
(1 row)

-- two leading spaces
maciek=# select to_number(' 1', '9');
ERROR: invalid input syntax for type numeric: " "
-- two leading spaces and template pattern with a decimal
maciek=# select to_number(' 1', '9D9');
to_number
-----------
1
(1 row)

Separately, I also noticed some unusual Roman representations work
with your patch:

postgres=# select to_number('viv', 'RN');
to_number
-----------
9
(1 row)

Is this expected? In contrast, some somewhat common alternative
representations don't work:

postgres=# select to_number('iiii', 'RN');
ERROR: invalid roman numeral

I know this is expected, but is this the behavior we want? If so, we
probably want to reject the former case, too. If not, maybe that one
is okay, too.

I know I've probably offered more questions than answers, but I hope
finding the old thread here is useful.

Thanks,
Maciek

hunaidpgml@gmail.com

over 1 year ago

In reply to: Maciek Sakrejda (#2)

Re: [PATCH] Add roman support for to_number function

On Mon, Sep 2, 2024 at 11:41 PM Maciek Sakrejda <m.sakrejda@gmail.com>
wrote:

Thanks for the contribution.

I took a look at the patch, and it works as advertised. It's too late
for the September commitfest, but I took the liberty of registering
your patch for the November CF [1]. In the course of that, I found an
older thread proposing this feature seven years ago [2]. That patch
was returned with feedback and (as far as I can tell), was not
followed-up on by the author. You may want to review that thread for
feedback; I won't repeat it here.

I submitted the patch on Aug 30 because I read that new patches should be
submitted in CF with "Open" status.

On Fri, Aug 30, 2024 at 12:22 AM Hunaid Sohail <hunaidpgml@gmail.com>
wrote:

While looking at formatting.c file, I noticed a TODO about "add support

for roman number to standard number conversion" (
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/formatting.c#L52
)

Your patch should also remove the TODO =)

Noted.

- How should we handle Roman numerals with leading or trailing spaces,
like ' XIV' or 'MC '? Should we trim the spaces, or would it be better to
throw an error in such cases?

I thought we could reference existing to_number behavior here, but
after playing with it a bit, I'm not really sure what that is:

-- single leading space
maciek=# select to_number(' 1', '9');
to_number
-----------
1
(1 row)

-- two leading spaces
maciek=# select to_number(' 1', '9');
ERROR: invalid input syntax for type numeric: " "
-- two leading spaces and template pattern with a decimal
maciek=# select to_number(' 1', '9D9');
to_number
-----------
1
(1 row)

Yes, you are right. I can't understand the behaviour of trailing spaces
too. Trailing spaces are ignored (doesn't matter how many spaces) but
leading spaces are ignored if there is 1 leading space. For more leading
spaces, error is returned.
A few cases of trailing spaces.
postgres=# select to_number(' 1', '9');
ERROR: invalid input syntax for type numeric: " "
postgres=# select to_number('1 ', '9');
to_number
-----------
1
(1 row)

postgres=# select to_number('1 ', '9');
to_number
-----------
1
(1 row)

Separately, I also noticed some unusual Roman representations work

with your patch:

postgres=# select to_number('viv', 'RN');
to_number
-----------
9
(1 row)

Is this expected? In contrast, some somewhat common alternative
representations don't work:

postgres=# select to_number('iiii', 'RN');
ERROR: invalid roman numeral

I know this is expected, but is this the behavior we want? If so, we
probably want to reject the former case, too. If not, maybe that one
is okay, too.

Yes, 'viv' is invalid. Thanks for pointing this out. Also, found a few
other invalid cases like 'lxl' or 'dcd'. I will fix them in the next patch.
Thank you for your feedback.

Regards,
Hunaid Sohail

maciek@pganalyze.com

over 1 year ago

In reply to: Hunaid Sohail (#3)

Re: [PATCH] Add roman support for to_number function

On Tue, Sep 3, 2024 at 6:29 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:

I submitted the patch on Aug 30 because I read that new patches should be submitted in CF with "Open" status.

Oh my bad! I missed that you had submitted it to the September CF:
https://commitfest.postgresql.org/49/5221/

I don't see a way to just delete CF entries, so I've marked the
November one that I created as Withdrawn.

I'll add myself as reviewer to the September CF entry.

Thanks,
Maciek

hunaidpgml@gmail.com

over 1 year ago

In reply to: Maciek Sakrejda (#4)

Re: [PATCH] Add roman support for to_number function

Hi,

I have attached a new patch v2 with following changes:

- Handled invalid cases like 'viv', 'lxl', and 'dcd'.
- Changed errcode to 22P07 because 22P06 was already taken.
- Removed TODO.
- Added a few positive & negative tests.
- Updated documentation.

Looking forward to your feedback.

Regards,
Hunaid Sohail

On Tue, Sep 3, 2024 at 8:47 PM Maciek Sakrejda <maciek@pganalyze.com> wrote:

Show quoted text

On Tue, Sep 3, 2024 at 6:29 AM Hunaid Sohail <hunaidpgml@gmail.com> wrote:

I submitted the patch on Aug 30 because I read that new patches should

be submitted in CF with "Open" status.

Oh my bad! I missed that you had submitted it to the September CF:
https://commitfest.postgresql.org/49/5221/

I don't see a way to just delete CF entries, so I've marked the
November one that I created as Withdrawn.

I'll add myself as reviewer to the September CF entry.

Thanks,
Maciek

Aleksander Alekseev

aleksander@timescale.com

over 1 year ago

In reply to: Hunaid Sohail (#5)

Re: [PATCH] Add roman support for to_number function

Hi,

I have attached a new patch v2 with following changes:

- Handled invalid cases like 'viv', 'lxl', and 'dcd'.
- Changed errcode to 22P07 because 22P06 was already taken.
- Removed TODO.
- Added a few positive & negative tests.
- Updated documentation.

Looking forward to your feedback.

While playing with the patch I noticed that to_char(..., 'RN') doesn't
seem to be test-covered. I suggest adding the following test:

```
WITH rows AS (
SELECT i, to_char(i, 'FMRN') AS roman
FROM generate_series(1, 3999) AS i
) SELECT bool_and(to_number(roman, 'RN') = i) FROM rows;

bool_and
----------
t
```

... in order to fix this while on it. The query takes ~12 ms on my laptop.

--
Best regards,
Aleksander Alekseev

hunaidpgml@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#6)

Re: [PATCH] Add roman support for to_number function

On Thu, Sep 5, 2024 at 2:41 PM Aleksander Alekseev <aleksander@timescale.com>
wrote:

While playing with the patch I noticed that to_char(..., 'RN') doesn't
seem to be test-covered. I suggest adding the following test:

```
WITH rows AS (
SELECT i, to_char(i, 'FMRN') AS roman
FROM generate_series(1, 3999) AS i
) SELECT bool_and(to_number(roman, 'RN') = i) FROM rows;

bool_and
----------
t
```

I also noticed there are no tests for to_char roman format. The test you
provided covers roman format in both to_char and to_number. I will add it.
Thank you.

Regards,
Hunaid Sohail

hunaidpgml@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#7)

Re: [PATCH] Add roman support for to_number function

Hi,

On Thu, Sep 5, 2024 at 2:41 PM Aleksander Alekseev <aleksander@timescale.com>

wrote:

While playing with the patch I noticed that to_char(..., 'RN') doesn't
seem to be test-covered. I suggest adding the following test:

```
WITH rows AS (
SELECT i, to_char(i, 'FMRN') AS roman
FROM generate_series(1, 3999) AS i
) SELECT bool_and(to_number(roman, 'RN') = i) FROM rows;

bool_and
----------
t
```

I have added this test in the attached patch (v3). Thank you once again,
Aleksander, for the suggestion.

Regards,
Hunaid Sohail

m.sakrejda@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#8)

Re: [PATCH] Add roman support for to_number function

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, passed

Tested again, and the patch looks good. It does not accept leading or trailing whitespace, which seems reasonable, given the unclear behavior of to_number with other format strings. It also rejects less common Roman spellings like "IIII". I don't feel strongly about that one way or the other, but perhaps a test codifying that behavior would be useful to make it clear it's intentional.

I'm marking it RfC.

The new status of this patch is: Ready for Committer

#10

[1]: /messages/by-id/CAD68Dp1GgTeBiA0YVWXpfPCMC7=nqBCLn6+huOkWURcKRnFw5g@mail.gmail.com

m.sakrejda@gmail.com

over 1 year ago

In reply to: Maciek Sakrejda (#9)

Re: [PATCH] Add roman support for to_number function

Sorry, it looks like I failed to accurately log my review in the
review app due to the current broken layout issues [1]/messages/by-id/CAD68Dp1GgTeBiA0YVWXpfPCMC7=nqBCLn6+huOkWURcKRnFw5g@mail.gmail.com. The summary
should be:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested (not sure what the spec has to
say about this)
Documentation: tested, passed

#11

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Maciek Sakrejda (#9)

Re: [PATCH] Add roman support for to_number function

Maciek Sakrejda <m.sakrejda@gmail.com> writes:

Tested again, and the patch looks good. It does not accept leading or trailing whitespace, which seems reasonable, given the unclear behavior of to_number with other format strings. It also rejects less common Roman spellings like "IIII". I don't feel strongly about that one way or the other, but perhaps a test codifying that behavior would be useful to make it clear it's intentional.

Yeah, I don't have a strong feeling about that either, but probably
being strict is better. to_number has a big problem with "garbage in
garbage out" already, and being lax will make that worse.

A few notes from a quick read of the patch:

* roman_to_int() should have a header comment, notably explaining its
result convention. I find it fairly surprising that "0" means an
error --- I realize that Roman notation can't represent zero, but
wouldn't it be better to use -1?

* Do *not* rely on toupper(). There are multiple reasons why not,
but in this context the worst one is that in Turkish locales it's
likely to translate "i" to "İ", on which you will fail. I'd use
pg_ascii_toupper().

* I think roman_to_int() is under-commented internally too.
To start with, where did the magic "15" come from? And why
should we have such a test anyway --- what if the format allows
for trailing stuff after the roman numeral? (That is, I think
you need to fix this so that roman_to_int reports how much data
it ate, instead of assuming that it must read to the end of the
input.) The logic for detecting invalid numeral combinations
feels a little opaque too. Do you have a source for the rules
you're implementing, and if so could you cite it?

* This code will get run through pgindent before commit, so you
might want to revisit whether your comments will still look nice
afterwards. There's not a lot of consistency in them about initial
cap versus lower case or trailing period versus not, too.

* roman_result could be declared where used, no?

* I'm not really convinced that we need a new errcode
ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like
ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it
like that, why did you pick the out-of-sequence value 22P07?

* Might be good to have a few test cases demonstrating what happens
when there's other stuff combined with the RN format spec. Notably,
even if RN itself won't eat whitespace, there had better be a way
to write the format string to allow that.

* Further to Aleksander's point about lack of test coverage for
the to_char direction, I see from
https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
that almost none of the existing roman-number code paths are covered
today. While it's not strictly within the charter of this patch
to improve that, maybe we should try to do so --- at the very
least it'd raise formatting.c's coverage score a few notches.

regards, tom lane

#12

[1]: /messages/by-id/2956175.1725831136@sss.pgh.pa.us

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Tom Lane (#11)

Re: [PATCH] Add roman support for to_number function

I wrote:

* Further to Aleksander's point about lack of test coverage for
the to_char direction, I see from
https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
that almost none of the existing roman-number code paths are covered
today. While it's not strictly within the charter of this patch
to improve that, maybe we should try to do so --- at the very
least it'd raise formatting.c's coverage score a few notches.

For the archives' sake: I created a patch and a separate discussion
thread [1]/messages/by-id/2956175.1725831136@sss.pgh.pa.us for that.

regards, tom lane

#13

hunaidpgml@gmail.com

over 1 year ago

In reply to: Tom Lane (#11)

Re: [PATCH] Add roman support for to_number function

Hi,

On Sun, Sep 8, 2024 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

A few notes from a quick read of the patch:

* roman_to_int() should have a header comment, notably explaining its
result convention. I find it fairly surprising that "0" means an
error --- I realize that Roman notation can't represent zero, but
wouldn't it be better to use -1?

Noted.

* Do *not* rely on toupper(). There are multiple reasons why not,
but in this context the worst one is that in Turkish locales it's
likely to translate "i" to "İ", on which you will fail. I'd use
pg_ascii_toupper().

Noted.

* I think roman_to_int() is under-commented internally too.
To start with, where did the magic "15" come from? And why
should we have such a test anyway --- what if the format allows
for trailing stuff after the roman numeral? (That is, I think
you need to fix this so that roman_to_int reports how much data
it ate, instead of assuming that it must read to the end of the
input.)

MMMDCCCLXXXVIII is the longest valid standard roman numeral (15
characters). I will add appropriate comment on length check.

I am not sure I am able to understand the latter part. Could you please
elaborate it?
Are you referring to cases like these?
SELECT to_number('CXXIII foo', 'RN foo');
SELECT to_number('CXXIII foo', 'RN');
Please check my comments on Oracle's behaviour below.

The logic for detecting invalid numeral combinations
feels a little opaque too. Do you have a source for the rules
you're implementing, and if so could you cite it?

There are many sources on the internet.
A few sources:
1. https://www.toppr.com/guides/maths/knowing-our-numbers/roman-numerals/
2. https://www.cuemath.com/numbers/roman-numerals/

Note that we are following the standard form:
https://en.wikipedia.org/wiki/Roman_numerals#Standard_form

* This code will get run through pgindent before commit, so you
might want to revisit whether your comments will still look nice
afterwards. There's not a lot of consistency in them about initial
cap versus lower case or trailing period versus not, too.

Noted.

* roman_result could be declared where used, no?

Noted.

* I'm not really convinced that we need a new errcode
ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like
ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it
like that, why did you pick the out-of-sequence value 22P07?

22P07 is not out-of-sequence. 22P01 to 22P06 are already used.
However, I do agree with you that we can use
ERRCODE_INVALID_TEXT_REPRESENTATION. I will change it.

* Might be good to have a few test cases demonstrating what happens
when there's other stuff combined with the RN format spec. Notably,
even if RN itself won't eat whitespace, there had better be a way
to write the format string to allow that.

The current patch (v3) simply ignores other formats with RN except when RN
is used EEEE which returns error.
```
postgres=# SELECT to_number('CXXIII', 'foo RN');
to_number
-----------
123
(1 row)

postgres=# SELECT to_number('CXXIII', 'MIrn');
to_number
-----------
123
(1 row)

postgres=# SELECT to_number('CXXIII', 'EEEErn');
ERROR: "EEEE" must be the last pattern used
```

However, I think that other formats except FM should be rejected when used
with RN in NUMDesc_prepare function. Any opinions?

If we look into Oracle, they are strict in this regard and don't allow any
other format with RN.
1. SELECT to_char(12, 'MIrn') from dual;
2. SELECT to_char(12, 'foo RN') from dual;
results in error:
ORA-01481: invalid number format model

I also found that there is no check when RN is used twice (both in to_char
and to_number) and that results in unexpected output.

```
postgres=# SELECT to_number('CXXIII', 'RNrn');
to_number
-----------
123
(1 row)

postgres=# SELECT to_char(12, 'RNrn');
to_char
--------------------------------
XII xii
(1 row)

postgres=# SELECT to_char(12, 'rnrn');
to_char
--------------------------------
xii xii
(1 row)

postgres=# SELECT to_char(12, 'FMrnrn');
to_char
---------
xiixii
(1 row)
```

* Further to Aleksander's point about lack of test coverage for
the to_char direction, I see from

https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
that almost none of the existing roman-number code paths are covered
today. While it's not strictly within the charter of this patch
to improve that, maybe we should try to do so --- at the very
least it'd raise formatting.c's coverage score a few notches.

I see that you have provided a patch to increase test coverage of to_char
roman format including some other formats. I will try to add more tests for
the to_number roman format.

I will provide the next patch soon.

Regards,
Hunaid Sohail

#14

hunaidpgml@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#13)

Re: [PATCH] Add roman support for to_number function

Hi,

I have started working on the next version of the patch and have addressed
the smaller issues identified by Tom. Before proceeding further, I would
like to have opinions on some comments/questions in my previous email.

Regards,
Hunaid Sohail

On Mon, Sep 9, 2024 at 5:45 PM Hunaid Sohail <hunaidpgml@gmail.com> wrote:

Show quoted text

Hi,

On Sun, Sep 8, 2024 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

A few notes from a quick read of the patch:

* roman_to_int() should have a header comment, notably explaining its
result convention. I find it fairly surprising that "0" means an
error --- I realize that Roman notation can't represent zero, but
wouldn't it be better to use -1?

Noted.

* Do *not* rely on toupper(). There are multiple reasons why not,
but in this context the worst one is that in Turkish locales it's
likely to translate "i" to "İ", on which you will fail. I'd use
pg_ascii_toupper().

Noted.

* I think roman_to_int() is under-commented internally too.
To start with, where did the magic "15" come from? And why
should we have such a test anyway --- what if the format allows
for trailing stuff after the roman numeral? (That is, I think
you need to fix this so that roman_to_int reports how much data
it ate, instead of assuming that it must read to the end of the
input.)

MMMDCCCLXXXVIII is the longest valid standard roman numeral (15
characters). I will add appropriate comment on length check.

I am not sure I am able to understand the latter part. Could you please
elaborate it?
Are you referring to cases like these?
SELECT to_number('CXXIII foo', 'RN foo');
SELECT to_number('CXXIII foo', 'RN');
Please check my comments on Oracle's behaviour below.

The logic for detecting invalid numeral combinations
feels a little opaque too. Do you have a source for the rules
you're implementing, and if so could you cite it?

There are many sources on the internet.
A few sources:
1. https://www.toppr.com/guides/maths/knowing-our-numbers/roman-numerals/
2. https://www.cuemath.com/numbers/roman-numerals/

Note that we are following the standard form:
https://en.wikipedia.org/wiki/Roman_numerals#Standard_form

* This code will get run through pgindent before commit, so you
might want to revisit whether your comments will still look nice
afterwards. There's not a lot of consistency in them about initial
cap versus lower case or trailing period versus not, too.

Noted.

* roman_result could be declared where used, no?

Noted.

* I'm not really convinced that we need a new errcode
ERRCODE_INVALID_ROMAN_NUMERAL rather than using a generic one like
ERRCODE_INVALID_TEXT_REPRESENTATION. However, if we do do it
like that, why did you pick the out-of-sequence value 22P07?

22P07 is not out-of-sequence. 22P01 to 22P06 are already used.
However, I do agree with you that we can use
ERRCODE_INVALID_TEXT_REPRESENTATION. I will change it.

* Might be good to have a few test cases demonstrating what happens
when there's other stuff combined with the RN format spec. Notably,
even if RN itself won't eat whitespace, there had better be a way
to write the format string to allow that.

The current patch (v3) simply ignores other formats with RN except when RN
is used EEEE which returns error.
```
postgres=# SELECT to_number('CXXIII', 'foo RN');
to_number
-----------
123
(1 row)

postgres=# SELECT to_number('CXXIII', 'MIrn');
to_number
-----------
123
(1 row)

postgres=# SELECT to_number('CXXIII', 'EEEErn');
ERROR: "EEEE" must be the last pattern used
```

However, I think that other formats except FM should be rejected when used
with RN in NUMDesc_prepare function. Any opinions?

If we look into Oracle, they are strict in this regard and don't allow any
other format with RN.
1. SELECT to_char(12, 'MIrn') from dual;
2. SELECT to_char(12, 'foo RN') from dual;
results in error:
ORA-01481: invalid number format model

I also found that there is no check when RN is used twice (both in to_char
and to_number) and that results in unexpected output.

```
postgres=# SELECT to_number('CXXIII', 'RNrn');
to_number
-----------
123
(1 row)

postgres=# SELECT to_char(12, 'RNrn');
to_char
--------------------------------
XII xii
(1 row)

postgres=# SELECT to_char(12, 'rnrn');
to_char
--------------------------------
xii xii
(1 row)

postgres=# SELECT to_char(12, 'FMrnrn');
to_char
---------
xiixii
(1 row)
```

* Further to Aleksander's point about lack of test coverage for
the to_char direction, I see from

https://coverage.postgresql.org/src/backend/utils/adt/formatting.c.gcov.html
that almost none of the existing roman-number code paths are covered
today. While it's not strictly within the charter of this patch
to improve that, maybe we should try to do so --- at the very
least it'd raise formatting.c's coverage score a few notches.

I see that you have provided a patch to increase test coverage of to_char
roman format including some other formats. I will try to add more tests for
the to_number roman format.

I will provide the next patch soon.

Regards,
Hunaid Sohail

#15

hunaidpgml@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#14)

Re: [PATCH] Add roman support for to_number function

Hi,

I have attached a new patch with following changes:
- Addressed minor issues identified by Tom.
- Rejected other formats with RN and updated the docs.
- Added a few more tests.

Regards,
Hunaid Sohail

Show quoted text

#16

hunaidpgml@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#15)

Re: [PATCH] Add roman support for to_number function

Hi,

I have attached a rebased patch if someone wants to review in the next CF.
No changes as compared to v4.

Regards,
Hunaid Sohail

Show quoted text

#17

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 1 year ago

In reply to: Hunaid Sohail (#16)

Re: [PATCH] Add roman support for to_number function

Hi,

On 10/30/24 08:07, Hunaid Sohail wrote:

Hi,

I have attached a rebased patch if someone wants to review in the next
CF. No changes as compared to v4.

Regards,
Hunaid Sohail

Thanks for a nice patch. I did a quick review today, seems almost there,
I only have a couple minor comments:

1) Template Patterns for Numeric Formatting

Why the wording change? "input between 1 and 3999" sounds fine to me.

2) The docs say "standard numbers" but I'm not sure what that is? We
don't use that term anywhere else, I think. Same for "standard roman
numeral".

3) A couple comments need a bit of formatting. Multi-line comments
should start with an empty line, some lines are a bit too long.

4) I was wondering if the regression tests check all interesting cases,
and it seems none of the tests hit these two cases:

if (vCount > 1 || lCount > 1 || dCount > 1)
return -1;

and

if (!IS_VALID_SUB_COMB(currChar, nextChar))
return -1;

I haven't tried constructing tests to hit those cases, though.

Seems ready to go otherwise.

regards

--
Tomas Vondra

#18

hunaidpgml@gmail.com

over 1 year ago

In reply to: Tomas Vondra (#17)

Re: [PATCH] Add roman support for to_number function

Hi,

Thanks for reviewing this patch.

On Mon, Dec 9, 2024 at 3:07 AM Tomas Vondra <tomas@vondra.me> wrote:

Thanks for a nice patch. I did a quick review today, seems almost there,
I only have a couple minor comments:

1) Template Patterns for Numeric Formatting

Why the wording change? "input between 1 and 3999" sounds fine to me.

input... seemed to refer to numeric input for to_char roman format. But,
after roman support in to_number function shouldn't we modify it? It is the
reason I changed it to "valid for numbers 1 to 3999".

2) The docs say "standard numbers" but I'm not sure what that is? We

don't use that term anywhere else, I think. Same for "standard roman
numeral".

I will change "standard numbers" to "numbers".
Note that we are following the standard form. There are other forms too.
For example, 4 can be represented as IV (standard) or IIII (non standard).
https://en.wikipedia.org/wiki/Roman_numerals#Standard_form

3) A couple comments need a bit of formatting. Multi-line comments
should start with an empty line, some lines are a bit too long.

I will fix the multi-line formatting.

4) I was wondering if the regression tests check all interesting cases,
and it seems none of the tests hit these two cases:

if (vCount > 1 || lCount > 1 || dCount > 1)
return -1;

SELECT to_number('viv', 'RN') test covers this if statement for the
subtraction part. But, I noticed that no test covers simple counts of V, L,
D.
I will add SELECT to_number('VV', 'RN') for that.

and

if (!IS_VALID_SUB_COMB(currChar, nextChar))
return -1;

Noted. SELECT to_number('IL', 'RN') test can be added.

Regards,
Hunaid Sohail

#19

hunaidpgml@gmail.com

over 1 year ago

In reply to: Hunaid Sohail (#18)

Re: [PATCH] Add roman support for to_number function

Hi,

On Mon, Dec 9, 2024 at 3:07 AM Tomas Vondra <tomas@vondra.me> wrote:

Thanks for a nice patch. I did a quick review today, seems almost there,
I only have a couple minor comments:

1) Template Patterns for Numeric Formatting

Why the wording change? "input between 1 and 3999" sounds fine to me.

input... seemed to refer to numeric input for to_char roman format. But,
after roman support in to_number function shouldn't we modify it? It is the
reason I changed it to "valid for numbers 1 to 3999".

2) The docs say "standard numbers" but I'm not sure what that is? We

don't use that term anywhere else, I think. Same for "standard roman
numeral".

I will change "standard numbers" to "numbers".
Note that we are following the standard form. There are other forms too.
For example, 4 can be represented as IV (standard) or IIII (non standard).
https://en.wikipedia.org/wiki/Roman_numerals#Standard_form

3) A couple comments need a bit of formatting. Multi-line comments
should start with an empty line, some lines are a bit too long.

I will fix the multi-line formatting.

4) I was wondering if the regression tests check all interesting cases,
and it seems none of the tests hit these two cases:

if (vCount > 1 || lCount > 1 || dCount > 1)
return -1;

SELECT to_number('viv', 'RN') test covers this if statement for the
subtraction part. But, I noticed that no test covers simple counts of V, L,
D.
I will add SELECT to_number('VV', 'RN') for that.

and

if (!IS_VALID_SUB_COMB(currChar, nextChar))
return -1;

Noted. SELECT to_number('IL', 'RN') test can be added.

I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas.

Looking forward to your feedback.

Regards,
Hunaid Sohail

#20

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Hunaid Sohail (#19)

Re: [PATCH] Add roman support for to_number function

Hunaid Sohail <hunaidpgml@gmail.com> writes:

I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas.

I'm still quite unhappy that this doesn't tolerate trailing
whitespace. That's not how other format patterns work, and it makes
it impossible to round-trip the output of to_char(..., 'RN').
I think the core of the problem is that you tried to cram the logic
in where the existing "not implemented" error is thrown. But on
inspection there is nothing sane about that entire "Roman correction"
stanza -- what it does is either useless (zeroing already-zeroed
fields) or in the wrong place (if we don't want to allow other flags
with IS_ROMAN, that should be done via an error in NUMDesc_prepare,
like other similar cases). In the attached I moved the roman_to_int
call into NUM_processor's main loop so it's more like other format
patterns, and fixed it to not eat any more characters than it should.
This might allow putting back some format-combination capabilities,
but other than the whitespace angle I figure we can leave that for
people to request it.

regards, tom lane

#21

hunaidpgml@gmail.com

over 1 year ago

In reply to: Tom Lane (#20)

#22