formatting.c cleanup

Started by Peter Eisentraut6 months ago3 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

The file formatting.c contains some hard to read and understand code.
For the attached patch series, I made a few more or less mechanical
passes over it to modernize the code a bit, renamed some symbols to be
clearer, adjusted some types to make things more self-documenting,
removed some redundant code to make some things more compact. I hope
this helps a bit.

Attachments:

v1-0001-formatting.c-cleanup-Remove-dashes-in-comments.patchtext/plain; charset=UTF-8; name=v1-0001-formatting.c-cleanup-Remove-dashes-in-comments.patchDownload+59-116
v1-0002-formatting.c-cleanup-Move-loop-variables-definiti.patchtext/plain; charset=UTF-8; name=v1-0002-formatting.c-cleanup-Move-loop-variables-definiti.patchDownload+15-31
v1-0003-formatting.c-cleanup-Use-size_t-for-string-length.patchtext/plain; charset=UTF-8; name=v1-0003-formatting.c-cleanup-Use-size_t-for-string-length.patchDownload+48-49
v1-0004-formatting.c-cleanup-Add-some-const-char-qualifie.patchtext/plain; charset=UTF-8; name=v1-0004-formatting.c-cleanup-Add-some-const-char-qualifie.patchDownload+7-8
v1-0005-formatting.c-cleanup-Use-array-syntax-instead-of-.patchtext/plain; charset=UTF-8; name=v1-0005-formatting.c-cleanup-Use-array-syntax-instead-of-.patchDownload+3-4
v1-0006-formatting.c-cleanup-Remove-unnecessary-extra-par.patchtext/plain; charset=UTF-8; name=v1-0006-formatting.c-cleanup-Remove-unnecessary-extra-par.patchDownload+2-3
v1-0007-formatting.c-cleanup-Remove-unnecessary-extra-lin.patchtext/plain; charset=UTF-8; name=v1-0007-formatting.c-cleanup-Remove-unnecessary-extra-lin.patchDownload+12-25
v1-0008-formatting.c-cleanup-Remove-unnecessary-zeroize-m.patchtext/plain; charset=UTF-8; name=v1-0008-formatting.c-cleanup-Remove-unnecessary-zeroize-m.patchDownload+3-20
v1-0009-formatting.c-cleanup-Improve-formatting-of-some-s.patchtext/plain; charset=UTF-8; name=v1-0009-formatting.c-cleanup-Improve-formatting-of-some-s.patchDownload+32-33
v1-0010-formatting.c-cleanup-Change-TmFromChar.clock-fiel.patchtext/plain; charset=UTF-8; name=v1-0010-formatting.c-cleanup-Change-TmFromChar.clock-fiel.patchDownload+5-9
v1-0011-formatting.c-cleanup-Change-several-int-fields-to.patchtext/plain; charset=UTF-8; name=v1-0011-formatting.c-cleanup-Change-several-int-fields-to.patchDownload+46-31
v1-0012-formatting.c-cleanup-Rename-DCH_S_-to-DCH_SUFFIX_.patchtext/plain; charset=UTF-8; name=v1-0012-formatting.c-cleanup-Rename-DCH_S_-to-DCH_SUFFIX_.patchDownload+131-109
v1-0013-formatting.c-cleanup-Change-fill_str-return-type-.patchtext/plain; charset=UTF-8; name=v1-0013-formatting.c-cleanup-Change-fill_str-return-type-.patchDownload+2-4
#2Chao Li
li.evan.chao@gmail.com
In reply to: Peter Eisentraut (#1)
Re: formatting.c cleanup

On Oct 20, 2025, at 17:50, Peter Eisentraut <peter@eisentraut.org> wrote:

The file formatting.c contains some hard to read and understand code. For the attached patch series, I made a few more or less mechanical passes over it to modernize the code a bit, renamed some symbols to be clearer, adjusted some types to make things more self-documenting, removed some redundant code to make some things more compact. I hope this helps a bit.
<v1-0001-formatting.c-cleanup-Remove-dashes-in-comments.patch><v1-0002-formatting.c-cleanup-Move-loop-variables-definiti.patch><v1-0003-formatting.c-cleanup-Use-size_t-for-string-length.patch><v1-0004-formatting.c-cleanup-Add-some-const-char-qualifie.patch><v1-0005-formatting.c-cleanup-Use-array-syntax-instead-of-.patch><v1-0006-formatting.c-cleanup-Remove-unnecessary-extra-par.patch><v1-0007-formatting.c-cleanup-Remove-unnecessary-extra-lin.patch><v1-0008-formatting.c-cleanup-Remove-unnecessary-zeroize-m.patch><v1-0009-formatting.c-cleanup-Improve-formatting-of-some-s.patch><v1-0010-formatting.c-cleanup-Change-TmFromChar.clock-fiel.patch><v1-0011-formatting.c-cleanup-Change-several-int-fields-to.patch><v1-0012-formatting.c-cleanup-Rename-DCH_S_-to-DCH_SUFFIX_.patch><v1-0013-formatting.c-cleanup-Change-fill_str-return-type-.patch>

0001 - Only removes dashes from comments. No logic change. LGTM.

0002 - Moves loop variable into for, which makes function local variable list shorter, and makes it easier to see which variables will still be used after for loop. So I think that improves readability. I searched for missing occurrence, and didn’t find. So, LGTM.

0003 - Replace int with size_t wherever possible. LGTM.

0004 - Changes “char *” to “const char *” wherever possible.

I found some “text *” can be “const text *”:

```
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 4c217d0e825..cc290903bb6 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1043,7 +1043,7 @@ typedef struct NUMProc
                                read_post,              /* to_number - number of dec. digit */
                                read_pre;               /* to_number - number non-dec. digit */
-       char       *number,                     /* string with number   */
+       char       *number,                     /* string with number   */
                           *number_p,           /* pointer to current number position */
                           *inout,                      /* in / out buffer      */
                           *inout_p,            /* pointer to current inout position */
@@ -1110,11 +1110,11 @@ static bool from_char_seq_search(int *dest, const char **src,
                                                                 const char *const *array,
                                                                 char **localized_array, Oid collid,
                                                                 FormatNode *node, Node *escontext);
-static bool do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
+static bool do_to_timestamp(text *date_txt, const text *fmt, Oid collid, bool std,
                                                        struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz,
                                                        int *fprec, uint32 *flags, Node *escontext);
 static void fill_str(char *str, int c, size_t maxlen);
-static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree);
+static FormatNode *NUM_cache(int len, NUMDesc *Num, const text *pars_str, bool *shouldFree);
 static char *int_to_roman(int number);
 static int     roman_to_int(NUMProc *Np, size_t input_len);
 static void NUM_prepare_locale(NUMProc *Np);
@@ -3902,7 +3902,7 @@ DCH_cache_fetch(const char *str, bool std)
  * for formatting.
  */
 static text *
-datetime_to_char_body(TmToChar *tmtc, text *fmt, bool is_interval, Oid collid)
+datetime_to_char_body(TmToChar *tmtc, const text *fmt, bool is_interval, Oid collid)
 {
        FormatNode *format;
        char       *fmt_str,
@@ -4394,7 +4394,7 @@ datetime_format_has_tz(const char *fmt_str)
  * struct 'tm', 'fsec', struct 'tz', and 'fprec'.
  */
 static bool
-do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
+do_to_timestamp(text *date_txt, const text *fmt, Oid collid, bool std,
                                struct pg_tm *tm, fsec_t *fsec, struct fmt_tz *tz,
                                int *fprec, uint32 *flags, Node *escontext)
 {
@@ -4952,7 +4952,7 @@ NUM_cache_fetch(const char *str)
  * Cache routine for NUM to_char version
  */
 static FormatNode *
-NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree)
+NUM_cache(int len, NUMDesc *Num, const text *pars_str, bool *shouldFree)
 {
        FormatNode *format = NULL;
        char       *str;
```

0005 - Replaces pointer+offset with array syntax. LGMT.

0006 - Removes unnecessary (). LGTM.

0007 - I am not convinced with this change. Combining two constant string into one causes the line too long, exceeding 80 columns.

0008 - Removes some macros. LGTM.

0009 - Improves some structure definition. LGTM.

0010 - Changes TmFromChar.clock from int to bool.

A suggestion is that maybe we can move the new field “clock_12_hour” next to “bool has_tz”, so that size of the structure is reduced by 4 bytes.

0011 - Changes some int to enum. LGTM.

0012 - Renames DCH_S_* to DCH_SUFFIX_*, make the symbols more descriptive. LGTM.

0013 - Changes fill_str() to return void.

```
-static char *
+static void
fill_str(char *str, int c, size_t maxlen)
{
memset(str, c, maxlen);
str[maxlen] = '\0';
- return str;
}
```

This function has no comment, so I am not sure what “maxlen” exactly mean. Usually, we do “str[maxlen-1] = ‘\0’ because maxlen usually implies max length of the buffer. But this function seems to fill maxlen of c into str, then “maxlen” might not be a good name, “count” could be better.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Chao Li (#2)
Re: formatting.c cleanup

On 20.10.25 15:31, Chao Li wrote:

On Oct 20, 2025, at 17:50, Peter Eisentraut <peter@eisentraut.org> wrote:
The file formatting.c contains some hard to read and understand code. For the attached patch series, I made a few more or less mechanical passes over it to modernize the code a bit, renamed some symbols to be clearer, adjusted some types to make things more self-documenting, removed some redundant code to make some things more compact. I hope this helps a bit.

I have committed this patch set. Thanks for checking.

0004 - Changes “char *” to “const char *” wherever possible.

I found some “text *” can be “const text *”:

I added those.

0007 - I am not convinced with this change. Combining two constant string into one causes the line too long, exceeding 80 columns.

We have been moving toward not breaking up string literals, except where
they contain a line break. This makes it easier to grep for error
messages, for example.

0010 - Changes TmFromChar.clock from int to bool.

A suggestion is that maybe we can move the new field “clock_12_hour” next to “bool has_tz”, so that size of the structure is reduced by 4 bytes.

I don't think we need to worry about structure size here. (If we did,
we could change many fields to short int, probably.) The current order
seems pretty logical. So I kept it.

0013 - Changes fill_str() to return void.

```
-static char *
+static void
fill_str(char *str, int c, size_t maxlen)
{
memset(str, c, maxlen);
str[maxlen] = '\0';
- return str;
}
```

This function has no comment, so I am not sure what “maxlen” exactly mean. Usually, we do “str[maxlen-1] = ‘\0’ because maxlen usually implies max length of the buffer. But this function seems to fill maxlen of c into str, then “maxlen” might not be a good name, “count” could be better.

Yes, this is a bit confusing. I added a function comment to explain this.