-Wformat-signedness

Started by Thomas Munroover 5 years ago8 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi hackers,

There're probably mostly harmless, being mostly error and debug
messages and the like, and considering that eg OID parsing tolerates
negative numbers when reading them back in, but for what it's worth:
GCC complains about many %d vs %u type mixups if you build with
$SUBJECT.

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#1)
Re: -Wformat-signedness

On 2020-10-29 22:37, Thomas Munro wrote:

There're probably mostly harmless, being mostly error and debug
messages and the like, and considering that eg OID parsing tolerates
negative numbers when reading them back in, but for what it's worth:
GCC complains about many %d vs %u type mixups if you build with
$SUBJECT.

I had looked into this some time ago. I have dusted off my patch again.
The attached version fixes all warnings for me.

The following are the main categories of issues:

1. enums are unsigned by default in gcc, so all those internal error
messages "unrecognized blah kind: %d" need to be changed to %u.

I have split that into its own patch since it's easily separable. All
the remaining issues are in one patch.

2. Various trickery at the boundary of internal counters that are
unsigned and external functions or views using signed types. These need
another look.

3. Various messages print signed values using %x formats, which need to
be unsigned. These might also need another look.

4. Issues with constants being signed by default. For example, things
like elog(ERROR, "foo is %u but should be %u", somevar, 55) warns
because of the constant. Should be changed to something like 55U for
symmetry, or change the %u to %d. This also reaches into genbki
territory with all the OID constants being generated.

5. Some "surprising" but correct C behavior. For example, unsigned
short is promoted to int (not unsigned int) in variable arguments, so
needs a %d format.

6. Finally, a bunch of uses were just plain wrong and should be corrected.

I haven't found anything that is a really serious bug, but I imagine you
could run into trouble in various ways when you exceed the INT_MAX
value. But then again, if you use up INT_MAX WAL timelines, you
probably have other problems. ;-)

Attachments:

v1-0001-Add-Wformat-signedness.patchtext/plain; charset=UTF-8; name=v1-0001-Add-Wformat-signedness.patch; x-mac-creator=0; x-mac-type=0Download+41-1
v1-0002-Fix-Wformat-signedness-warnings-for-enums.patchtext/plain; charset=UTF-8; name=v1-0002-Fix-Wformat-signedness-warnings-for-enums.patch; x-mac-creator=0; x-mac-type=0Download+124-125
v1-0003-WIP-Fix-remaining-Wformat-signedness-warnings.patchtext/plain; charset=UTF-8; name=v1-0003-WIP-Fix-remaining-Wformat-signedness-warnings.patch; x-mac-creator=0; x-mac-type=0Download+320-320
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: -Wformat-signedness

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

1. enums are unsigned by default in gcc, so all those internal error
messages "unrecognized blah kind: %d" need to be changed to %u.

Do we have reason to think that that is true in every C compiler?
My own preference for this would be to leave the messages as-is
and add explicit "(int)" casts to the arguments. There are some
fraction of these that are like that already.

regards, tom lane

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
Re: -Wformat-signedness

On Tue, Nov 10, 2020 at 4:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

1. enums are unsigned by default in gcc, so all those internal error
messages "unrecognized blah kind: %d" need to be changed to %u.

Do we have reason to think that that is true in every C compiler?
My own preference for this would be to leave the messages as-is
and add explicit "(int)" casts to the arguments. There are some
fraction of these that are like that already.

From experimentation, it seems that GCC enumerator constants are int,
but enum variables are int or signed int depending on whether any
negative values were defined. Valid values have to be representable
as int anyway regardless of what size and signedness a compiler
chooses to use, so yeah, +1 for casting to int.

#5Andy Fan
zhihui.fan1213@gmail.com
In reply to: Peter Eisentraut (#2)
Re: -Wformat-signedness

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Hi,

On 2020-10-29 22:37, Thomas Munro wrote:

There're probably mostly harmless, being mostly error and debug
messages and the like, and considering that eg OID parsing tolerates
negative numbers when reading them back in, but for what it's worth:
GCC complains about many %d vs %u type mixups if you build with
$SUBJECT.

I had looked into this some time ago. I have dusted off my patch
again. The attached version fixes all warnings for me.

When Dean pointed me this thread[1]/messages/by-id/CA+hUKGJNUk434tcsVbs5YUGsujZbveo43QcZeWbv0xPzg9us-A@mail.gmail.com, I was thinking we need to add the
"-Wformat-signedness" and fix all the existing warnning. Then after some
research, it is not such easy and seems we need some agreement first if
we want to fix them.

The following are the main categories of issues:

1. enums are unsigned by default in gcc, so all those internal error
messages "unrecognized blah kind: %d" need to be changed to %u.

IIUC, we have agreed that we should cast enum to int and continue to use
"%d". At least Tom suggested this and Thomas agreed this [1]/messages/by-id/CA+hUKGJNUk434tcsVbs5YUGsujZbveo43QcZeWbv0xPzg9us-A@mail.gmail.com and Peter
didn't raise any opposition.

2. Various trickery at the boundary of internal counters that are
unsigned and external functions or views using signed types. These need
another look.

I also noticed we lack of UNSIGNED INT32/64 SQL type. Changing the
counter to signed looks not good to me as well. This one looks doesn't
have an agreement yet.

3. Various messages print signed values using %x formats, which need to
be unsigned. These might also need another look.

4. Issues with constants being signed by default. For example, things
like elog(ERROR, "foo is %u but should be %u", somevar, 55) warns
because of the constant. Should be changed to something like 55U for
symmetry, or change the %u to %d. This also reaches into genbki
territory with all the OID constants being generated.

5. Some "surprising" but correct C behavior. For example, unsigned
short is promoted to int (not unsigned int) in variable arguments, so
needs a %d format.

6. Finally, a bunch of uses were just plain wrong and should be corrected.

7. __FILE__ in gcc is 'int', but we elog() it with "%u". Should we
change it to "%d"?

I haven't found anything that is a really serious bug, but I imagine you
could run into trouble in various ways when you exceed the INT_MAX
value. But then again, if you use up INT_MAX WAL timelines, you
probably have other problems. ;-)

Me too, just that want some clean code:) But FWIW, "-Wformat-signedness"
is not supported by clang so far, so if people is using clang, they
still can't benefit from this changes. My soluation (I use clang
everyday) is adding a "gcc-checker" for my c file, if I make such
mistake, it can remind me directly.

[0]: /messages/by-id/874j4yl4cj.fsf@163.com
[1]: /messages/by-id/CA+hUKGJNUk434tcsVbs5YUGsujZbveo43QcZeWbv0xPzg9us-A@mail.gmail.com
/messages/by-id/CA+hUKGJNUk434tcsVbs5YUGsujZbveo43QcZeWbv0xPzg9us-A@mail.gmail.com

--
Best Regards
Andy Fan

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Andy Fan (#5)
Re: -Wformat-signedness

On 27.10.24 04:59, Andy Fan wrote:

I haven't found anything that is a really serious bug, but I imagine you
could run into trouble in various ways when you exceed the INT_MAX
value. But then again, if you use up INT_MAX WAL timelines, you
probably have other problems. ;-)

Me too, just that want some clean code:) But FWIW, "-Wformat-signedness"
is not supported by clang so far, so if people is using clang, they
still can't benefit from this changes.

clang 19 supports it now.

My soluation (I use clang
everyday) is adding a "gcc-checker" for my c file, if I make such
mistake, it can remind me directly.

I think it could be useful to set up some better test coverage for
various things overflowing signed integer maximums. For example, maybe
you could hack initdb to advance the OID counter to INT32_MAX+1 or
thereabouts and run the test suites from there. That would also catch
things like inappropriate uses of atoi(), things beyond just the format
strings.

#7Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#6)
Re: -Wformat-signedness

On Tue, Oct 29, 2024 at 07:38:36AM +0100, Peter Eisentraut wrote:

I think it could be useful to set up some better test coverage for various
things overflowing signed integer maximums. For example, maybe you could
hack initdb to advance the OID counter to INT32_MAX+1 or thereabouts and run
the test suites from there. That would also catch things like inappropriate
uses of atoi(), things beyond just the format strings.

Fun. One way to be fancy here would be to force a pg_resetwal
--next-oid in some of the test paths (Cluster.pm and/or pg_regress)
with an environment variable to force the command to trigger across
the board for all the clusters created in the tests. initdb cannot be
used here as the TAP tests reuse a cluster already initdb'd to save
time. No need to touch at pg_regress, either, as we could count on the
pg_regress runs in 002_pg_upgrade.pl and 027_stream_regress.pl.
--
Michael

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#7)
Re: -Wformat-signedness

On 29.10.24 07:51, Michael Paquier wrote:

On Tue, Oct 29, 2024 at 07:38:36AM +0100, Peter Eisentraut wrote:

I think it could be useful to set up some better test coverage for various
things overflowing signed integer maximums. For example, maybe you could
hack initdb to advance the OID counter to INT32_MAX+1 or thereabouts and run
the test suites from there. That would also catch things like inappropriate
uses of atoi(), things beyond just the format strings.

Fun. One way to be fancy here would be to force a pg_resetwal
--next-oid in some of the test paths (Cluster.pm and/or pg_regress)
with an environment variable to force the command to trigger across
the board for all the clusters created in the tests. initdb cannot be
used here as the TAP tests reuse a cluster already initdb'd to save
time. No need to touch at pg_regress, either, as we could count on the
pg_regress runs in 002_pg_upgrade.pl and 027_stream_regress.pl.

I was thinking just compiling with a patch like this:

-#define FirstNormalObjectId        16384
+#define FirstNormalObjectId        ((Oid) INT_MAX + 1)

Already found one bug: pg_checksums --filenode only accepts files up to
INT_MAX.