Avoid overhead with fprintf related functions

Started by Ranier Vilelaover 3 years ago7 messageshackers
Jump to latest
#1Ranier Vilela
ranier.vf@gmail.com

Based on work in [1]/messages/by-id/CAApHDvp2THseLvCc+TcYFBC7FKHpHTs1JyYmd2JghtOVhb5WGA@mail.gmail.com.
According to https://cplusplus.com/reference/cstdio/fprintf/
The use of fprintf is related to the need to generate a string based on a
format, which should be different from "%s".
Since fprintf has overhead when parsing the "format" parameter, plus all
the trouble of checking the va_arg parameters.
I think this is one of the low fruits available and easy to reap.
By replacing fprintf with its equivalents, fputs and fputc,
we avoid overhead and increase security [2]https://stackoverflow.com/questions/20837989/fprintf-stack-buffer-overflow and [3]https://bufferoverflows.net/format-string-vulnerability-what-when-and-how/.

The downside is a huge big churm, which unfortunately will occur.
But, IHMO, I think the advantages are worth it.
Note that behavior remains the same, since fputs and fputc do not change
the expected behavior of fprintf.

A small performance gain is expected, mainly for the client, since there
are several occurrences in some critical places, such as
(usr/src/fe_utils/print.c).

Patch attached.
This pass check-world.

regards,
Ranier Vilela

[1]: /messages/by-id/CAApHDvp2THseLvCc+TcYFBC7FKHpHTs1JyYmd2JghtOVhb5WGA@mail.gmail.com
/messages/by-id/CAApHDvp2THseLvCc+TcYFBC7FKHpHTs1JyYmd2JghtOVhb5WGA@mail.gmail.com
[2]: https://stackoverflow.com/questions/20837989/fprintf-stack-buffer-overflow
https://stackoverflow.com/questions/20837989/fprintf-stack-buffer-overflow
[3]: https://bufferoverflows.net/format-string-vulnerability-what-when-and-how/
https://bufferoverflows.net/format-string-vulnerability-what-when-and-how/

Attachments:

fprintf_fixes.patchapplication/octet-stream; name=fprintf_fixes.patchDownload+255-259
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Ranier Vilela (#1)
Re: Avoid overhead with fprintf related functions

On Fri, Sep 09, 2022 at 10:45:37AM -0300, Ranier Vilela wrote:

Based on work in [1].
According to https://cplusplus.com/reference/cstdio/fprintf/
The use of fprintf is related to the need to generate a string based on a
format, which should be different from "%s".
Since fprintf has overhead when parsing the "format" parameter, plus all
the trouble of checking the va_arg parameters.
I think this is one of the low fruits available and easy to reap.
By replacing fprintf with its equivalents, fputs and fputc,
we avoid overhead and increase security [2] and [3].

The downside is a huge big churm, which unfortunately will occur.
But, IHMO, I think the advantages are worth it.
Note that behavior remains the same, since fputs and fputc do not change
the expected behavior of fprintf.

A small performance gain is expected, mainly for the client, since there
are several occurrences in some critical places, such as
(usr/src/fe_utils/print.c).

I agree with David [0]/messages/by-id/CAApHDvp2THseLvCc+TcYFBC7FKHpHTs1JyYmd2JghtOVhb5WGA@mail.gmail.com. But if you can demonstrate a performance gain,
perhaps it's worth considering a subset of these changes in hot paths.

[0]: /messages/by-id/CAApHDvp2THseLvCc+TcYFBC7FKHpHTs1JyYmd2JghtOVhb5WGA@mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#3Ranier Vilela
ranier.vf@gmail.com
In reply to: Nathan Bossart (#2)
Re: Avoid overhead with fprintf related functions

Em sex., 9 de set. de 2022 às 13:20, Nathan Bossart <
nathandbossart@gmail.com> escreveu:

On Fri, Sep 09, 2022 at 10:45:37AM -0300, Ranier Vilela wrote:

Based on work in [1].
According to https://cplusplus.com/reference/cstdio/fprintf/
The use of fprintf is related to the need to generate a string based on a
format, which should be different from "%s".
Since fprintf has overhead when parsing the "format" parameter, plus all
the trouble of checking the va_arg parameters.
I think this is one of the low fruits available and easy to reap.
By replacing fprintf with its equivalents, fputs and fputc,
we avoid overhead and increase security [2] and [3].

The downside is a huge big churm, which unfortunately will occur.
But, IHMO, I think the advantages are worth it.
Note that behavior remains the same, since fputs and fputc do not change
the expected behavior of fprintf.

A small performance gain is expected, mainly for the client, since there
are several occurrences in some critical places, such as
(usr/src/fe_utils/print.c).

I agree with David [0]. But if you can demonstrate a performance gain,
perhaps it's worth considering a subset of these changes in hot paths.

Simple benchmark with David sort example.

1. make data
create table t (a bigint not null, b bigint not null, c bigint not
null, d bigint not null, e bigint not null, f bigint not null);

insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; --
10GB!
vacuum freeze t;

2. client run
\timing on
\pset pager off
select * from t limit 1000000;

head:
Time: 418,210 ms
Time: 419,588 ms
Time: 424,713 ms

fprintf patch:
Time: 416,919 ms
Time: 416,246 ms
Time: 416,237 ms

regards,
Ranier Vilela

#4Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#1)
Re: Avoid overhead with fprintf related functions

Em sex., 9 de set. de 2022 às 10:45, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Based on work in [1].
According to https://cplusplus.com/reference/cstdio/fprintf/
The use of fprintf is related to the need to generate a string based on a
format, which should be different from "%s".
Since fprintf has overhead when parsing the "format" parameter, plus all
the trouble of checking the va_arg parameters.
I think this is one of the low fruits available and easy to reap.
By replacing fprintf with its equivalents, fputs and fputc,
we avoid overhead and increase security [2] and [3].

The downside is a huge big churm, which unfortunately will occur.
But, IHMO, I think the advantages are worth it.
Note that behavior remains the same, since fputs and fputc do not change
the expected behavior of fprintf.

A small performance gain is expected, mainly for the client, since there
are several occurrences in some critical places, such as
(usr/src/fe_utils/print.c).

Patch attached.
This pass check-world.

Rechecked for the hundredth time.
One typo.

regards,
Ranier Vilela

Attachments:

v1-fprintf_fixes.patchtext/x-patch; charset=US-ASCII; name=v1-fprintf_fixes.patchDownload+254-259
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#3)
Re: Avoid overhead with fprintf related functions

Ranier Vilela <ranier.vf@gmail.com> writes:

Em sex., 9 de set. de 2022 às 13:20, Nathan Bossart <
nathandbossart@gmail.com> escreveu:

I agree with David [0]. But if you can demonstrate a performance gain,
perhaps it's worth considering a subset of these changes in hot paths.

head:
Time: 418,210 ms
Time: 419,588 ms
Time: 424,713 ms

fprintf patch:
Time: 416,919 ms
Time: 416,246 ms
Time: 416,237 ms

That is most certainly not enough gain to justify a large amount
of code churn. In fact, given that this is probably pretty
platform-dependent and you've checked only one platform, I don't
think I'd call this a sufficient case for even a one-line change.

regards, tom lane

#6Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#5)
Re: Avoid overhead with fprintf related functions

Em sex., 9 de set. de 2022 às 18:53, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

Ranier Vilela <ranier.vf@gmail.com> writes:

Em sex., 9 de set. de 2022 às 13:20, Nathan Bossart <
nathandbossart@gmail.com> escreveu:

I agree with David [0]. But if you can demonstrate a performance gain,
perhaps it's worth considering a subset of these changes in hot paths.

head:
Time: 418,210 ms
Time: 419,588 ms
Time: 424,713 ms

fprintf patch:
Time: 416,919 ms
Time: 416,246 ms
Time: 416,237 ms

That is most certainly not enough gain to justify a large amount
of code churn. In fact, given that this is probably pretty
platform-dependent and you've checked only one platform, I don't
think I'd call this a sufficient case for even a one-line change.

Of course, base these changes not on performance gain, but on correct style
and increased security.
But out-vote is out-vote, case closed.

Regards,
Ranier Vilela

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#5)
Re: Avoid overhead with fprintf related functions

On Fri, Sep 09, 2022 at 05:53:54PM -0400, Tom Lane wrote:

Ranier Vilela <ranier.vf@gmail.com> writes:

Em sex., 9 de set. de 2022 �s 13:20, Nathan Bossart <
nathandbossart@gmail.com> escreveu:

I agree with David [0]. But if you can demonstrate a performance gain,
perhaps it's worth considering a subset of these changes in hot paths.

head:
Time: 418,210 ms
Time: 419,588 ms
Time: 424,713 ms

fprintf patch:
Time: 416,919 ms
Time: 416,246 ms
Time: 416,237 ms

That is most certainly not enough gain to justify a large amount
of code churn. In fact, given that this is probably pretty
platform-dependent and you've checked only one platform, I don't
think I'd call this a sufficient case for even a one-line change.

Agreed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com