Should we use MemSet or {0} for struct initialization?

Started by Richard Guoover 2 years ago11 messages
#1Richard Guo
guofenglinux@gmail.com
1 attachment(s)

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

Thanks
Richard

Attachments:

v1-0001-Replace-more-MemSet-calls-with-struct-initialization.patchapplication/octet-stream; name=v1-0001-Replace-more-MemSet-calls-with-struct-initialization.patchDownload
From f6e8262f3f9f33fe73c4cc916c2183998b021e57 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 31 Aug 2023 16:23:24 +0800
Subject: [PATCH v1] Replace more MemSet calls with struct initialization

---
 src/backend/parser/parse_target.c   | 3 +--
 src/backend/utils/adt/pgstatfuncs.c | 8 ++------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c
index 57247de363..9aaf771913 100644
--- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -1643,10 +1643,9 @@ expandRecordVariable(ParseState *pstate, Var *var, int levelsup)
 					 * in step with varlevelsup in the CTE; furthermore it
 					 * could be an outer CTE.
 					 */
-					ParseState	mypstate;
+					ParseState	mypstate = {0};
 					Index		levelsup;
 
-					MemSet(&mypstate, 0, sizeof(mypstate));
 					/* this loop must work, since GetCTEForRTE did */
 					for (levelsup = 0;
 						 levelsup < rte->ctelevelsup + netlevelsup;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2b9742ad21..586b2d1b21 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -709,15 +709,11 @@ pg_stat_get_backend_subxact(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_SUBXACT_COLS	2
 	TupleDesc	tupdesc;
-	Datum		values[PG_STAT_GET_SUBXACT_COLS];
-	bool		nulls[PG_STAT_GET_SUBXACT_COLS];
+	Datum		values[PG_STAT_GET_SUBXACT_COLS] = {0};
+	bool		nulls[PG_STAT_GET_SUBXACT_COLS] = {0};
 	int32		beid = PG_GETARG_INT32(0);
 	LocalPgBackendStatus *local_beentry;
 
-	/* Initialise values and NULL flags arrays */
-	MemSet(values, 0, sizeof(values));
-	MemSet(nulls, 0, sizeof(nulls));
-
 	/* Initialise attributes information in the tuple descriptor */
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_SUBXACT_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "subxact_count",
-- 
2.31.0

#2Junwang Zhao
zhjwpku@gmail.com
In reply to: Richard Guo (#1)
Re: Should we use MemSet or {0} for struct initialization?

On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

Thanks
Richard

If the struct has padding or aligned, {0} only guarantee the struct
members initialized to 0, while memset sets the alignment/padding
to 0 as well, but since we will not access the alignment/padding, so
they give the same effect.

I bet {0} should be faster since there is no function call, but I'm not
100% sure ;)

--
Regards
Junwang Zhao

#3John Naylor
john.naylor@enterprisedb.com
In reply to: Junwang Zhao (#2)
Re: Should we use MemSet or {0} for struct initialization?

On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com>

wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that
commit, see:

/messages/by-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10@enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

I _believe_ that's harmless to change.

On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

If the struct has padding or aligned, {0} only guarantee the struct
members initialized to 0, while memset sets the alignment/padding
to 0 as well, but since we will not access the alignment/padding, so
they give the same effect.

See above -- if it's used as a hash key, for example, you must clear
everything.

I bet {0} should be faster since there is no function call, but I'm not
100% sure ;)

Neither has a function call. MemSet is a PG macro. You're thinking of
memset, the libc library function, but a decent compiler can easily turn
that into something else for fixed-size inputs.

--
John Naylor
EDB: http://www.enterprisedb.com

#4Junwang Zhao
zhjwpku@gmail.com
In reply to: John Naylor (#3)
Re: Should we use MemSet or {0} for struct initialization?

On Thu, Aug 31, 2023 at 7:07 PM John Naylor
<john.naylor@enterprisedb.com> wrote:

On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com> wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that commit, see:

/messages/by-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10@enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

I _believe_ that's harmless to change.

On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

If the struct has padding or aligned, {0} only guarantee the struct
members initialized to 0, while memset sets the alignment/padding
to 0 as well, but since we will not access the alignment/padding, so
they give the same effect.

See above -- if it's used as a hash key, for example, you must clear everything.

Yeah, if memcmp was used as the key comparison function, there is a problem.

I bet {0} should be faster since there is no function call, but I'm not
100% sure ;)

Neither has a function call. MemSet is a PG macro. You're thinking of memset, the libc library function, but a decent compiler can easily turn that into something else for fixed-size inputs.

good to know, thanks.

--
John Naylor
EDB: http://www.enterprisedb.com

--
Regards
Junwang Zhao

#5Richard Guo
guofenglinux@gmail.com
In reply to: John Naylor (#3)
Re: Should we use MemSet or {0} for struct initialization?

On Thu, Aug 31, 2023 at 7:07 PM John Naylor <john.naylor@enterprisedb.com>
wrote:

On Thu, Aug 31, 2023 at 5:34 PM Richard Guo <guofenglinux@gmail.com>

wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From

9fd45870

it seems that we prefer to {0}. So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that
commit, see:

/messages/by-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10@enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

Thanks for pointing this out. Yeah, struct initialization does not work
for some cases with padding bits, such as for a hash key we need to
clear the padding too.

The case in expandRecordVariable() mentioned here should be safe though,
maybe this is an omission from 9fd45870?

Thanks
Richard

#6Jelte Fennema
postgres@jeltef.nl
In reply to: Junwang Zhao (#4)
Re: Should we use MemSet or {0} for struct initialization?

On Thu, 31 Aug 2023 at 13:35, Junwang Zhao <zhjwpku@gmail.com> wrote:

If the struct has padding or aligned, {0} only guarantee the struct
members initialized to 0, while memset sets the alignment/padding
to 0 as well, but since we will not access the alignment/padding, so
they give the same effect.

See above -- if it's used as a hash key, for example, you must clear everything.

Yeah, if memcmp was used as the key comparison function, there is a problem.

The C standard says:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

So if you set any of the fields after a MemSet, the values of the
padding bytes that were set to 0 are now unspecified. It seems much
safer to actually spell out the padding fields of a hash key.

#7John Naylor
john.naylor@enterprisedb.com
In reply to: Jelte Fennema (#6)
Re: Should we use MemSet or {0} for struct initialization?

On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:

The C standard says:

When a value is stored in an object of structure or union type,

including in a member object, the bytes of the object representation that
correspond to any padding bytes take unspecified values.

So if you set any of the fields after a MemSet, the values of the
padding bytes that were set to 0 are now unspecified. It seems much
safer to actually spell out the padding fields of a hash key.

No, the standard is telling you why you need to memset if consistency of
padding bytes matters.

--
John Naylor
EDB: http://www.enterprisedb.com

#8Chapman Flack
chap@anastigmatix.net
In reply to: John Naylor (#7)
Re: Should we use MemSet or {0} for struct initialization?

On 2023-09-01 09:25, John Naylor wrote:

On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl>
wrote:

The C standard says:

When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation that
correspond to any padding bytes take unspecified values.

So if you set any of the fields after a MemSet, the values of the
padding bytes that were set to 0 are now unspecified. It seems much
safer to actually spell out the padding fields of a hash key.

No, the standard is telling you why you need to memset if consistency
of
padding bytes matters.

Um, I'm in no way a language lawyer for recent C specs, but the language
Jelte Fennema quoted is also present in the rather old 9899 TC2 draft
I still have around from years back, and in particular it does say
that upon assignment, padding bytes ▶take◀ unspecified values, not
merely that they retain whatever unspecified values they may have had
before. There is a footnote attached (in 9899 TC2) that says "Thus,
for example, structure assignment need not copy any padding bits."
If that footnote example were normative, it would be reassuring,
because you could assume that padding bits not copied are unchanged
and remember what you originally memset() them to. So that would be
nice. But everything about the form and phrasing of the footnote
conveys that it isn't normative. And the normative text does appear
to be saying that those padding bytes ▶take◀ unspecified values upon,
assignment to the object, even if you may have memset() them before.
Or at least to be saying that's what could happen, in some
implementation
on some architecture, and it would be standard-conformant if it did.

Perhaps there is language elsewhere in the standard that pins it down
to the way you've interpreted it? If you know where that language
is, could you point to it?

Regards,
-Chap

#9Jelte Fennema
postgres@jeltef.nl
In reply to: John Naylor (#7)
Re: Should we use MemSet or {0} for struct initialization?

On Fri, 1 Sept 2023 at 15:25, John Naylor <john.naylor@enterprisedb.com>
wrote:

On Fri, Sep 1, 2023 at 7:48 PM Jelte Fennema <postgres@jeltef.nl> wrote:

The C standard says:

When a value is stored in an object of structure or union type,

including in a member object, the bytes of the object representation that
correspond to any padding bytes take unspecified values.

So if you set any of the fields after a MemSet, the values of the
padding bytes that were set to 0 are now unspecified. It seems much
safer to actually spell out the padding fields of a hash key.

No, the standard is telling you why you need to memset if consistency of

padding bytes matters.

Maybe I'm misunderstanding the sentence from the C standard I quoted. But
under my interpretation it means that even an assignment to a field of a
struct causes the padding bytes to take unspecified (but not undefined)
values, because of the "including in a member object" part of the sentence.
It's ofcourse possible that all compilers relevant to Postgres never
actually change padding when assigning to a field.

#10Peter Eisentraut
peter@eisentraut.org
In reply to: Richard Guo (#1)
Re: Should we use MemSet or {0} for struct initialization?

On 31.08.23 10:32, Richard Guo wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet.  I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures?  From 9fd45870
it seems that we prefer to {0}.  So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

The first part (parse_target.c) was already addressed by e0e492e5a9. I
have applied the second part (pgstatfuncs.c).

#11Richard Guo
guofenglinux@gmail.com
In reply to: Peter Eisentraut (#10)
Re: Should we use MemSet or {0} for struct initialization?

On Tue, Sep 19, 2023 at 5:37 PM Peter Eisentraut <peter@eisentraut.org>
wrote:

On 31.08.23 10:32, Richard Guo wrote:

While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet. I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures? From 9fd45870
it seems that we prefer to {0}. So here is a trivial patch doing that.
And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so include that in the patch too.

The first part (parse_target.c) was already addressed by e0e492e5a9. I
have applied the second part (pgstatfuncs.c).

Thanks for pushing this.

Thanks
Richard