Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Started by Ranier Vilelaalmost 4 years ago53 messageshackers
Jump to latest
#1Ranier Vilela
ranier.vf@gmail.com

Hi hackers,

At function load_relcache_init_file, there is an unnecessary function call,
to initialize pgstat_info pointer to NULL.

MemSet(&rel->pgstat_info, 0, sizeof(rel->pgstat_info));

I think that intention with use of MemSet was:
MemSet(&rel->pgstat_info, 0, sizeof(*rel->pgstat_info));

Initialize with sizeof of Struct size, not with sizeof pointer size.
But so it breaks.

Attached a tiny patch.

regards,
Ranier Vilela

Attachments:

avoid_unecessary_memset_call.patchapplication/octet-stream; name=avoid_unecessary_memset_call.patchDownload+1-1
#2David Rowley
dgrowleyml@gmail.com
In reply to: Ranier Vilela (#1)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On Sun, 15 May 2022 at 09:47, Ranier Vilela <ranier.vf@gmail.com> wrote:

At function load_relcache_init_file, there is an unnecessary function call,
to initialize pgstat_info pointer to NULL.

MemSet(&rel->pgstat_info, 0, sizeof(rel->pgstat_info));

What seems to have happened here is the field was changed to become a
pointer in 77947c51c. It's not incorrect to use MemSet() to zero out
the pointer field. What it does probably do is confuse the casual
reader into thinking the field is a struct rather than a pointer to
one. It's probably worth making that consistent with the other
fields so nobody gets confused.

Can you add a CF entry for PG16 for this so we come back to it after we branch?

David

#3Ranier Vilela
ranier.vf@gmail.com
In reply to: David Rowley (#2)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em seg., 16 de mai. de 2022 às 20:26, David Rowley <dgrowleyml@gmail.com>
escreveu:

On Sun, 15 May 2022 at 09:47, Ranier Vilela <ranier.vf@gmail.com> wrote:

At function load_relcache_init_file, there is an unnecessary function

call,

to initialize pgstat_info pointer to NULL.

MemSet(&rel->pgstat_info, 0, sizeof(rel->pgstat_info));

What seems to have happened here is the field was changed to become a
pointer in 77947c51c. It's not incorrect to use MemSet() to zero out
the pointer field. What it does probably do is confuse the casual
reader into thinking the field is a struct rather than a pointer to
one. It's probably worth making that consistent with the other
fields so nobody gets confused.

Can you add a CF entry for PG16 for this so we come back to it after we
branch?

Of course.
I will add it.

regards,
Ranier Vilela

#4Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#3)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em ter., 17 de mai. de 2022 às 10:33, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Em seg., 16 de mai. de 2022 às 20:26, David Rowley <dgrowleyml@gmail.com>
escreveu:

On Sun, 15 May 2022 at 09:47, Ranier Vilela <ranier.vf@gmail.com> wrote:

At function load_relcache_init_file, there is an unnecessary function

call,

to initialize pgstat_info pointer to NULL.

MemSet(&rel->pgstat_info, 0, sizeof(rel->pgstat_info));

What seems to have happened here is the field was changed to become a
pointer in 77947c51c. It's not incorrect to use MemSet() to zero out
the pointer field. What it does probably do is confuse the casual
reader into thinking the field is a struct rather than a pointer to
one. It's probably worth making that consistent with the other
fields so nobody gets confused.

Can you add a CF entry for PG16 for this so we come back to it after we
branch?

Of course.
I will add it.

Created https://commitfest.postgresql.org/38/3640/
However, I would like to add more.
I found, I believe, a serious problem of incorrect usage of the memset api.
Historically, people have relied on using memset or MemSet, using the
variable name as an argument for the sizeof.
While it works correctly, for arrays, when it comes to pointers to
structures, things go awry.

#include <stdio.h>

struct test_t
{
double b;
int a;
char c;
};

typedef struct test_t Test;

int main()
{
Test * my_test;

printf("Sizeof pointer=%u\n", sizeof(my_test));
printf("Sizeof struct=%u\n", sizeof(Test));
}

Output:
Sizeof pointer=8
Sizeof struct=16

So throughout the code there are these misuses.

So, taking advantage of this CF I'm going to add one more big patch, with
suggestions to fix the calls.
This pass vcregress check.

regards,
Ranier Vilela

Attachments:

001-avoid_unecessary_memset_call.patchapplication/octet-stream; name=001-avoid_unecessary_memset_call.patchDownload+1-1
002-fix_api_memset_usage.patchapplication/octet-stream; name=002-fix_api_memset_usage.patchDownload+171-170
#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Ranier Vilela (#4)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On Tue, May 17, 2022 at 07:52:30PM -0300, Ranier Vilela wrote:

I found, I believe, a serious problem of incorrect usage of the memset api.
Historically, people have relied on using memset or MemSet, using the
variable name as an argument for the sizeof.
While it works correctly, for arrays, when it comes to pointers to
structures, things go awry.

Knowing how sizeof() works is required before using it - the same is true for
pointers.

So throughout the code there are these misuses.

Why do you think it's a misuse ?

Take the first one as an example. It says:

GenericCosts costs;
MemSet(&costs, 0, sizeof(costs));

You sent a patch to change it to sizeof(GenericCosts).

But it's not a pointer, so they are the same.

Is that true for every change in your patch ?

--
Justin

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#4)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Ranier Vilela <ranier.vf@gmail.com> writes:

I found, I believe, a serious problem of incorrect usage of the memset api.
Historically, people have relied on using memset or MemSet, using the
variable name as an argument for the sizeof.
While it works correctly, for arrays, when it comes to pointers to
structures, things go awry.

You'll have to convince people that any of these places are in
fact incorrect. Everyone who's used C for any length of time
is well aware of the possibility of getting sizeof() wrong in
this sort of context, and I think we've been careful about it.

Also, as a stylistic matter I think it's best to write
"memset(&x, 0, sizeof(x))" where we can. Replacing sizeof(x)
with sizeof(some type name) has its own risks of error, and
therefore is not automatically an improvement.

regards, tom lane

#7Ranier Vilela
ranier.vf@gmail.com
In reply to: Justin Pryzby (#5)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em ter., 17 de mai. de 2022 às 20:18, Justin Pryzby <pryzby@telsasoft.com>
escreveu:

On Tue, May 17, 2022 at 07:52:30PM -0300, Ranier Vilela wrote:

I found, I believe, a serious problem of incorrect usage of the memset

api.

Historically, people have relied on using memset or MemSet, using the
variable name as an argument for the sizeof.
While it works correctly, for arrays, when it comes to pointers to
structures, things go awry.

Knowing how sizeof() works is required before using it - the same is true
for
pointers.

So throughout the code there are these misuses.

Why do you think it's a misuse ?

Take the first one as an example. It says:

GenericCosts costs;
MemSet(&costs, 0, sizeof(costs));

You sent a patch to change it to sizeof(GenericCosts).

But it's not a pointer, so they are the same.

Is that true for every change in your patch ?

It seems true, sorry.
Thanks Justin for pointing out my big mistake.

I hope this isn't all wasted work, but should I remove the 002 patch.

regards,
Ranier Vilela

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ranier Vilela (#4)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

This one caught my attention:

diff --git a/contrib/pgcrypto/crypt-blowfish.c b/contrib/pgcrypto/crypt-blowfish.c
index a663852ccf..63fcef562d 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -750,7 +750,7 @@ _crypt_blowfish_rn(const char *key, const char *setting,
 /* Overwrite the most obvious sensitive data we have on the stack. Note
  * that this does not guarantee there's no sensitive data left on the
  * stack and/or in registers; I'm not aware of portable code that does. */
-	px_memset(&data, 0, sizeof(data));
+	px_memset(&data, 0, sizeof(struct data));

return output;
}

The curious thing here is that sizeof(data) is correct, because it
refers to a variable defined earlier in that function, whose type is an
anonymous struct declared there. But I don't know what "struct data"
refers to, precisely because that struct is unnamed. Am I misreading it?

Also:

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index e1048e47ff..87be62f023 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -601,7 +601,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
 				 errmsg("cannot access temporary indexes of other sessions")));
 	/* Get the information we need from the metapage. */
-	memset(&stats, 0, sizeof(stats));
+	memset(&stats, 0, sizeof(HashIndexStat));
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
 	stats.version = metap->hashm_version;

I think the working theory here is that the original line is correct
now, and it continues to be correct if somebody edits the function and
makes variable 'stats' be of a different type. But if you change the
sizeof() to use the type name, then there are two places that you need
to edit, and they are not necessarily close together; so it is correct
now and could become a bug in the future. I don't think we're fully
consistent about this, but I think you're proposing to change it in the
opposite direction that we'd prefer.

For the case where the variable is a pointer, the developer could write
'sizeof(*variable)' instead of being forced to specify the type name,
for example (just a random one):

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index a434cf93ef..e92c03686f 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -438,7 +438,7 @@ BloomFillMetapage(Relation index, Page metaPage)
 	 */
 	BloomInitPage(metaPage, BLOOM_META);
 	metadata = BloomPageGetMeta(metaPage);
-	memset(metadata, 0, sizeof(BloomMetaPageData));
+	memset(metadata, 0, sizeof(*metadata));
 	metadata->magickNumber = BLOOM_MAGICK_NUMBER;
 	metadata->opts = *opts;
 	((PageHeader) metaPage)->pd_lower += sizeof(BloomMetaPageData);

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#9Ranier Vilela
ranier.vf@gmail.com
In reply to: Alvaro Herrera (#8)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qua., 18 de mai. de 2022 às 05:54, Alvaro Herrera <
alvherre@alvh.no-ip.org> escreveu:

This one caught my attention:

diff --git a/contrib/pgcrypto/crypt-blowfish.c
b/contrib/pgcrypto/crypt-blowfish.c
index a663852ccf..63fcef562d 100644
--- a/contrib/pgcrypto/crypt-blowfish.c
+++ b/contrib/pgcrypto/crypt-blowfish.c
@@ -750,7 +750,7 @@ _crypt_blowfish_rn(const char *key, const char
*setting,
/* Overwrite the most obvious sensitive data we have on the stack. Note
* that this does not guarantee there's no sensitive data left on the
* stack and/or in registers; I'm not aware of portable code that does. */
-       px_memset(&data, 0, sizeof(data));
+       px_memset(&data, 0, sizeof(struct data));

return output;
}

The curious thing here is that sizeof(data) is correct, because it
refers to a variable defined earlier in that function, whose type is an
anonymous struct declared there. But I don't know what "struct data"
refers to, precisely because that struct is unnamed. Am I misreading it?

No, you are right.
This is definitely wrong.

Also:

diff --git a/contrib/pgstattuple/pgstatindex.c
b/contrib/pgstattuple/pgstatindex.c
index e1048e47ff..87be62f023 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -601,7 +601,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
errmsg("cannot access temporary indexes
of other sessions")));
/* Get the information we need from the metapage. */
-       memset(&stats, 0, sizeof(stats));
+       memset(&stats, 0, sizeof(HashIndexStat));
metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ,
LH_META_PAGE);
metap = HashPageGetMeta(BufferGetPage(metabuf));
stats.version = metap->hashm_version;

I think the working theory here is that the original line is correct
now, and it continues to be correct if somebody edits the function and
makes variable 'stats' be of a different type. But if you change the
sizeof() to use the type name, then there are two places that you need
to edit, and they are not necessarily close together; so it is correct
now and could become a bug in the future. I don't think we're fully
consistent about this, but I think you're proposing to change it in the
opposite direction that we'd prefer.

Yes. I think that only advantage using the name of structure is
when you read the line of MemSet, you know what kind type
is filled.

For the case where the variable is a pointer, the developer could write
'sizeof(*variable)' instead of being forced to specify the type name,
for example (just a random one):

Could have used this style to make the patch.
But the intention was to correct a possible misinterpretation,
which in this case, showed that I was totally wrong.

Sorry by the noise.

regards,
Ranier Vilela

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Justin Pryzby (#5)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On 18.05.22 01:18, Justin Pryzby wrote:

Take the first one as an example. It says:

GenericCosts costs;
MemSet(&costs, 0, sizeof(costs));

You sent a patch to change it to sizeof(GenericCosts).

But it's not a pointer, so they are the same.

This instance can more easily be written as

costs = {0};

#11Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#10)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qua., 18 de mai. de 2022 às 10:52, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> escreveu:

On 18.05.22 01:18, Justin Pryzby wrote:

Take the first one as an example. It says:

GenericCosts costs;
MemSet(&costs, 0, sizeof(costs));

You sent a patch to change it to sizeof(GenericCosts).

But it's not a pointer, so they are the same.

This instance can more easily be written as

costs = {0};

That would initialize the content at compilation and not at runtime,
correct?
And we would avoid MemSet/memset altogether.

There are a lot of cases using MemSet (with struct variables) and at
Windows 64 bits, long are 4 (four) bytes.
So I believe that MemSet is less efficient on Windows than on Linux.
"The size of the '_vstart' buffer is not a multiple of the element size of
the type 'long'."
message from PVS-Studio static analysis tool.

regards,
Ranier Vilela

#12David Rowley
dgrowleyml@gmail.com
In reply to: Ranier Vilela (#11)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier.vf@gmail.com> wrote:

That would initialize the content at compilation and not at runtime, correct?

Your mental model of compilation and run-time might be flawed here.
Here's no such thing as zeroing memory at compile time. There's only
emitting instructions that perform those tasks at run-time.
https://godbolt.org/ might help your understanding.

There are a lot of cases using MemSet (with struct variables) and at Windows 64 bits, long are 4 (four) bytes.
So I believe that MemSet is less efficient on Windows than on Linux.
"The size of the '_vstart' buffer is not a multiple of the element size of the type 'long'."
message from PVS-Studio static analysis tool.

I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997). I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.

Your comment on the sizeof(long) on win64 is certainly true. I wrote
the attached C program to test the performance difference.

(windows 64-bit)

cl memset.c /Ox
memset 200000000

Running 200000000 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.530000 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.950000 seconds
memset: size 512: 3.191000 seconds

It seems like there's no cases there where MemSet is faster than
memset. I was careful to only provide MemSet() with inputs that
result in it not using the memset fallback. I also provided constants
so that the decision about which method to use was known at compile
time.

It's not clear to me why 512 is faster than 256. I saw the same on a repeat run.

Changing "long" to "long long" it looks like:

memset 200000000

Running 200000000 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.970000 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds

The situation is a little different on my Linux machine:

$ gcc memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.000002 seconds
MemSet: size 16: 0.000000 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds

It looks like part of the work might be getting optimised away in the
8-16 MemSet() calls.

clang seems to have the opposite for size 8.

$ clang memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.000000 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds

There does not seem to be any significant reduction in the size of the
binary from changing the MemSet macro to directly use memset. It went
from 9865008 bytes down to 9860800 bytes (4208 bytes less).

David

Attachments:

memset.ctext/plain; charset=US-ASCII; name=memset.cDownload
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#12)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

David Rowley <dgrowleyml@gmail.com> writes:

I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997). I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.

Yeah, I've thought for awhile that technology has moved on from that.
Nobody's really taken the trouble to measure it though. (And no,
results from one compiler on one machine are not terribly convincing.)

The thing that makes this a bit more difficult than it might be is
the special cases we have for known-aligned and so on targets, which
are particularly critical for palloc0 and makeNode etc. So there's
more than one case to look into. But I'd argue that those special
cases are actually what we want to worry about the most: zeroing
relatively small, known-aligned node structs is THE use case.

regards, tom lane

#14Ranier Vilela
ranier.vf@gmail.com
In reply to: David Rowley (#12)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qua., 18 de mai. de 2022 às 19:57, David Rowley <dgrowleyml@gmail.com>
escreveu:

On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier.vf@gmail.com> wrote:

That would initialize the content at compilation and not at runtime,

correct?

Your mental model of compilation and run-time might be flawed here.
Here's no such thing as zeroing memory at compile time. There's only
emitting instructions that perform those tasks at run-time.
https://godbolt.org/ might help your understanding.

There are a lot of cases using MemSet (with struct variables) and at

Windows 64 bits, long are 4 (four) bytes.

So I believe that MemSet is less efficient on Windows than on Linux.
"The size of the '_vstart' buffer is not a multiple of the element size

of the type 'long'."

message from PVS-Studio static analysis tool.

I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997). I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.

+1
All compilers currently have memset optimized.

Your comment on the sizeof(long) on win64 is certainly true. I wrote
the attached C program to test the performance difference.

(windows 64-bit)

cl memset.c /Ox
memset 200000000

Running 200000000 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.530000 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.950000 seconds
memset: size 512: 3.191000 seconds

It seems like there's no cases there where MemSet is faster than
memset. I was careful to only provide MemSet() with inputs that
result in it not using the memset fallback. I also provided constants
so that the decision about which method to use was known at compile
time.

It's not clear to me why 512 is faster than 256.

Probably broken alignment with 256?
Another warning from PVS-Studio:
[1]: https://pvs-studio.com/en/docs/warnings/v1032/

src/contrib/postgres_fdw/connection.c (Line 1690)
MemSet(values, 0, sizeof(values));

I saw the same on a repeat run.

Changing "long" to "long long" it looks like:

memset 200000000

Running 200000000 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.970000 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds

The situation is a little different on my Linux machine:

$ gcc memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.000002 seconds
MemSet: size 16: 0.000000 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds

It looks like part of the work might be getting optimised away in the
8-16 MemSet() calls.

On linux (long) have 8 bytes.
I'm still surprised that MemSet (8/16) is faster.

clang seems to have the opposite for size 8.

$ clang memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.000000 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds

There does not seem to be any significant reduction in the size of the
binary from changing the MemSet macro to directly use memset. It went
from 9865008 bytes down to 9860800 bytes (4208 bytes less).

Anyway I think on Windows 64 bits,
it is very worthwhile to remove the MemSet macro.

regards,
Ranier Vilela

[1]: https://pvs-studio.com/en/docs/warnings/v1032/

#15Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#13)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qua., 18 de mai. de 2022 às 20:20, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

zeroing
relatively small, known-aligned node structs is THE use case.

Currently, especially on 64-bit Windows, MemSet can break alignment.

regards,
Ranier Vilela

#16Ranier Vilela
ranier.vf@gmail.com
In reply to: David Rowley (#12)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qua., 18 de mai. de 2022 às 19:57, David Rowley <dgrowleyml@gmail.com>
escreveu:

On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier.vf@gmail.com> wrote:

That would initialize the content at compilation and not at runtime,

correct?

Your mental model of compilation and run-time might be flawed here.
Here's no such thing as zeroing memory at compile time. There's only
emitting instructions that perform those tasks at run-time.
https://godbolt.org/ might help your understanding.

There are a lot of cases using MemSet (with struct variables) and at

Windows 64 bits, long are 4 (four) bytes.

So I believe that MemSet is less efficient on Windows than on Linux.
"The size of the '_vstart' buffer is not a multiple of the element size

of the type 'long'."

message from PVS-Studio static analysis tool.

I've been wondering for a while if we really need to have the MemSet()
macro. I see it was added in 8cb415449 (1997). I think compilers have
evolved quite a bit in the past 25 years, so it could be time to
revisit that.

Your comment on the sizeof(long) on win64 is certainly true. I wrote
the attached C program to test the performance difference.

(windows 64-bit)

cl memset.c /Ox
memset 200000000

Running 200000000 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.530000 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.950000 seconds
memset: size 512: 3.191000 seconds

It seems like there's no cases there where MemSet is faster than
memset. I was careful to only provide MemSet() with inputs that
result in it not using the memset fallback. I also provided constants
so that the decision about which method to use was known at compile
time.

It's not clear to me why 512 is faster than 256. I saw the same on a
repeat run.

Changing "long" to "long long" it looks like:

memset 200000000

Running 200000000 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.970000 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds

The situation is a little different on my Linux machine:

$ gcc memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.000002 seconds
MemSet: size 16: 0.000000 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds

It looks like part of the work might be getting optimised away in the
8-16 MemSet() calls.

clang seems to have the opposite for size 8.

$ clang memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.000000 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds

The results from clang, only reinforce the argument in favor of native
memset.
There is still room for gcc to improve with 8/16 bytes and for sure at some
point they will.
Which will make memset faster on all platforms and compilers.

regards,
Ranier Vilela

#17Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#16)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Taking it a step further.
Created a new patch into commitfest, targeting 16 version.
https://commitfest.postgresql.org/38/3645/

Currently native memset is well optimized on several platforms, including
Windows 64 bits [1]https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines/.

However, even the native memset has problems,
I redid the David's memset.c test:

C:\usr\src\tests\memset>memset2 2000000000
Running 2000000000 loops
MemSet: size 8: 6.635000 seconds
MemSet: size 16: 6.594000 seconds
MemSet: size 32: 6.694000 seconds
MemSet: size 64: 9.002000 seconds
MemSet: size 128: 10.598000 seconds
MemSet: size 256: 25.061000 seconds
MemSet: size 512: 27.365000 seconds
memset: size 8: 0.594000 seconds
memset: size 16: 0.595000 seconds
memset: size 32: 1.189000 seconds
memset: size 64: 2.378000 seconds
memset: size 128: 4.753000 seconds
memset: size 256: 24.391000 seconds
memset: size 512: 27.064000 seconds

Both MemSet/memset perform very poorly with 256/512.

But, I believe it is worth removing the use of MemSet, because the usage is
empirical and has been mixed with memset in several places in the code,
without any criteria.
Using just memset makes the mental process of using it more simplified and
it seems like there aren't any regressions when removing the use of MemSet.

Windows 10 64 bit
msvc 2019 64 bit
RAM 8GB
SSD 256GB
Postgres (15beta1 with original configuration)

1. pgbench -c 50 -T 300 -S -n -U postgres
HEAD:
pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 1
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 10448967
number of failed transactions: 0 (0.000%)
latency average = 1.432 ms
initial connection time = 846.186 ms
tps = 34926.861987 (without initial connection time)

PATCHED (without MemSet)
pgbench (15beta1)
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 1
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 10655332
number of failed transactions: 0 (0.000%)
latency average = 1.404 ms
initial connection time = 866.203 ms
tps = 35621.045750 (without initial connection time)

2.
CREATE TABLE t_test (x numeric);
INSERT INTO t_test SELECT random()
FROM generate_series(1, 5000000);
ANALYZE;
SHOW work_mem;

HEAD:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=397084.73..883229.71 rows=4166666 width=11) (actual
time=1328.331..2743.310 rows=5000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=396084.71..401293.04 rows=2083333 width=11) (actual
time=1278.442..1513.510 rows=1666667 loops=3)
Sort Key: x
Sort Method: external merge Disk: 25704kB
Worker 0: Sort Method: external merge Disk: 23960kB
Worker 1: Sort Method: external merge Disk: 23960kB
-> Parallel Seq Scan on t_test (cost=0.00..47861.33 rows=2083333
width=11) (actual time=0.234..128.607 rows=1666667 loops=3)
Planning Time: 0.064 ms
Execution Time: 2863.381 ms
(11 rows)

PATCHED:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=397084.73..883229.71 rows=4166666 width=11) (actual
time=1309.703..2705.027 rows=5000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=396084.71..401293.04 rows=2083333 width=11) (actual
time=1281.111..1515.928 rows=1666667 loops=3)
Sort Key: x
Sort Method: external merge Disk: 24880kB
Worker 0: Sort Method: external merge Disk: 24776kB
Worker 1: Sort Method: external merge Disk: 23960kB
-> Parallel Seq Scan on t_test (cost=0.00..47861.33 rows=2083333
width=11) (actual time=0.260..130.277 rows=1666667 loops=3)
Planning Time: 0.060 ms
Execution Time: 2825.201 ms
(11 rows)

I leave MemSetAligned and MemSetLoop to another step.

regards,
Ranier Vilela

[1]: https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines/
https://msrc-blog.microsoft.com/2021/01/11/building-faster-amd64-memset-routines/

Attachments:

001_avoid_unecessary_memset_call.patchapplication/octet-stream; name=001_avoid_unecessary_memset_call.patchDownload+1-1
002_refactoring_memset_api_usage.patchapplication/octet-stream; name=002_refactoring_memset_api_usage.patchDownload+409-409
#18David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#13)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On Thu, 19 May 2022 at 11:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The thing that makes this a bit more difficult than it might be is
the special cases we have for known-aligned and so on targets, which
are particularly critical for palloc0 and makeNode etc. So there's
more than one case to look into. But I'd argue that those special
cases are actually what we want to worry about the most: zeroing
relatively small, known-aligned node structs is THE use case.

I think the makeNode() trickery would be harder to get rid of, or for
that matter, anything where the size/alignment is unknown at compile
time. I think the more interesting ones that we might be able to get
rid of are the ones where the alignment and size *are* known at
compile time. Also probably anything that passes a compile-time const
that's not 0 will fallback on memset anyway, so might as well be
removed to tidy things up.

It just all seems a bit untidy when you look at functions like
ExecStoreAllNullTuple() which use a mix of memset and MemSet without
any apparent explanation of why. That particular one is likely that
way due to the first size guaranteed to be multiples of sizeof(Datum)
and the latter not.

Naturally, we'd need to run enough benchmarks to prove to ourselves
that we're not causing any slowdowns. The intention of memset.c was
to try to put something out there that people could test so we could
get an idea if there are any machines/compilers that we might need to
be concerned about.

David

#19Peter Eisentraut
peter_e@gmx.net
In reply to: Ranier Vilela (#17)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

On 19.05.22 18:09, Ranier Vilela wrote:

Taking it a step further.
Created a new patch into commitfest, targeting 16 version.
https://commitfest.postgresql.org/38/3645/
<https://commitfest.postgresql.org/38/3645/&gt;

I have committed your 001 patch, which was clearly a (harmless) mistake.

I have also committed a patch that gets rid of MemSet() calls where the
value is a constant not-0, because that just falls back to memset() anyway.

I'm on board with trying to get rid of MemSet(), but first I need to
analyze all the performance numbers and arguments that were shown in
this thread.

#20Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#19)
Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

Em qui., 30 de jun. de 2022 às 19:37, Peter Eisentraut <
peter.eisentraut@enterprisedb.com> escreveu:

On 19.05.22 18:09, Ranier Vilela wrote:

Taking it a step further.
Created a new patch into commitfest, targeting 16 version.
https://commitfest.postgresql.org/38/3645/
<https://commitfest.postgresql.org/38/3645/&gt;

I have committed your 001 patch, which was clearly a (harmless) mistake.

Thank you.

I have also committed a patch that gets rid of MemSet() calls where the
value is a constant not-0, because that just falls back to memset() anyway.

I'm on board with trying to get rid of MemSet(), but first I need to
analyze all the performance numbers and arguments that were shown in
this thread.

One good argument is that using memset, allows to compiler
analyze and remove completely memset call if he understands
that can do it, which with MemSet is not possible.

regards,
Ranier Vilela

#21Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#19)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#10)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#22)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Ranier Vilela (#21)
#25Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#22)
#26Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#22)
#27Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#26)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#23)
#29Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#27)
#30Peter Eisentraut
peter_e@gmx.net
In reply to: Ranier Vilela (#29)
#31Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#30)
#32Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#31)
#33Peter Smith
smithpb2250@gmail.com
In reply to: Ranier Vilela (#32)
#34Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Smith (#33)
#35mahendrakar s
mahendrakarforpg@gmail.com
In reply to: Ranier Vilela (#34)
#36Ranier Vilela
ranier.vf@gmail.com
In reply to: mahendrakar s (#35)
#37Peter Eisentraut
peter_e@gmx.net
In reply to: Ranier Vilela (#32)
#38Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Eisentraut (#37)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ranier Vilela (#38)
#40Ranier Vilela
ranier.vf@gmail.com
In reply to: Alvaro Herrera (#39)
#41Julien Rouhaud
rjuju123@gmail.com
In reply to: Ranier Vilela (#40)
#42Ranier Vilela
ranier.vf@gmail.com
In reply to: Julien Rouhaud (#41)
#43David Zhang
david.zhang@highgo.ca
In reply to: Ranier Vilela (#32)
#44Ranier Vilela
ranier.vf@gmail.com
In reply to: David Zhang (#43)
#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Zhang (#43)
#46Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#45)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#46)
#48Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#47)
#49Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#47)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Ranier Vilela (#49)
#51Ranier Vilela
ranier.vf@gmail.com
In reply to: Robert Haas (#50)
#52Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#46)
#53Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#52)