pgsql: Generational memory allocator

Started by Simon Riggsover 8 years ago37 messagescomitters
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Author: Tomas Vondra
Reviewed-by: Simon Riggs

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a4ccc1cef5a04cc054af83bc4582a045d5232cb3

Modified Files
--------------
src/backend/replication/logical/reorderbuffer.c | 80 +--
src/backend/utils/mmgr/Makefile | 2 +-
src/backend/utils/mmgr/README | 23 +
src/backend/utils/mmgr/generation.c | 768 ++++++++++++++++++++++++
src/include/nodes/memnodes.h | 4 +-
src/include/nodes/nodes.h | 1 +
src/include/replication/reorderbuffer.h | 15 +-
src/include/utils/memutils.h | 5 +
8 files changed, 819 insertions(+), 79 deletions(-)

#2Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#1)
Re: pgsql: Generational memory allocator

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01

Greetings,

Andres Freund

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#2)
Re: pgsql: Generational memory allocator

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Please enable some info reporting on the animal.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Simon Riggs (#3)
Re: pgsql: Generational memory allocator

On 11/23/2017 10:57 AM, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Please enable some info reporting on the animal.

I agree it would be useful to get the valgrind log from the animal, but
I guess you'd need to get valgrind working to fix the issue anyway.

Anyway, the attached patch fixes the issues for me - strictly speaking
the Assert is not needed, but it doesn't hurt.

regards
Tomas

Attachments:

genalloc-valgrind-fix.patchtext/x-patch; name=genalloc-valgrind-fix.patchDownload+5-0
#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Tomas Vondra (#4)
Re: pgsql: Generational memory allocator

On 24 November 2017 at 02:16, Tomas Vondra <tv@fuzzy.cz> wrote:

On 11/23/2017 10:57 AM, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Please enable some info reporting on the animal.

I agree it would be useful to get the valgrind log from the animal, but I
guess you'd need to get valgrind working to fix the issue anyway.

If something fails compilation it tells us why, not just "compilation failed".

The whole point of the buildfarm is to report errors to allow them to
be fixed, not just a fence that blocks things.

Anyway, the attached patch fixes the issues for me - strictly speaking the
Assert is not needed, but it doesn't hurt.

Thanks for the patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#3)
Re: pgsql: Generational memory allocator

On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.

Please enable some info reporting on the animal.

Hey, I just pointed out for you that there's a problem with something
that you committed. ISTM you're inverting the responsibilities more than
a bit here.

Greetings,

Andres Freund

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: pgsql: Generational memory allocator

Andres Freund <andres@anarazel.de> writes:

On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.

I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.

In any case, valgrind failures are generally easy enough to reproduce
locally.

Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&amp;dt=2017-11-23%2013%3A56%3A17

we have

ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1

Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).

regards, tom lane

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: pgsql: Generational memory allocator

On 11/23/2017 09:06 PM, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.

It may not be immediately obvious that the failure is due to valgrind,
but otherwise I agree it's up to us to investigate.

I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.

Are the valgrind errors really written to postmaster log? I'm assuming
it failed because valgrind ran into an issue and killed the process.

In any case, valgrind failures are generally easy enough to reproduce
locally.

Right.

Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&amp;dt=2017-11-23%2013%3A56%3A17

we have

ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1

Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to
what we do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif

but I have no way to verify that (no access to such machine). I wonder
why SlabChunk doesn't need to do that (perhaps a comment explaining that
would be appropriate?).

regards
Tomas

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#7)
Re: pgsql: Generational memory allocator

On 24 November 2017 at 07:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&amp;dt=2017-11-23%2013%3A56%3A17

we have

ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1

Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).

Thanks, yes.

I can't see how to resolve that without breaking the design assumption
at line 122, "there must not be any padding to reach a MAXALIGN
boundary here!".

So I'll wait for Tomas to comment.

(I'm just about to catch a long flight, so will be offline for 24
hours, so we may need to revert this before I get back if it is
difficult to resolve.)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#8)
Re: pgsql: Generational memory allocator

On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:

I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.

Are the valgrind errors really written to postmaster log? I'm assuming it
failed because valgrind ran into an issue and killed the process.

Yes. Search e.g. in
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-09-18%2004%3A10%3A01
for VALGRINDERROR-BEGIN.

(You could argue that they're only written there in certain
configurations, because I'd assume it'd not work in e.g. syslog
configured systems, because valgrind just prints to stderr).

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif

but I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).

Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?

Greetings,

Andres Freund

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#6)
Re: pgsql: Generational memory allocator

On 24 November 2017 at 06:39, Andres Freund <andres@anarazel.de> wrote:

On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:

On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:

Generational memory allocator

Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).

Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.

Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-11-22%2022%3A30%3A01

It doesn't report anything useful that would allow me to fix this.

Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.

Please enable some info reporting on the animal.

Hey, I just pointed out for you that there's a problem with something
that you committed. ISTM you're inverting the responsibilities more than
a bit here.

I don't believe I was inverting responsibilties, but sorry if you thought I was.

Yes, there is a bug in something I committed which I know about
because of the buildfarm. I am happy to work on fixing it. No
discussion on responsibility there.

I can see that buildfarm animal is yours, so I asked you for more info
about the bug on that animal. ISTM reasonable to ask for some more
detail on a bug report, and where that is an automated service to
request that the owner of that service make changes to assist. I don't
see why that implies some change of my responsibilities.

Thanks for your help

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: pgsql: Generational memory allocator

Andres Freund <andres@anarazel.de> writes:

On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif

but I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).

Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?

I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)

regards, tom lane

#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#12)
Re: pgsql: Generational memory allocator

On 11/23/2017 11:04 PM, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif

but I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).

Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?

I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)

Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).

regards
Tomas

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#13)
Re: pgsql: Generational memory allocator

Tomas Vondra <tv@fuzzy.cz> writes:

On 11/23/2017 11:04 PM, Tom Lane wrote:

I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)

Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).

Hm ... the coding in AllocSet is ancient and I have to say that I don't
like it as well as what I put into generation.c. We could not have done
it in that way at the time, because (IIRC) we did not have constant macros
for SIZEOF_SIZE_T, and maybe not SIZEOF_VOID_P either --- and you need
constant macros in order to do the #if calculation, because sizeof() does
not work in preprocessor expressions. But we have 'em now, so I'm tempted
to change aset.c to match generation.c.

As for SlabChunk, I think that's fine as-is (except I wonder why the
"block" field is declared "void *" rather than "SlabBlock *"). Since
the contents are fixed at two pointer fields, it's impossible that
there's any unexpected padding in there --- the struct size is
basically guaranteed to be 2 * sizeof(pointer). Which makes it either 8
or 16 bytes, either one of which is certain to be a multiple of MAXALIGN.
So I think the StaticAssert that that's true is plenty sufficient in
that case.

regards, tom lane

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#12)
Re: pgsql: Generational memory allocator

On 24 November 2017 at 09:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:

Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:

#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif

but I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).

Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?

I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)

Oh, OK, thanks.

It sunk in what was needed while flying, but that's better than my efforts.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
Re: pgsql: Generational memory allocator

On 11/23/2017 03:06 PM, Tom Lane wrote:

I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.

A fix for this will be in the next buildfarm release.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#4)
Re: pgsql: Generational memory allocator

Tomas Vondra <tv@fuzzy.cz> writes:

I agree it would be useful to get the valgrind log from the animal, but
I guess you'd need to get valgrind working to fix the issue anyway.
Anyway, the attached patch fixes the issues for me - strictly speaking
the Assert is not needed, but it doesn't hurt.

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:

==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475== at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475== by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475== by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475== by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475== Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475== at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475== by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475== by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475== by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==

Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I'm tempted to push Tomas' patch as-is, just to see if skink sees the
same failure I do.

BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach. I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems to
me like an exercise much more likely to introduce bugs than to catch any.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: pgsql: Generational memory allocator

I wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).

Hm ... the coding in AllocSet is ancient and I have to say that I don't
like it as well as what I put into generation.c.

I take that back: the current coding of padding in AllocChunkData only
dates back to 7e3aa03b. But I still don't like it, and will migrate
generation.c's version into aset.c. And maybe improve the comments.

BTW, it appears that some of the confusion in generation.c can be
accounted for by not having entirely followed the changes in 7e3aa03b.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: pgsql: Generational memory allocator

I wrote:

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run: ...
Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I've pushed fixes for everything that I could find wrong in generation.c
(and there was a lot :-(). But I'm still seeing the "invalid read in
SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
Somebody who has more familiarity with the logical decoding stuff than
I do needs to look into that.

I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
triggering the failure, with the attached patch. Weirdly, *it does not
fail* with this. I have no explanation for that.

regards, tom lane

Attachments:

hack-to-split-up-fetches.patchtext/x-diff; charset=us-ascii; name=hack-to-split-up-fetches.patchDownload+21-15
#20Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: pgsql: Generational memory allocator

Hi,

On 11/25/2017 02:25 AM, Tom Lane wrote:

I wrote:

For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run: ...
Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?

I've pushed fixes for everything that I could find wrong in generation.c
(and there was a lot :-(). But I'm still seeing the "invalid read in
SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
Somebody who has more familiarity with the logical decoding stuff than
I do needs to look into that.

I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
triggering the failure, with the attached patch. Weirdly, *it does not
fail* with this. I have no explanation for that.

I have no explanation for that either. FWIW I don't think this is
related to the new memory contexts. I can reproduce it on 3bae43c (i.e.
before the Generation memory context was introduced), and with Slab
removed from ReorderBuffer.

I wonder if this might be a valgrind issue. I'm not sure which version
skink is using, but I'm running with valgrind-3.12.0-9.el7_4.x86_64.

BTW I also see these failures in hstore:

==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
==15168== at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==15168== by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
==15168== by 0x15419EE4: hstore_in (hstore_io.c:416)
==15168== by 0x9ED11A: InputFunctionCall (fmgr.c:1635)
==15168== by 0x9ED3C2: OidInputFunctionCall (fmgr.c:1738)
==15168== by 0x6014A2: stringTypeDatum (parse_type.c:641)
==15168== by 0x5E1ADC: coerce_type (parse_coerce.c:304)
==15168== by 0x5E17A9: coerce_to_target_type (parse_coerce.c:103)
==15168== by 0x5EDD6D: transformTypeCast (parse_expr.c:2724)
==15168== by 0x5E8860: transformExprRecurse (parse_expr.c:203)
==15168== by 0x5E8601: transformExpr (parse_expr.c:156)
==15168== by 0x5FCF95: transformTargetEntry (parse_target.c:103)
==15168== by 0x5FD15D: transformTargetList (parse_target.c:191)
==15168== by 0x5A5EEC: transformSelectStmt (analyze.c:1214)
==15168== by 0x5A4453: transformStmt (analyze.c:297)
==15168== by 0x5A4381: transformOptionalSelectInto (analyze.c:242)
==15168== by 0x5A423F: transformTopLevelStmt (analyze.c:192)
==15168== by 0x5A4097: parse_analyze (analyze.c:112)
==15168== by 0x87E0AF: pg_analyze_and_rewrite (postgres.c:664)
==15168== by 0x87E6EE: exec_simple_query (postgres.c:1045)

Seems hstoreUniquePairs may call memcpy with the same pointers in some
cases (which looks a bit dubious). But the code is ancient, so it's
strange it didn't fail before.

regards
Tomas

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
#24Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#23)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#25)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#26)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Andrew Dunstan (#16)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#19)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#27)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#31)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#33)
#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#35)
#37Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#36)