pgsql: Generational memory allocator
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).
Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.
Author: Tomas Vondra
Reviewed-by: Simon Riggs
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/a4ccc1cef5a04cc054af83bc4582a045d5232cb3
Modified Files
--------------
src/backend/replication/logical/reorderbuffer.c | 80 +--
src/backend/utils/mmgr/Makefile | 2 +-
src/backend/utils/mmgr/README | 23 +
src/backend/utils/mmgr/generation.c | 768 ++++++++++++++++++++++++
src/include/nodes/memnodes.h | 4 +-
src/include/nodes/nodes.h | 1 +
src/include/replication/reorderbuffer.h | 15 +-
src/include/utils/memutils.h | 5 +
8 files changed, 819 insertions(+), 79 deletions(-)
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.
Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
Greetings,
Andres Freund
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
It doesn't report anything useful that would allow me to fix this.
Please enable some info reporting on the animal.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/23/2017 10:57 AM, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01It doesn't report anything useful that would allow me to fix this.
Please enable some info reporting on the animal.
I agree it would be useful to get the valgrind log from the animal, but
I guess you'd need to get valgrind working to fix the issue anyway.
Anyway, the attached patch fixes the issues for me - strictly speaking
the Assert is not needed, but it doesn't hurt.
regards
Tomas
Attachments:
genalloc-valgrind-fix.patchtext/x-patch; name=genalloc-valgrind-fix.patchDownload+5-0
On 24 November 2017 at 02:16, Tomas Vondra <tv@fuzzy.cz> wrote:
On 11/23/2017 10:57 AM, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
It doesn't report anything useful that would allow me to fix this.
Please enable some info reporting on the animal.
I agree it would be useful to get the valgrind log from the animal, but I
guess you'd need to get valgrind working to fix the issue anyway.
If something fails compilation it tells us why, not just "compilation failed".
The whole point of the buildfarm is to report errors to allow them to
be fixed, not just a fence that blocks things.
Anyway, the attached patch fixes the issues for me - strictly speaking the
Assert is not needed, but it doesn't hurt.
Thanks for the patch.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01It doesn't report anything useful that would allow me to fix this.
Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.
Please enable some info reporting on the animal.
Hey, I just pointed out for you that there's a problem with something
that you committed. ISTM you're inverting the responsibilities more than
a bit here.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01
It doesn't report anything useful that would allow me to fix this.
Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.
I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.
In any case, valgrind failures are generally easy enough to reproduce
locally.
Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17
we have
ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1
Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).
regards, tom lane
On 11/23/2017 09:06 PM, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01It doesn't report anything useful that would allow me to fix this.
Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.
It may not be immediately obvious that the failure is due to valgrind,
but otherwise I agree it's up to us to investigate.
I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.
Are the valgrind errors really written to postmaster log? I'm assuming
it failed because valgrind ran into an issue and killed the process.
In any case, valgrind failures are generally easy enough to reproduce
locally.
Right.
Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17we have
ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).
Hmmm, I see. Presumably adding this to GenerationChunk (similarly to
what we do in AllocChunkData) would address the issue:
#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endif
but I have no way to verify that (no access to such machine). I wonder
why SlabChunk doesn't need to do that (perhaps a comment explaining that
would be appropriate?).
regards
Tomas
On 24 November 2017 at 07:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Meanwhile, over on
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2017-11-23%2013%3A56%3A17we have
ccache gcc-4.7 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o generation.o generation.c
generation.c: In function ‘GenerationContextCreate’:
generation.c:206:7: error: static assertion failed: "padding calculation in GenerationChunk is wrong"
make[4]: *** [generation.o] Error 1Looks to me like GenerationChunk imagines that 3*sizeof(pointer)
must be a maxaligned quantity, which is wrong on platforms where
MAXALIGN is bigger than sizeof(pointer).
Thanks, yes.
I can't see how to resolve that without breaking the design assumption
at line 122, "there must not be any padding to reach a MAXALIGN
boundary here!".
So I'll wait for Tomas to comment.
(I'm just about to catch a long flight, so will be offline for 24
hours, so we may need to revert this before I get back if it is
difficult to resolve.)
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.Are the valgrind errors really written to postmaster log? I'm assuming it
failed because valgrind ran into an issue and killed the process.
Yes. Search e.g. in
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-18%2004%3A10%3A01
for VALGRINDERROR-BEGIN.
(You could argue that they're only written there in certain
configurations, because I'd assume it'd not work in e.g. syslog
configured systems, because valgrind just prints to stderr).
Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endifbut I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).
Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?
Greetings,
Andres Freund
On 24 November 2017 at 06:39, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-23 20:57:10 +1100, Simon Riggs wrote:
On 23 November 2017 at 11:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-11-22 18:48:19 +0000, Simon Riggs wrote:
Generational memory allocator
Add new style of memory allocator, known as Generational
appropriate for use in cases where memory is allocated
and then freed in roughly oldest first order (FIFO).Use new allocator for logical decoding’s reorderbuffer
to significantly reduce memory usage and improve performance.Looks like it's not quite valgrind clean:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-11-22%2022%3A30%3A01It doesn't report anything useful that would allow me to fix this.
Uh. It reports a failure, and you can go from there. The error output
actually is in postmaster.log but for some reason the buildfarm code
didn't display that in this case.
Please enable some info reporting on the animal.
Hey, I just pointed out for you that there's a problem with something
that you committed. ISTM you're inverting the responsibilities more than
a bit here.
I don't believe I was inverting responsibilties, but sorry if you thought I was.
Yes, there is a bug in something I committed which I know about
because of the buildfarm. I am happy to work on fixing it. No
discussion on responsibility there.
I can see that buildfarm animal is yours, so I asked you for more info
about the bug on that animal. ISTM reasonable to ask for some more
detail on a bug report, and where that is an automated service to
request that the owner of that service make changes to assist. I don't
see why that implies some change of my responsibilities.
Thanks for your help
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund <andres@anarazel.de> writes:
On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endifbut I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).
Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?
I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)
regards, tom lane
On 11/23/2017 11:04 PM, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endifbut I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)
Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).
regards
Tomas
Tomas Vondra <tv@fuzzy.cz> writes:
On 11/23/2017 11:04 PM, Tom Lane wrote:
I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)
Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).
Hm ... the coding in AllocSet is ancient and I have to say that I don't
like it as well as what I put into generation.c. We could not have done
it in that way at the time, because (IIRC) we did not have constant macros
for SIZEOF_SIZE_T, and maybe not SIZEOF_VOID_P either --- and you need
constant macros in order to do the #if calculation, because sizeof() does
not work in preprocessor expressions. But we have 'em now, so I'm tempted
to change aset.c to match generation.c.
As for SlabChunk, I think that's fine as-is (except I wonder why the
"block" field is declared "void *" rather than "SlabBlock *"). Since
the contents are fixed at two pointer fields, it's impossible that
there's any unexpected padding in there --- the struct size is
basically guaranteed to be 2 * sizeof(pointer). Which makes it either 8
or 16 bytes, either one of which is certain to be a multiple of MAXALIGN.
So I think the StaticAssert that that's true is plenty sufficient in
that case.
regards, tom lane
On 24 November 2017 at 09:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-11-23 22:34:57 +0100, Tomas Vondra wrote:
Hmmm, I see. Presumably adding this to GenerationChunk (similarly to what we
do in AllocChunkData) would address the issue:#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
Size padding;
#endifbut I have no way to verify that (no access to such machine). I wonder why
SlabChunk doesn't need to do that (perhaps a comment explaining that would
be appropriate?).Can't you just compile pg on a 32bit system and manually define MAXALIGN
to 8 bytes?I pushed a patch that computes how much padding to add and adds it.
(It might not really work if size_t and void * are different sizes,
because then there could be additional padding in the struct; but
that seems very unlikely.)
Oh, OK, thanks.
It sunk in what was needed while flying, but that's better than my efforts.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/23/2017 03:06 PM, Tom Lane wrote:
I think it's a legitimate complaint that postmaster.log wasn't captured
in this failure, but that's a buildfarm script oversight and hardly
Andres' fault.
A fix for this will be in the next buildfarm release.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tv@fuzzy.cz> writes:
I agree it would be useful to get the valgrind log from the animal, but
I guess you'd need to get valgrind working to fix the issue anyway.
Anyway, the attached patch fixes the issues for me - strictly speaking
the Assert is not needed, but it doesn't hurt.
For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run:
==00:00:00:08.768 15475== Invalid read of size 8
==00:00:00:08.768 15475== at 0x710BD0: SnapBuildProcessNewCid (snapbuild.c:780)
==00:00:00:08.768 15475== by 0x70291E: LogicalDecodingProcessRecord (decode.c:371)
==00:00:00:08.768 15475== by 0x705A04: pg_logical_slot_get_changes_guts (logicalfuncs.c:308)
==00:00:00:08.768 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.768 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.768 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.768 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.768 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.768 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.768 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.768 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.768 15475== by 0x6EFDC9: PostmasterMain (postmaster.c:4412)
==00:00:00:08.768 15475== Address 0xe7e846c is 28 bytes inside a block of size 34 client-defined
==00:00:00:08.768 15475== at 0x89A17F: palloc (mcxt.c:871)
==00:00:00:08.768 15475== by 0x51FF07: DecodeXLogRecord (xlogreader.c:1283)
==00:00:00:08.768 15475== by 0x520CD3: XLogReadRecord (xlogreader.c:475)
==00:00:00:08.768 15475== by 0x7059E4: pg_logical_slot_get_changes_guts (logicalfuncs.c:293)
==00:00:00:08.769 15475== by 0x6279F5: ExecMakeTableFunctionResult (execSRF.c:231)
==00:00:00:08.769 15475== by 0x634C41: FunctionNext (nodeFunctionscan.c:95)
==00:00:00:08.769 15475== by 0x626976: ExecScan (execScan.c:97)
==00:00:00:08.769 15475== by 0x622AF6: standard_ExecutorRun (executor.h:241)
==00:00:00:08.769 15475== by 0x76B07A: PortalRunSelect (pquery.c:932)
==00:00:00:08.769 15475== by 0x76C3E0: PortalRun (pquery.c:773)
==00:00:00:08.769 15475== by 0x7687EC: exec_simple_query (postgres.c:1120)
==00:00:00:08.769 15475== by 0x76A5C7: PostgresMain (postgres.c:4139)
==00:00:00:08.769 15475==
Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?
I'm tempted to push Tomas' patch as-is, just to see if skink sees the
same failure I do.
BTW, I'm pretty certain that there are additional bugs in generation.c's
valgrind support, in code paths that the regression tests likely do not
reach. I do not think it's setting up valgrind correctly for "large"
chunks, and it looks to me like GenerationCheck would fail because it
tries to access chunk->requested_size which is generally kept NOACCESS.
I'm tempted to rip out all of the logic concerned with flipping
chunk->requested_size between normal and NOACCESS states, as that seems to
me like an exercise much more likely to introduce bugs than to catch any.
regards, tom lane
I wrote:
Tomas Vondra <tv@fuzzy.cz> writes:
Thanks. Do we need to do something similar to the other memory contexts?
I see Slab does not do this at all (assuming it's not necessary), and
AllocSet does this in a different way (which seems a bit strange).
Hm ... the coding in AllocSet is ancient and I have to say that I don't
like it as well as what I put into generation.c.
I take that back: the current coding of padding in AllocChunkData only
dates back to 7e3aa03b. But I still don't like it, and will migrate
generation.c's version into aset.c. And maybe improve the comments.
BTW, it appears that some of the confusion in generation.c can be
accounted for by not having entirely followed the changes in 7e3aa03b.
regards, tom lane
I wrote:
For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run: ...
Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?
I've pushed fixes for everything that I could find wrong in generation.c
(and there was a lot :-(). But I'm still seeing the "invalid read in
SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
Somebody who has more familiarity with the logical decoding stuff than
I do needs to look into that.
I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
triggering the failure, with the attached patch. Weirdly, *it does not
fail* with this. I have no explanation for that.
regards, tom lane
Attachments:
hack-to-split-up-fetches.patchtext/x-diff; charset=us-ascii; name=hack-to-split-up-fetches.patchDownload+21-15
Hi,
On 11/25/2017 02:25 AM, Tom Lane wrote:
I wrote:
For me, this patch fixes the valgrind failures inside generation.c
itself, but I still see one more in the test_decoding run: ...
Not sure what to make of this: the stack traces make it look unrelated
to the GenerationContext changes, but if it's not related, how come
skink was passing before that patch went in?I've pushed fixes for everything that I could find wrong in generation.c
(and there was a lot :-(). But I'm still seeing the "invalid read in
SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
Somebody who has more familiarity with the logical decoding stuff than
I do needs to look into that.I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
triggering the failure, with the attached patch. Weirdly, *it does not
fail* with this. I have no explanation for that.
I have no explanation for that either. FWIW I don't think this is
related to the new memory contexts. I can reproduce it on 3bae43c (i.e.
before the Generation memory context was introduced), and with Slab
removed from ReorderBuffer.
I wonder if this might be a valgrind issue. I'm not sure which version
skink is using, but I'm running with valgrind-3.12.0-9.el7_4.x86_64.
BTW I also see these failures in hstore:
==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
==15168== at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==15168== by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
==15168== by 0x15419EE4: hstore_in (hstore_io.c:416)
==15168== by 0x9ED11A: InputFunctionCall (fmgr.c:1635)
==15168== by 0x9ED3C2: OidInputFunctionCall (fmgr.c:1738)
==15168== by 0x6014A2: stringTypeDatum (parse_type.c:641)
==15168== by 0x5E1ADC: coerce_type (parse_coerce.c:304)
==15168== by 0x5E17A9: coerce_to_target_type (parse_coerce.c:103)
==15168== by 0x5EDD6D: transformTypeCast (parse_expr.c:2724)
==15168== by 0x5E8860: transformExprRecurse (parse_expr.c:203)
==15168== by 0x5E8601: transformExpr (parse_expr.c:156)
==15168== by 0x5FCF95: transformTargetEntry (parse_target.c:103)
==15168== by 0x5FD15D: transformTargetList (parse_target.c:191)
==15168== by 0x5A5EEC: transformSelectStmt (analyze.c:1214)
==15168== by 0x5A4453: transformStmt (analyze.c:297)
==15168== by 0x5A4381: transformOptionalSelectInto (analyze.c:242)
==15168== by 0x5A423F: transformTopLevelStmt (analyze.c:192)
==15168== by 0x5A4097: parse_analyze (analyze.c:112)
==15168== by 0x87E0AF: pg_analyze_and_rewrite (postgres.c:664)
==15168== by 0x87E6EE: exec_simple_query (postgres.c:1045)
Seems hstoreUniquePairs may call memcpy with the same pointers in some
cases (which looks a bit dubious). But the code is ancient, so it's
strange it didn't fail before.
regards
Tomas