valgrind error in subscription code

Started by Andres Freundover 8 years ago5 messages
#1Andres Freund
andres@anarazel.de

Hi,

I enabled skink / the valgrind animal to run the tap tests too (hugely
increasing the test time :(), and that paid of:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13

==606== VALGRINDERROR-BEGIN
==606== Conditional jump or move depends on uninitialised value(s)
==606== at 0x46A207: logicalrep_rel_open (relation.c:361)
==606== by 0x472D12: copy_table (tablesync.c:669)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015)
==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==606== by 0x44F868: ServerLoop (postmaster.c:1693)
==606== by 0x450C53: PostmasterMain (postmaster.c:1337)
==606== Uninitialised value was created by a heap allocation
==606== at 0x605382: MemoryContextAlloc (mcxt.c:729)
==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
==606== by 0x5E4EEE: element_alloc (dynahash.c:1637)
==606== by 0x5E5018: get_hash_entry (dynahash.c:1248)
==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
==606== by 0x5E5A0D: hash_search (dynahash.c:890)
==606== by 0x469D38: logicalrep_relmap_update (relation.c:179)
==606== by 0x472D05: copy_table (tablesync.c:666)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606==
==606== VALGRINDERROR-END

there's also another bleat:
==2161== VALGRINDERROR-BEGIN
==2161== Syscall param write(buf) points to uninitialised byte(s)
==2161== at 0x4E42190: __write_nocancel (syscall-template.S:84)
==2161== by 0x4676C7: CheckPointReplicationOrigin (origin.c:574)
==2161== by 0x25DB99: CheckPointGuts (xlog.c:9007)
==2161== by 0x2647CD: CreateCheckPoint (xlog.c:8792)
==2161== by 0x26734F: StartupXLOG (xlog.c:7576)
==2161== by 0x4510FA: StartupProcessMain (startup.c:217)
==2161== by 0x273832: AuxiliaryProcessMain (bootstrap.c:426)
==2161== by 0x44D8A0: StartChildProcess (postmaster.c:5256)
==2161== by 0x44DF33: PostmasterStateMachine (postmaster.c:3791)
==2161== by 0x44EE25: reaper (postmaster.c:3038)
==2161== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==2161== by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==2161== Location 0xffeffda42 is 2 bytes inside local var "disk_state"
==2161== declared at origin.c:555, in frame #1 of thread 1
==2161== Uninitialised value was created by a stack allocation
==2161== at 0x467421: CheckPointReplicationOrigin (origin.c:505)
==2161==
==2161== VALGRINDERROR-END
==2161== VALGRINDERROR-BEGIN
==2161== Syscall param write(buf) points to uninitialised byte(s)
==2161== at 0x4E42190: __write_nocancel (syscall-template.S:84)
==2161== by 0x467712: CheckPointReplicationOrigin (origin.c:591)
==2161== by 0x25DB99: CheckPointGuts (xlog.c:9007)
==2161== by 0x2647CD: CreateCheckPoint (xlog.c:8792)
==2161== by 0x26734F: StartupXLOG (xlog.c:7576)
==2161== by 0x4510FA: StartupProcessMain (startup.c:217)
==2161== by 0x273832: AuxiliaryProcessMain (bootstrap.c:426)
==2161== by 0x44D8A0: StartChildProcess (postmaster.c:5256)
==2161== by 0x44DF33: PostmasterStateMachine (postmaster.c:3791)
==2161== by 0x44EE25: reaper (postmaster.c:3038)
==2161== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==2161== by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==2161== Location 0xffeffda58 is 0 bytes inside local var "crc"
==2161== declared at origin.c:511, in frame #1 of thread 1
==2161== Uninitialised value was created by a stack allocation
==2161== at 0x467421: CheckPointReplicationOrigin (origin.c:505)
==2161==
==2161== VALGRINDERROR-END

but that looks more like something that needs to be silenced, based on
the 3s look I gave it.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#1)
1 attachment(s)
Re: valgrind error in subscription code

On 22/04/17 20:31, Andres Freund wrote:

Hi,

I enabled skink / the valgrind animal to run the tap tests too (hugely
increasing the test time :(), and that paid of:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13

==606== VALGRINDERROR-BEGIN
==606== Conditional jump or move depends on uninitialised value(s)
==606== at 0x46A207: logicalrep_rel_open (relation.c:361)
==606== by 0x472D12: copy_table (tablesync.c:669)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015)
==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==606== by 0x44F868: ServerLoop (postmaster.c:1693)
==606== by 0x450C53: PostmasterMain (postmaster.c:1337)
==606== Uninitialised value was created by a heap allocation
==606== at 0x605382: MemoryContextAlloc (mcxt.c:729)
==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
==606== by 0x5E4EEE: element_alloc (dynahash.c:1637)
==606== by 0x5E5018: get_hash_entry (dynahash.c:1248)
==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
==606== by 0x5E5A0D: hash_search (dynahash.c:890)
==606== by 0x469D38: logicalrep_relmap_update (relation.c:179)
==606== by 0x472D05: copy_table (tablesync.c:666)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606==
==606== VALGRINDERROR-END

Thanks, here is patch to fix that - I also removed the individual
settings of everything to NULL/0/InvalidOid etc and just replaced it all
with memset.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

Properly-initialize-memory-for-logical-replication-r.patchtext/plain; charset=UTF-8; name=Properly-initialize-memory-for-logical-replication-r.patchDownload
From 4993487f5e8750b708e76181bb78eddea933d897 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 22 Apr 2017 20:43:16 +0200
Subject: [PATCH] Properly initialize memory for logical replication relation
 cache

We missed setting some of the properties of the cache entry sometimes,
use memset to ensure entry is always fully cleaned up.
---
 src/backend/replication/logical/relation.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 875a081..5bc54dd 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 		pfree(remoterel->attnames);
 		pfree(remoterel->atttyps);
 	}
-	remoterel->attnames = NULL;
-	remoterel->atttyps = NULL;
-
 	bms_free(remoterel->attkeys);
-	remoterel->attkeys = NULL;
 
 	if (entry->attrmap)
 		pfree(entry->attrmap);
-
-	entry->attrmap = NULL;
-	remoterel->natts = 0;
-	entry->localreloid = InvalidOid;
-	entry->localrel = NULL;
 }
 
 /*
@@ -182,6 +173,8 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 	if (found)
 		logicalrep_relmap_free_entry(entry);
 
+	memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+
 	/* Make cached copy of the data */
 	oldctx = MemoryContextSwitchTo(LogicalRepRelMapContext);
 	entry->remoterel.remoteid = remoterel->remoteid;
@@ -197,8 +190,6 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 	}
 	entry->remoterel.replident = remoterel->replident;
 	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
-	entry->attrmap = NULL;
-	entry->localreloid = InvalidOid;
 	MemoryContextSwitchTo(oldctx);
 }
 
-- 
2.7.4

#3Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#2)
Re: valgrind error in subscription code

Hi,

On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:

Thanks, here is patch to fix that - I also removed the individual
settings of everything to NULL/0/InvalidOid etc and just replaced it all
with memset.

Cool.

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 875a081..5bc54dd 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
pfree(remoterel->attnames);
pfree(remoterel->atttyps);
}
-	remoterel->attnames = NULL;
-	remoterel->atttyps = NULL;
-
bms_free(remoterel->attkeys);
-	remoterel->attkeys = NULL;

if (entry->attrmap)
pfree(entry->attrmap);
-

Btw, I think it's a good pattern to zero things like attrmap after
freeing. Based on a minute of looking it's unclear to me if
logicalrep_relmap_update() could be called again, if e.g. one of the
pallocs after the logicalrep_relmap_free_entry() errors out. I think
you essentially addressed that with the memset, so that's good.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: valgrind error in subscription code

On 22/04/17 21:16, Andres Freund wrote:

Hi,

On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:

Thanks, here is patch to fix that - I also removed the individual
settings of everything to NULL/0/InvalidOid etc and just replaced it all
with memset.

Cool.

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 875a081..5bc54dd 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
pfree(remoterel->attnames);
pfree(remoterel->atttyps);
}
-	remoterel->attnames = NULL;
-	remoterel->atttyps = NULL;
-
bms_free(remoterel->attkeys);
-	remoterel->attkeys = NULL;

if (entry->attrmap)
pfree(entry->attrmap);
-

Btw, I think it's a good pattern to zero things like attrmap after
freeing. Based on a minute of looking it's unclear to me if
logicalrep_relmap_update() could be called again, if e.g. one of the
pallocs after the logicalrep_relmap_free_entry() errors out. I think
you essentially addressed that with the memset, so that's good.

Yes, memset is called immediately after logicalrep_relmap_free_entry()
which is why I found it redundant to set things to NULL there. We could
put the memset directly inside logicalrep_relmap_free_entry() and the
call either logicalrep_relmap_free_entry() or memset in the caller if
that would make you feel better about it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#2)
Re: valgrind error in subscription code

On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote:

On 22/04/17 20:31, Andres Freund wrote:

Hi,

I enabled skink / the valgrind animal to run the tap tests too (hugely
increasing the test time :(), and that paid of:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2017-04-22%2004%3A52%3A13

==606== VALGRINDERROR-BEGIN
==606== Conditional jump or move depends on uninitialised value(s)
==606== at 0x46A207: logicalrep_rel_open (relation.c:361)
==606== by 0x472D12: copy_table (tablesync.c:669)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761)
==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015)
==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so)
==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84)
==606== by 0x44F868: ServerLoop (postmaster.c:1693)
==606== by 0x450C53: PostmasterMain (postmaster.c:1337)
==606== Uninitialised value was created by a heap allocation
==606== at 0x605382: MemoryContextAlloc (mcxt.c:729)
==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266)
==606== by 0x5E4EEE: element_alloc (dynahash.c:1637)
==606== by 0x5E5018: get_hash_entry (dynahash.c:1248)
==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033)
==606== by 0x5E5A0D: hash_search (dynahash.c:890)
==606== by 0x469D38: logicalrep_relmap_update (relation.c:179)
==606== by 0x472D05: copy_table (tablesync.c:666)
==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803)
==606== by 0x475287: ApplyWorkerMain (worker.c:1530)
==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835)
==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577)
==606==
==606== VALGRINDERROR-END

Thanks, here is patch to fix that - I also removed the individual
settings of everything to NULL/0/InvalidOid etc and just replaced it all
with memset.

I've pushed this, and a fix for the (afaik harmless) replication origins
bleat. Let's see what skink says - I wasn't patient enough to locally
verify that your fix indeed fixes things.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers