valgrind errors
Valgrind'ing the postmaster yields a fair number of errors. A lot of
them are similar, such as the following:
==29929== Use of uninitialised value of size 4
==29929== at 0x80AFB80: XLogInsert (xlog.c:570)
==29929== by 0x808B0A6: heap_insert (heapam.c:1189)
==29929== by 0x808B19D: simple_heap_insert (heapam.c:1226)
==29929== by 0x80C28CC: AddNewAttributeTuples (heap.c:499)
==29929== by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29929== by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29929== by 0x81DC9BF: ProcessUtility (utility.c:376)
==29929== by 0x81DB893: PortalRunUtility (pquery.c:780)
==29929== by 0x81DB9CE: PortalRunMulti (pquery.c:844)
==29929== by 0x81DB35C: PortalRun (pquery.c:501)
==29929== by 0x81D75E2: exec_simple_query (postgres.c:935)
==29929== by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929==
==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
==29929== at 0x3C1BAB28: write (in /usr/lib/debug/libc-2.3.2.so)
==29929== by 0x80B2124: XLogFlush (xlog.c:1416)
==29929== by 0x80AE348: RecordTransactionCommit (xact.c:549)
==29929== by 0x80AE82A: CommitTransaction (xact.c:930)
==29929== by 0x80AED8B: CommitTransactionCommand (xact.c:1242)
==29929== by 0x81D8934: finish_xact_command (postgres.c:1820)
==29929== by 0x81D762C: exec_simple_query (postgres.c:967)
==29929== by 0x81D9F95: PostgresMain (postgres.c:2984)
==29929== by 0x81A524E: BackendRun (postmaster.c:2662)
==29929== by 0x81A489E: BackendStartup (postmaster.c:2295)
==29929== by 0x81A2D0A: ServerLoop (postmaster.c:1165)
==29929== by 0x81A2773: PostmasterMain (postmaster.c:926)
==29929== Address 0x3C37BB57 is not stack'd, malloc'd or free'd
(These occur hundreds of times while valgrind'ing "make installcheck".)
The particular call chain that results in the XLogInsert() error is
variable; for example, here's another error report:
==29937== Use of uninitialised value of size 4
==29937== at 0x80AFA37: XLogInsert (xlog.c:555)
==29937== by 0x80978F3: _bt_split (nbtinsert.c:907)
==29937== by 0x80966A1: _bt_insertonpg (nbtinsert.c:504)
==29937== by 0x8095BB0: _bt_doinsert (nbtinsert.c:141)
==29937== by 0x809CC78: btinsert (nbtree.c:266)
==29937== by 0x826200E: OidFunctionCall6 (fmgr.c:1484)
==29937== by 0x80944FA: index_insert (indexam.c:226)
==29937== by 0x80C79E6: CatalogIndexInsert (indexing.c:121)
==29937== by 0x80C2A0B: AddNewAttributeTuples (heap.c:557)
==29937== by 0x80C2E36: heap_create_with_catalog (heap.c:787)
==29937== by 0x811F5AD: DefineRelation (tablecmds.c:252)
==29937== by 0x81DC9BF: ProcessUtility (utility.c:376)
Any thoughts on what could be causing these errors? (I looked into it,
but couldn't see anything that looked like an obvious culprit.)
-Neil
Neil Conway <neilc@samurai.com> writes:
Any thoughts on what could be causing these errors?
I suspect valgrind is complaining because XLogInsert is memcpy'ing a
struct that has allocation padding in it. Which of course is a bogus
complaint ...
regards, tom lane
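
As a rough illustration of the allocation padding Tom describes, here is a minimal C sketch. DemoRecord and the demo_* helpers are hypothetical names made up for this example, not PostgreSQL code, and the exact amount of padding depends on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record (not a PostgreSQL struct): a 1-byte flag followed
 * by a 4-byte counter. Most ABIs insert padding after "flag" so that
 * "count" is suitably aligned. */
typedef struct DemoRecord
{
    char    flag;
    int32_t count;
} DemoRecord;

/* Number of padding bytes the compiler placed between flag and count. */
static size_t
demo_pad_bytes(void)
{
    return offsetof(DemoRecord, count)
        - (offsetof(DemoRecord, flag) + sizeof(char));
}

/* Field-by-field initialisation: the padding bytes are never written,
 * so a tool tracking definedness still sees them as uninitialised. */
static void
demo_init_fields(DemoRecord *rec)
{
    assert(rec != NULL);
    rec->flag = 1;
    rec->count = 42;
}

/* Zeroing the whole struct first defines every byte, padding included
 * (in practice; this is what a palloc0-style allocation achieves). */
static void
demo_init_zeroed(DemoRecord *rec)
{
    assert(rec != NULL);
    memset(rec, 0, sizeof(*rec));
    rec->flag = 1;
    rec->count = 42;
}
```

Copying a DemoRecord filled by demo_init_fields() with memcpy() copies the padding bytes along with the real fields, which is how undefined bytes can travel a long way from where the struct was built.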
Tom Lane wrote:
Neil Conway <neilc@samurai.com> writes:
Any thoughts on what could be causing these errors?
I suspect valgrind is complaining because XLogInsert is memcpy'ing a
struct that has allocation padding in it. Which of course is a bogus
complaint ...
As far as I remember (couldn't find modern documentation on the matter)
Valgrind is resistant to this problem. When a block of memory is copied,
the initialized/uninitialized status is copied along. It only complains
when an actual operation is performed using uninitialized memory. This
was developed for the explicit reason of avoiding the problem you describe.
Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
Shachar Shemesh wrote:
Tom Lane wrote:
I suspect valgrind is complaining because XLogInsert is memcpy'ing a
struct that has allocation padding in it. Which of course is a bogus
complaint ...
As far as I remember (couldn't find modern documentation on the
matter) Valgrind is resistant to this problem. When a block of memory
is copied, the initialized/uninitialized status is copied along. It
only complains when an actual operation is performed using
uninitialized memory. This was developed for the explicit reason of
avoiding the problem you describe.
Shachar
Found it:
http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2
It is important to understand that your program can copy around junk
(uninitialised) data to its heart's content. Memcheck observes this
and keeps track of the data, but does not complain. A complaint is
issued only when your program attempts to make use of uninitialised data.
What IS possible, however, is that there is a bug in one of the
underlying libraries.
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
I am also interested in this so I want to make some comments.
On Thu, 22 Apr 2004 Shachar Shemesh wrote :
Found it:
http://developer.kde.org/~sewardj/docs-2.0.0/mc_main.html, section 3.3.2
It is important to understand that your program can copy around junk
(uninitialised) data to its heart's content. Memcheck observes this
and keeps track of the data, but does not complain. A complaint is
issued only when your program attempts to make use of uninitialised data.
I am confused by how valgrind defines "making use" of data. Isn't
copying data a kind of use? If valgrind only checks whether the data
was passed to memcpy(), that is fine. But suppose the user writes his
own memory_copy(), which loads the data into a register, as if it were
going to be used in some useful computation, and then stores the
register value to another memory location to finish the copy (yes,
this IS slow). Then valgrind is likely to be confused too, and may
think the data is "used".
I guess all I am saying is that valgrind _can_ still make
mistakes about it.
-Min
--
We've heard that a million monkeys at a million keyboards could produce
the complete works of Shakespeare; now, thanks to the Internet, we know
that it is not true.
--Robert Wilensky
Min Xu (Hsu) wrote:
I am confused by how valgrind defines "making use" of data. Isn't
copying data a kind of use? If valgrind only checks whether the data
was passed to memcpy(), that is fine. But suppose the user writes his
own memory_copy(), which loads the data into a register, as if it were
going to be used in some useful computation, and then stores the
register value to another memory location to finish the copy (yes,
this IS slow). Then valgrind is likely to be confused too, and may
think the data is "used".
I guess all I am saying is that valgrind _can_ still make
mistakes about it.
-Min
If I understand correctly, data is considered "used" when anything
other than copying is done on it. Arithmetic operations, branches, etc.
will trigger the error. If you copy the data by adding and then
subtracting a constant from it, valgrind will complain. If all you do
(as in your example) is copy it around, and then copy it some more, it
will not.
Yes, it even tracks the "uninitialized" bits through your registers. Brrr.
Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
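
To make the copy-versus-use distinction in this exchange concrete, here is a small C sketch. The demo_* names are hypothetical. As far as I read the Memcheck manual, the points where it actually reports are a conditional branch, an address computation, or a system call that depends on undefined bits; treat that as my reading of the documentation rather than a guarantee.

```c
#include <stddef.h>

/* Hand-rolled copy in the spirit of Min's memory_copy(): every byte
 * passes through a register but is never inspected. A definedness
 * tracker can carry the "undefined" state along with the value, so
 * this loop alone should not produce a report. */
static void
demo_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* Here the program's control flow depends on the data itself. If
 * buf[i] were undefined, the branch below is the kind of "use" that
 * gets reported. */
static size_t
demo_count_nonzero(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] != 0)
            n++;
    }
    return n;
}
```

Calling demo_copy() on a partly initialised buffer and only later calling demo_count_nonzero() on the copy would, under these assumptions, put the report at the second call, not at the copy.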
Neil Conway <neilc@samurai.com> writes:
Valgrind'ing the postmaster yields a fair number of errors. A lot of
them are similar, such as the following:
==29929== Use of uninitialised value of size 4
==29929== at 0x80AFB80: XLogInsert (xlog.c:570)
Oh, I see the issue. Shachar is correct that valgrind doesn't complain
about copying uninitialized bytes. But it *does* complain about adding
them into a CRC ... so what we are seeing here is gripes about including
padding bytes into a CRC, or writing them out in the case of the
complaints like this one:
==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
The original pad bytes may be fairly far removed from the point of the
error ... an example is that I was able to make one XLogInsert complaint
go away by changing palloc to palloc0 at tupdesc.c line 413 (in
TupleDescInitEntry), which is several memcpy's removed from the data
that gets passed to XLogInsert. valgrind's habit of propagating
undef'ness through copies isn't real helpful here.
BTW, valgrind's report about "size 4" is actively misleading, because
the only part of that struct that TupleDescInitEntry isn't careful to
set explicitly is a one-byte pad between attislocal and attinhcount.
regards, tom lane
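
The situation Tom describes can be sketched in C as follows: a checksum is computed over the raw bytes of a struct, so its padding byte is fed into the checksum along with the real fields. DemoAttr and the demo_* functions are hypothetical and only loosely modelled on the description above; the CRC is a plain bitwise CRC-32, not the WAL code's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: three 1-byte flags followed by a 4-byte counter.
 * On typical ABIs this leaves one pad byte between islocal and
 * inhcount. */
typedef struct DemoAttr
{
    char    notnull;
    char    isdropped;
    char    islocal;
    int32_t inhcount;
} DemoAttr;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) over a raw byte
 * range. Every byte in the range, padding included, affects the
 * result. */
static uint32_t
demo_crc32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* palloc0-style initialisation: zero everything, then set the fields,
 * so the pad byte has a defined value before it is checksummed. */
static void
demo_attr_init_zeroed(DemoAttr *attr, int32_t inhcount)
{
    memset(attr, 0, sizeof(*attr));
    attr->islocal = 1;
    attr->inhcount = inhcount;
}
```

With demo_attr_init_zeroed(), two structs holding the same field values produce the same checksum. Without the memset(), the fields would still be correct, but the pad byte going into demo_crc32() would be undefined, which is the kind of input the reports above are about.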
Tom Lane wrote:
==29929== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
The original pad bytes may be fairly far removed from the point of the
error ... an example is that I was able to make one XLogInsert complaint
go away by changing palloc to palloc0 at tupdesc.c line 413 (in
TupleDescInitEntry), which is several memcpy's removed from the data
that gets passed to XLogInsert.
Would asking valgrind for more stack output help?
valgrind's habit of propagating
undef'ness through copies isn't real helpful here.
Well, considering the number of false positives you would get if it
didn't...
If I understand this correctly, that was a real bug there, wasn't it?
BTW, valgrind's report about "size 4" is actively misleading, because
the only part of that struct that TupleDescInitEntry isn't careful to
set explicitly is a one-byte pad between attislocal and attinhcount.
You might want to report that to their bugs list. My browsing the docs
just now leads me to believe valgrind is, generally, aware that only
parts of a word can be uninitialized. You can even set it to report it
at the point where uninitialized and initialized data are merged into a
single operation.
In fact, that may help with getting the errors closer to the place where
the actual problem resides. Then again, it may cause it to generate way
more false positives.
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
Shachar Shemesh <psql@shemesh.biz> writes:
Tom Lane wrote:
The original pad bytes may be fairly far removed from the point of the
error ... an example is that I was able to make one XLogInsert complaint
go away by changing palloc to palloc0 at tupdesc.c line 413 (in
TupleDescInitEntry), which is several memcpy's removed from the data
that gets passed to XLogInsert.
If I understand this correctly, that was a real bug there, wasn't it?
No, just a complete waste of time. The "uninitialized" data is just
struct padding, and it matters not what's in there.
To get rid of this class of reports we'd probably have to palloc0 rather
than palloc almost everything, and that strikes me as useless overhead.
It would make more sense to tell valgrind to suppress these particular
events in XLogInsert and XLogFlush.
AFAICS, if we actually had an uninitialized field (rather than
uninitialized padding) it would get detected at the point where the
field is used. If you run with large enough shared_buffers to avoid
having to discard pages from shmem, I think this would be detected even
across a (nominal) disk write and read.
BTW, there is something in the valgrind manual about adding hints to
teach valgrind about custom alloc/free mechanisms. Has anyone taught
it about palloc?
regards, tom lane
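
On the last question, a hedged sketch of what such hints can look like in C. Current Valgrind releases provide the client-request macros VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_FREELIKE_BLOCK in valgrind/valgrind.h; whether the Valgrind version discussed in this thread offers them under these names is an assumption. The toy bump allocator (demo_palloc, demo_pfree, demo_arena) is hypothetical and is not how palloc works; it only shows where the hints would go. The header is optional here, and the macros fall back to no-ops when it is not installed.

```c
#include <stddef.h>
#include <stdint.h>

/* Use the real client-request macros when the header is available;
 * otherwise define no-op stand-ins so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define DEMO_HAVE_VALGRIND_H 1
#  endif
#endif
#ifndef DEMO_HAVE_VALGRIND_H
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) ((void) 0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rz) ((void) 0)
#endif

#define DEMO_ARENA_SIZE 4096

/* uint64_t elements keep the arena 8-byte aligned. */
static uint64_t demo_arena[DEMO_ARENA_SIZE / sizeof(uint64_t)];
static size_t demo_arena_used = 0;

/* Hand out an 8-byte-aligned chunk from the arena and tell Valgrind
 * to treat it like a malloc'd block of "size" bytes (no red zone,
 * contents not zeroed). Returns NULL if the request does not fit. */
static void *
demo_palloc(size_t size)
{
    size_t  aligned;
    void   *chunk;

    if (size == 0 || size > DEMO_ARENA_SIZE)
        return NULL;
    aligned = (size + 7u) & ~(size_t) 7u;
    if (aligned > DEMO_ARENA_SIZE - demo_arena_used)
        return NULL;

    chunk = (unsigned char *) demo_arena + demo_arena_used;
    demo_arena_used += aligned;
    VALGRIND_MALLOCLIKE_BLOCK(chunk, size, 0, 0);
    return chunk;
}

/* The bump allocator never reuses memory; freeing only informs
 * Valgrind that the chunk must no longer be touched. */
static void
demo_pfree(void *chunk)
{
    if (chunk != NULL)
        VALGRIND_FREELIKE_BLOCK(chunk, 0);
}
```

Outside Valgrind the macros do nothing observable, so the same binary can run normally. A real allocator that carves chunks out of malloc'd blocks would likely need more care than this sketch takes.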