valgrind error
I was just trying to revive lousyjack, my valgrind buildfarm animal
which has been offline for 12 days, after having upgraded the machine
(fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:
2020-04-17 19:26:03.483 EDT [63741:3] pg_regress LOG: statement: CREATE
DATABASE "regression" TEMPLATE=template0
==63717== VALGRINDERROR-BEGIN
==63717== Use of uninitialised value of size 8
==63717== at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)
==63717== by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)
==63717== by 0x55A268: XLogInsert (xloginsert.c:461)
==63717== by 0x8BC9E0: LogCurrentRunningXacts (standby.c:1005)
==63717== by 0x8BC8F9: LogStandbySnapshot (standby.c:961)
==63717== by 0x550CB3: CreateCheckPoint (xlog.c:8937)
==63717== by 0x82A3B2: CheckpointerMain (checkpointer.c:441)
==63717== by 0x56347D: AuxiliaryProcessMain (bootstrap.c:453)
==63717== by 0x83CA18: StartChildProcess (postmaster.c:5474)
==63717== by 0x83A120: reaper (postmaster.c:3045)
==63717== by 0x4874B1F: ??? (in /usr/lib64/libpthread-2.30.so)
==63717== by 0x5056F29: select (in /usr/lib64/libc-2.30.so)
==63717== by 0x8380A0: ServerLoop (postmaster.c:1691)
==63717== by 0x837A1F: PostmasterMain (postmaster.c:1400)
==63717== by 0x74A71D: main (main.c:210)
==63717== Uninitialised value was created by a stack allocation
==63717== at 0x8BC942: LogCurrentRunningXacts (standby.c:984)
==63717==
==63717== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:pg_comp_crc32c_sb8
fun:XLogRecordAssemble
fun:XLogInsert
fun:LogCurrentRunningXacts
fun:LogStandbySnapshot
fun:CreateCheckPoint
fun:CheckpointerMain
fun:AuxiliaryProcessMain
fun:StartChildProcess
fun:reaper
obj:/usr/lib64/libpthread-2.30.so
fun:select
fun:ServerLoop
fun:PostmasterMain
fun:main
}
I can't see what the problem is immediately.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/18/20 9:15 AM, Andrew Dunstan wrote:
I was just trying to revive lousyjack, my valgrind buildfarm animal
which has been offline for 12 days, after having upgraded the machine
(fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:2020-04-17 19:26:03.483 EDT [63741:3] pg_regress LOG: statement: CREATE
DATABASE "regression" TEMPLATE=template0
==63717== VALGRINDERROR-BEGIN
==63717== Use of uninitialised value of size 8
==63717== at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)
==63717== by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)
==63717== by 0x55A268: XLogInsert (xloginsert.c:461)
==63717== by 0x8BC9E0: LogCurrentRunningXacts (standby.c:1005)
==63717== by 0x8BC8F9: LogStandbySnapshot (standby.c:961)
==63717== by 0x550CB3: CreateCheckPoint (xlog.c:8937)
==63717== by 0x82A3B2: CheckpointerMain (checkpointer.c:441)
==63717== by 0x56347D: AuxiliaryProcessMain (bootstrap.c:453)
==63717== by 0x83CA18: StartChildProcess (postmaster.c:5474)
==63717== by 0x83A120: reaper (postmaster.c:3045)
==63717== by 0x4874B1F: ??? (in /usr/lib64/libpthread-2.30.so)
==63717== by 0x5056F29: select (in /usr/lib64/libc-2.30.so)
==63717== by 0x8380A0: ServerLoop (postmaster.c:1691)
==63717== by 0x837A1F: PostmasterMain (postmaster.c:1400)
==63717== by 0x74A71D: main (main.c:210)
==63717== Uninitialised value was created by a stack allocation
==63717== at 0x8BC942: LogCurrentRunningXacts (standby.c:984)
==63717==
==63717== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Value8
fun:pg_comp_crc32c_sb8
fun:XLogRecordAssemble
fun:XLogInsert
fun:LogCurrentRunningXacts
fun:LogStandbySnapshot
fun:CreateCheckPoint
fun:CheckpointerMain
fun:AuxiliaryProcessMain
fun:StartChildProcess
fun:reaper
obj:/usr/lib64/libpthread-2.30.so
fun:select
fun:ServerLoop
fun:PostmasterMain
fun:main
}
After many hours of testing I have a culprit for this. The error appears
with valgrind 3.15.0 with everything else held constant. 3.14.0 does
not produce the problem. So lousyjack will be back on the air before long.
Here are the build flags it's using:
CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncatio
n -g -fno-omit-frame-pointer -O0 -fPIC
CPPFLAGS=-DUSE_VALGRIND -DRELCACHE_FORCE_RELEASE -D_GNU_SOURCE
-I/usr/include/libxml2
and valgrind is invoked like this:
valgrind --quiet --trace-children=yes --track-origins=yes
--read-var-info=yes --num-callers=20 --leak-check=no
--gen-suppressions=all --error-limit=no
--suppressions=../pgsql/src/tools/valgrind.supp
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END bin/postgres -D data-C
Does anyone see anything here that needs tweaking?
Note that this is quite an old machine:
andrew@freddo:bf (master)*$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 16
Model: 6
Model name: AMD Athlon(tm) II X2 215 Processor
Stepping: 2
CPU MHz: 2700.000
CPU max MHz: 2700.0000
CPU min MHz: 800.0000
BogoMIPS: 5425.13
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save
I did not manage to reproduce this anywhere else, tried on various
physical, Virtualbox and Docker instances.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, May 10, 2020 at 09:29:05AM -0400, Andrew Dunstan wrote:
On 4/18/20 9:15 AM, Andrew Dunstan wrote:
I was just trying to revive lousyjack, my valgrind buildfarm animal
which has been offline for 12 days, after having upgraded the machine
(fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:
{
�� <insert_a_suppression_name_here>
�� Memcheck:Value8
�� fun:pg_comp_crc32c_sb8
�� fun:XLogRecordAssemble
�� fun:XLogInsert
�� fun:LogCurrentRunningXacts
�� fun:LogStandbySnapshot
�� fun:CreateCheckPoint
�� fun:CheckpointerMain
�� fun:AuxiliaryProcessMain
�� fun:StartChildProcess
�� fun:reaper
�� obj:/usr/lib64/libpthread-2.30.so
�� fun:select
�� fun:ServerLoop
�� fun:PostmasterMain
�� fun:main
}
After many hours of testing I have a culprit for this. The error appears
with valgrind 3.15.0� with everything else held constant. 3.14.0� does
not produce the problem.
I suspect 3.15.0 is just better at tracking the uninitialized data. A
more-remote possibility is valgrind-3.14.0 emulating sse42. That would make
pg_crc32c_sse42_available() return true, avoiding the pg_comp_crc32c_sb8().
andrew@freddo:bf (master)*$ lscpu
...
Flags:�������������� fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_saveI did not manage to reproduce this anywhere else, tried on various
physical, Virtualbox and Docker instances.
I can reproduce this on a 2017-vintage CPU with ./configure
... USE_SLICING_BY_8_CRC32C=1 and then running "make installcheck-parallel"
under valgrind-3.15.0 (as packaged by RHEL 7.8). valgrind.supp has a
suppression for CRC calculations, but it didn't get the memo when commit
4f700bc renamed the function. The attached patch fixes the suppression.
Attachments:
crc32c-supp-v1.patchtext/plain; charset=us-asciiDownload
Author: Noah Misch <noah@leadboat.com>
Commit: Noah Misch <noah@leadboat.com>
Refresh function name in CRC-associated Valgrind suppressions.
Back-patch to 9.5, where commit 4f700bcd20c087f60346cb8aefd0e269be8e2157
first appeared.
Reviewed by FIXME. Reported by Andrew Dunstan.
Discussion: https://postgr.es/m/4dfabec2-a3ad-0546-2d62-f816c97edd0c@2ndQuadrant.com
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index ec47a22..acdb620 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -45,7 +45,7 @@
padding_XLogRecData_CRC
Memcheck:Value8
- fun:pg_comp_crc32c
+ fun:pg_comp_crc32c*
fun:XLogRecordAssemble
}
@@ -89,7 +89,7 @@
{
padding_twophase_CRC
Memcheck:Value8
- fun:pg_comp_crc32c
+ fun:pg_comp_crc32c*
fun:EndPrepare
}
Noah Misch <noah@leadboat.com> writes:
I can reproduce this on a 2017-vintage CPU with ./configure
... USE_SLICING_BY_8_CRC32C=1 and then running "make installcheck-parallel"
under valgrind-3.15.0 (as packaged by RHEL 7.8). valgrind.supp has a
suppression for CRC calculations, but it didn't get the memo when commit
4f700bc renamed the function. The attached patch fixes the suppression.
I can also reproduce this, on RHEL 8.2 which likewise has valgrind-3.15.0,
using the same configuration to force use of that CRC function. I concur
with your diagnosis that this is just a missed update of the pre-existing
suppression rule. However, rather than
- fun:pg_comp_crc32c
+ fun:pg_comp_crc32c*
as you have it, I'd prefer to use
- fun:pg_comp_crc32c
+ fun:pg_comp_crc32c_sb8
which precisely matches what 4f700bc did. The other way seems like
it's giving a free pass to problems that could lurk in unrelated CRC
implementations.
regards, tom lane
On Fri, Jun 05, 2020 at 12:17:54PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I can reproduce this on a 2017-vintage CPU with ./configure
... USE_SLICING_BY_8_CRC32C=1 and then running "make installcheck-parallel"
under valgrind-3.15.0 (as packaged by RHEL 7.8). valgrind.supp has a
suppression for CRC calculations, but it didn't get the memo when commit
4f700bc renamed the function. The attached patch fixes the suppression.I can also reproduce this, on RHEL 8.2 which likewise has valgrind-3.15.0,
using the same configuration to force use of that CRC function. I concur
with your diagnosis that this is just a missed update of the pre-existing
suppression rule. However, rather than- fun:pg_comp_crc32c + fun:pg_comp_crc32c*as you have it, I'd prefer to use
- fun:pg_comp_crc32c + fun:pg_comp_crc32c_sb8which precisely matches what 4f700bc did. The other way seems like
it's giving a free pass to problems that could lurk in unrelated CRC
implementations.
The undefined data is in the CRC input, namely the padding bytes in xl_*
structs. Apparently, valgrind-3.15.0 doesn't complain about undefined input
to _mm_crc32_u* functions. We should not be surprised if Valgrind gains the
features necessary to complain about the other implementations.
Most COMP_CRC32C callers don't have a suppression, so Valgrind still studies
each CRC implementation via those other callers.
Noah Misch <noah@leadboat.com> writes:
On Fri, Jun 05, 2020 at 12:17:54PM -0400, Tom Lane wrote:
as you have it, I'd prefer to use
- fun:pg_comp_crc32c
+ fun:pg_comp_crc32c_sb8
which precisely matches what 4f700bc did. The other way seems like
it's giving a free pass to problems that could lurk in unrelated CRC
implementations.
The undefined data is in the CRC input, namely the padding bytes in xl_*
structs.
Oh, I see. Objection withdrawn.
Apparently, valgrind-3.15.0 doesn't complain about undefined input
to _mm_crc32_u* functions. We should not be surprised if Valgrind gains the
features necessary to complain about the other implementations.
Perhaps it already has ... I wonder if anyone's tried this on ARMv8
lately.
regards, tom lane
I wrote:
Noah Misch <noah@leadboat.com> writes:
Apparently, valgrind-3.15.0 doesn't complain about undefined input
to _mm_crc32_u* functions. We should not be surprised if Valgrind gains the
features necessary to complain about the other implementations.
Perhaps it already has ... I wonder if anyone's tried this on ARMv8
lately.
I installed Fedora 32/aarch64 on a Raspberry Pi 3B+, and can report that
valgrind 3.16.0 is just as blind to this problem in pg_comp_crc32c_armv8
as it is in pg_comp_crc32c_sse42. Seems odd, but there you have it.
(There are some other issues, but they seem fit for separate threads.)
regards, tom lane