BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Started by PG Bug reporting formover 6 years ago8 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15943
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 12beta2
Operating system: Ubuntu 18.04
Description:

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

I get the following error:
==00:00:00:12.885 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:12.885 19648== at 0x5054281: write (write.c:27)
==00:00:00:12.885 19648== by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:12.885 19648== by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:12.885 19648== by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:12.885 19648== by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:12.885 19648== by 0x31DABA: asyncQueueAddEntries
(async.c:1379)
==00:00:00:12.885 19648== by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:12.885 19648== by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:12.885 19648== by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:12.885 19648== by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:12.885 19648== by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:12.885 19648== by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:12.885 19648== Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:12.885 19648== Uninitialised value was created by a stack
allocation
==00:00:00:12.885 19648== at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Increasing the count to 8192 produces another one, slightly different
stack:
==00:00:00:29.999 19648== Syscall param write(buf) points to uninitialised
byte(s)
==00:00:00:29.999 19648== at 0x5054281: write (write.c:27)
==00:00:00:29.999 19648== by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:29.999 19648== by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:29.999 19648== by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:29.999 19648== by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2022)
==00:00:00:29.999 19648== by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:29.999 19648== by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:29.999 19648== by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:29.999 19648== by 0x4B481B: BackendStartup
(postmaster.c:4122)
==00:00:00:29.999 19648== by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:29.999 19648== by 0x4B5F70: PostmasterMain
(postmaster.c:1377)
==00:00:00:29.999 19648== by 0x406428: main (main.c:228)
==00:00:00:29.999 19648== Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:29.999 19648== Uninitialised value was created by a stack
allocation
==00:00:00:29.999 19648== at 0x31D967: asyncQueueAddEntries
(async.c:1306)

(It's also reproduced on REL_12_BETA3, REL9_6_15.)

#2Andres Freund
andres@anarazel.de
In reply to: PG Bug reporting form (#1)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Hi,

On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 15943
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 12beta2
Operating system: Ubuntu 18.04
Description:

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

I get the following error:
==00:00:00:12.885 19648== Syscall param write(buf) points to
uninitialised
byte(s)
==00:00:00:12.885 19648== at 0x5054281: write (write.c:27)
==00:00:00:12.885 19648== by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:12.885 19648== by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:12.885 19648== by 0x2690F1: SlruSelectLRUPage
(slru.c:1095)
==00:00:00:12.885 19648== by 0x269777: SimpleLruZeroPage
(slru.c:269)
==00:00:00:12.885 19648== by 0x31DABA: asyncQueueAddEntries
(async.c:1379)
==00:00:00:12.885 19648== by 0x31F130: PreCommit_Notify
(async.c:845)
==00:00:00:12.885 19648== by 0x273C19: CommitTransaction
(xact.c:2128)
==00:00:00:12.885 19648== by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:12.885 19648== by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:12.885 19648== by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:12.885 19648== by 0x540545: PostgresMain
(postgres.c:4256)
==00:00:00:12.885 19648== Address 0xf4f95bf is in a rw- anonymous
segment
==00:00:00:12.885 19648== Uninitialised value was created by a stack
allocation
==00:00:00:12.885 19648== at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Increasing the count to 8192 produces another one, slightly different
stack:
==00:00:00:29.999 19648== Syscall param write(buf) points to
uninitialised
byte(s)
==00:00:00:29.999 19648== at 0x5054281: write (write.c:27)
==00:00:00:29.999 19648== by 0x2687F0: SlruPhysicalWritePage
(slru.c:851)
==00:00:00:29.999 19648== by 0x268E95: SlruInternalWritePage
(slru.c:543)
==00:00:00:29.999 19648== by 0x26A1BD: SimpleLruTruncate
(slru.c:1234)
==00:00:00:29.999 19648== by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2022)
==00:00:00:29.999 19648== by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:29.999 19648== by 0x5403D1: PostgresMain
(postgres.c:4176)
==00:00:00:29.999 19648== by 0x4B16D9: BackendRun
(postmaster.c:4431)
==00:00:00:29.999 19648== by 0x4B481B: BackendStartup
(postmaster.c:4122)
==00:00:00:29.999 19648== by 0x4B4B32: ServerLoop
(postmaster.c:1704)
==00:00:00:29.999 19648== by 0x4B5F70: PostmasterMain
(postmaster.c:1377)
==00:00:00:29.999 19648== by 0x406428: main (main.c:228)
==00:00:00:29.999 19648== Address 0xf4fb5bf is in a rw- anonymous
segment
==00:00:00:29.999 19648== Uninitialised value was created by a stack
allocation
==00:00:00:29.999 19648== at 0x31D967: asyncQueueAddEntries
(async.c:1306)

(It's also reproduced on REL_12_BETA3, REL9_6_15.)

That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#3Alexander Lakhin
exclusion@gmail.com
In reply to: Andres Freund (#2)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Hello Andres,

08.08.2019 13:33, Andres Freund wrote:

Hi,

On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 15943
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 12beta2
Operating system: Ubuntu 18.04
Description:

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.

With "--track-origins=yes --read-var-info=yes" I get:

==00:00:00:30.800 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:30.801 24110==    at 0x5054281: write (write.c:27)
==00:00:00:30.801 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:30.801 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:30.801 24110==    by 0x2690F1: SlruSelectLRUPage (slru.c:1095)
==00:00:00:30.801 24110==    by 0x269777: SimpleLruZeroPage (slru.c:269)
==00:00:00:30.801 24110==    by 0x31DABA: asyncQueueAddEntries
(async.c:1381)
==00:00:00:30.801 24110==    by 0x31F130: PreCommit_Notify (async.c:845)
==00:00:00:30.801 24110==    by 0x273C19: CommitTransaction (xact.c:2128)
==00:00:00:30.801 24110==    by 0x274A10: CommitTransactionCommand
(xact.c:2921)
==00:00:00:30.801 24110==    by 0x53BD7D: finish_xact_command
(postgres.c:2597)
==00:00:00:30.801 24110==    by 0x53E2F3: exec_simple_query
(postgres.c:1256)
==00:00:00:30.801 24110==    by 0x540545: PostgresMain (postgres.c:4256)
==00:00:00:30.801 24110==  Address 0xf4f95bf is in a rw- anonymous segment
==00:00:00:30.801 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:30.801 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

and
==00:00:00:48.177 24110== Syscall param write(buf) points to
uninitialised byte(s)
==00:00:00:48.177 24110==    at 0x5054281: write (write.c:27)
==00:00:00:48.177 24110==    by 0x2687F0: SlruPhysicalWritePage (slru.c:851)
==00:00:00:48.177 24110==    by 0x268E95: SlruInternalWritePage (slru.c:543)
==00:00:00:48.177 24110==    by 0x26A1BD: SimpleLruTruncate (slru.c:1234)
==00:00:00:48.177 24110==    by 0x31D5A5: asyncQueueAdvanceTail
(async.c:2024)
==00:00:00:48.177 24110==    by 0x31F231: ProcessCompletedNotifies
(async.c:1132)
==00:00:00:48.177 24110==    by 0x5403D1: PostgresMain (postgres.c:4176)
==00:00:00:48.177 24110==    by 0x4B16D9: BackendRun (postmaster.c:4431)
==00:00:00:48.177 24110==    by 0x4B481B: BackendStartup (postmaster.c:4122)
==00:00:00:48.177 24110==    by 0x4B4B32: ServerLoop (postmaster.c:1704)
==00:00:00:48.178 24110==    by 0x4B5F70: PostmasterMain (postmaster.c:1377)
==00:00:00:48.178 24110==    by 0x406428: main (main.c:228)
==00:00:00:48.178 24110==  Address 0xf4fb5bf is in a rw- anonymous segment
==00:00:00:48.178 24110==  Uninitialised value was created by a stack
allocation
==00:00:00:48.178 24110==    at 0x31D967: asyncQueueAddEntries
(async.c:1306)

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
place fixes it too (the patch is attached).
So it seems, the padding in AsyncQueueEntry has an alibi.

Best regards,
Alexander

Attachments:

asyncQueueAddEntries-memset.patchtext/x-patch; name=asyncQueueAddEntries-memset.patchDownload+2-0
#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Lakhin (#3)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

On 2019-Aug-08, Alexander Lakhin wrote:

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
place fixes it too (the patch is attached).

Maybe VALGRIND_MAKE_MEM_DEFINED?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#4)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Hi,

On August 8, 2019 12:43:32 PM EDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Aug-08, Alexander Lakhin wrote:

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the

same

place fixes it too (the patch is attached).

Maybe VALGRIND_MAKE_MEM_DEFINED?

Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#5)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Andres Freund <andres@anarazel.de> writes:

Seems there's no reason to not zero initialize memory here? But perhaps just by initializing the stack variable?

I think all we need here is another suppression in
src/tools/valgrind.supp. We have not insisted on zeroing
pad bytes that get sent to disk in the places already
enumerated in valgrind.supp, so why would we apply a
different rule to async.c?

regards, tom lane

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Hi,

On August 8, 2019 2:41:42 PM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

Seems there's no reason to not zero initialize memory here? But

perhaps just by initializing the stack variable?

I think all we need here is another suppression in
src/tools/valgrind.supp. We have not insisted on zeroing
pad bytes that get sent to disk in the places already
enumerated in valgrind.supp, so why would we apply a
different rule to async.c?

Well, with a lot of the other case there's a lot of sources for such uninitialized data. Here there probably is just one? If it weren't such a game of whack a mole, I'd even consider properly initializing the other places. And initializing the stack data here ought to be cheap in this case?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#8Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#3)
Re: BUG #15943: Valgrind-detected error in SlruPhysicalWritePage

Hello hackers,

08.08.2019 13:33, Andres Freund wrote:

On August 8, 2019 12:44:43 AM EDT, PG Bug reporting form <noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 15943
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 12beta2
Operating system: Ubuntu 18.04
Description:

When running under valgrind (sitting on the master branch):
printf "SELECT pg_notify('notify_async1',NULL); %.0s" `seq 2048` | psql

That seems likely to be padding. Probably easier to see if you enable read-var-info and track-origins.

Indeed, "memset(&qe, 0, sizeof(qe));" added in the beginning of
asyncQueueAddEntries fixes the issue.
More interestingly, "memset(qe.data, 0, sizeof(qe.data));" in the same
place fixes it too (the patch is attached).
So it seems, the padding in AsyncQueueEntry has an alibi.

Andres, you're right. The padding is found guilty (inside the qe.data
itself).
Please look at the patch addressing the cause of the problem (inspired
by btree_bit.c).
But if a valgrind suppression is preferred, look at the alternate patch.

Best regards,
Alexander

Attachments:

fix-aqe-padding.patchtext/x-patch; name=fix-aqe-padding.patchDownload+6-2
valgrind_slru_write.patchtext/x-patch; name=valgrind_slru_write.patchDownload+8-0