Crash: invalid DSA memory alloc request
Hello,
I'm running a couple of large tests, and in this particular test I have
a few million more tables.
At some point it fails, and I gathered the following trace:
2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
request size 1073741824
2024-12-12 22:22:55.307 CET [1496210] BACKTRACE:
postgres: ads tabletest [local] CREATE TABLE(+0x15e570)
[0x6309c379c570]
postgres: ads tabletest [local] CREATE
TABLE(dshash_find_or_insert+0x1a4) [0x6309c39882d4]
postgres: ads tabletest [local] CREATE
TABLE(pgstat_get_entry_ref+0x440) [0x6309c3b0a530]
postgres: ads tabletest [local] CREATE
TABLE(pgstat_prep_pending_entry+0x3a) [0x6309c3b0676a]
postgres: ads tabletest [local] CREATE
TABLE(pgstat_assoc_relation+0x32) [0x6309c3b086c2]
postgres: ads tabletest [local] CREATE
TABLE(StartReadBuffer+0x3c0) [0x6309c3ab9870]
postgres: ads tabletest [local] CREATE
TABLE(ReadBufferExtended+0xa1) [0x6309c3abb271]
postgres: ads tabletest [local] CREATE TABLE(+0x2c6caa)
[0x6309c3904caa]
postgres: ads tabletest [local] CREATE
TABLE(AlterSequence+0xc0) [0x6309c3905860]
postgres: ads tabletest [local] CREATE TABLE(+0x4b6336)
[0x6309c3af4336]
postgres: ads tabletest [local] CREATE
TABLE(standard_ProcessUtility+0x259) [0x6309c3af33f9]
postgres: ads tabletest [local] CREATE TABLE(+0x4b6e64)
[0x6309c3af4e64]
postgres: ads tabletest [local] CREATE
TABLE(standard_ProcessUtility+0x259) [0x6309c3af33f9]
postgres: ads tabletest [local] CREATE TABLE(+0x4b3d2f)
[0x6309c3af1d2f]
postgres: ads tabletest [local] CREATE TABLE(+0x4b3e4b)
[0x6309c3af1e4b]
postgres: ads tabletest [local] CREATE TABLE(PortalRun+0x16f)
[0x6309c3af226f]
postgres: ads tabletest [local] CREATE TABLE(+0x4b06cc)
[0x6309c3aee6cc]
postgres: ads tabletest [local] CREATE
TABLE(PostgresMain+0xf67) [0x6309c3aefa87]
postgres: ads tabletest [local] CREATE TABLE(+0x4accc5)
[0x6309c3aeacc5]
postgres: ads tabletest [local] CREATE
TABLE(postmaster_child_launch+0x8f) [0x6309c3a5b95f]
postgres: ads tabletest [local] CREATE TABLE(+0x421479)
[0x6309c3a5f479]
postgres: ads tabletest [local] CREATE
TABLE(PostmasterMain+0xd71) [0x6309c3a61251]
postgres: ads tabletest [local] CREATE TABLE(main+0x207)
[0x6309c379efc7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x710c33a2a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)
[0x710c33a2a28b]
postgres: ads tabletest [local] CREATE TABLE(_start+0x25)
[0x6309c379f595]
2024-12-12 22:22:55.307 CET [1496210] STATEMENT: CREATE TABLE IF NOT
EXISTS test_16718629 (id SERIAL PRIMARY KEY, d VARCHAR(200), e
VARCHAR(200), f VARCHAR(200), i INTEGER, j INTEGER);
PostgreSQL version is 17.2, compiled with debug symbols.
tabletest=# select version();
version
--------------------------------------------------------------------------------------------------
PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)
I'm not able to reproduce this with every DDL statement, but when grouping
together about 50 of them, it fails at some point.
Regards,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Thu, 12 Dec 2024 at 22:28, Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
Hello,
I'm running a couple of large tests, and in this particular test I have
a few million more tables. At some point it fails, and I gathered the following trace:
2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
request size 1073741824
2024-12-12 22:22:55.307 CET [1496210] BACKTRACE:
postgres: ads tabletest [local] CREATE TABLE(+0x15e570)
[0x6309c379c570]
postgres: ads tabletest [local] CREATE
TABLE(dshash_find_or_insert+0x1a4) [0x6309c39882d4]
postgres: ads tabletest [local] CREATE
TABLE(pgstat_get_entry_ref+0x440) [0x6309c3b0a530]
It looks like the dshash table used in the pgstats system uses
resize(), which only specifies DSA_ALLOC_ZERO, not DSA_ALLOC_HUGE,
causing issues when the table grows larger than 1 GB.
I expect that error to disappear when you replace the
dsa_allocate0(...) call in dshash.c's resize function with
dsa_allocate_extended(..., DSA_ALLOC_HUGE | DSA_ALLOC_ZERO) as
attached, but haven't tested it due to a lack of database with
millions of relations.
Kind regards,
Matthias van de Meent
Attachments:
dshash_pgstat_fix.patch (application/octet-stream)
From 5b8e5e92de0ff68e058b31f450fa55211af7c3a8 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 12 Dec 2024 22:47:04 +0100
Subject: [PATCH] Fix "invalid DSA memory alloc request size" in pgstat
PGStat may use more than 1GB of DSA segments in its dshash table, so allow the DSHash system to allocate such sizes.
---
src/backend/lib/dshash.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index 93a9e21ddd209..c364a38f12d54 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -887,8 +887,9 @@ resize(dshash_table *hash_table, size_t new_size_log2)
Assert(new_size_log2 == hash_table->control->size_log2 + 1);
/* Allocate the space for the new table. */
- new_buckets_shared = dsa_allocate0(hash_table->area,
- sizeof(dsa_pointer) * new_size);
+ new_buckets_shared =
+ dsa_allocate_extended(hash_table->area, sizeof(dsa_pointer) * new_size,
+ DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);
new_buckets = dsa_get_address(hash_table->area, new_buckets_shared);
/*
On 12/12/2024 22:49, Matthias van de Meent wrote:
On Thu, 12 Dec 2024 at 22:28, Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
Hello,
I'm running a couple of large tests, and in this particular test I have
a few million more tables. At some point it fails, and I gathered the following trace:
2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
request size 1073741824
[backtrace snipped]
It looks like the dshash table used in the pgstats system uses
resize(), which only specifies DSA_ALLOC_ZERO, not DSA_ALLOC_HUGE,
causing issues when the table grows larger than 1 GB.
I expect that error to disappear when you replace the
dsa_allocate0(...) call in dshash.c's resize function with
dsa_allocate_extended(..., DSA_ALLOC_HUGE | DSA_ALLOC_ZERO) as
attached, but haven't tested it due to a lack of database with
millions of relations.
IIUC the table is doubled in size when filled over 75%, so we went from
500MB to 1GB here, doubling the number of available buckets.
It's probably good up to a point, but the size limit is exceeded here by
only 1 byte, and 1GB-1 worth of pointers is hopefully more than enough.
Would it be interesting to revisit the logic to grow the table less
aggressively above 500MB (if at all possible given how buckets and
partitions are managed)?
There is this comment in 8c0d7bafad3, which introduced this "dshash":
There is a wide range of potential users for such a hash table, though
it's very likely the interface will need to evolve as we come to
understand the needs of different kinds of users. E.g support for
iterators and incremental resizing is planned for later commits and the
details of the callback signatures are likely to change.
I'm unsure whether iterators and incremental resizing have made it in?
---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D
Hi Matthias,
On Thu, Dec 12, 2024 at 10:49 PM Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:
On Thu, 12 Dec 2024 at 22:28, Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
Hello,
I'm running a couple of large tests, and in this particular test I have
a few million more tables. At some point it fails, and I gathered the following trace:
2024-12-12 22:22:55.307 CET [1496210] ERROR: invalid DSA memory alloc
request size 1073741824
[backtrace snipped]
It looks like the dshash table used in the pgstats system uses
resize(), which only specifies DSA_ALLOC_ZERO, not DSA_ALLOC_HUGE,
causing issues when the table grows larger than 1 GB.
I expect that error to disappear when you replace the
dsa_allocate0(...) call in dshash.c's resize function with
dsa_allocate_extended(..., DSA_ALLOC_HUGE | DSA_ALLOC_ZERO) as
attached, but haven't tested it due to a lack of database with
millions of relations.
Can confirm that the crash no longer happens when applying your patch.
Was able to both continue the old and crashed test, as well as run a new
test:
tabletest=# select count(*) from information_schema.tables;
count
----------
20000211
(1 row)
Thanks,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
Can confirm that the crash no longer happens when applying your patch.
The patch looks reasonable to me. I'll commit it soon unless someone
objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
intended to catch faulty allocation requests [0].
Was able to both continue the old and crashed test, as well as run a new
test:
tabletest=# select count(*) from information_schema.tables;
count
----------
20000211
(1 row)
That's a lot of tables...
[0]: /messages/by-id/28062.1487456862@sss.pgh.pa.us
--
nathan
Hello,
On Mon, Dec 16, 2024 at 11:18 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
Can confirm that the crash no longer happens when applying your patch.
The patch looks reasonable to me. I'll commit it soon unless someone
objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
intended to catch faulty allocation requests [0].
Is there a way to test it, except by creating so many tables?
There might be more such problems.
I did run a few basic queries in the database, but that's far from a full
test.
Was able to both continue the old and crashed test, as well as run a new
test:
tabletest=# select count(*) from information_schema.tables;
count
----------
20000211
(1 row)
That's a lot of tables...
Started as a discussion, got me curious, and it's only about an order of
magnitude off from what I've seen in production.
Not unrealistic to find out when and where it breaks.
Thanks,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Mon, Dec 16, 2024 at 04:18:26PM -0600, Nathan Bossart wrote:
On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
Can confirm that the crash no longer happens when applying your patch.
The patch looks reasonable to me. I'll commit it soon unless someone
objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
intended to catch faulty allocation requests [0].
No objections.
Most likely this issue gets much easier to reach now that we can plug
custom pgstats kinds into the backend. If pg_stat_statements or an
equivalent implementation uses pgstats, I don't think that we'll be
able to live without lifting this limit (500k query entries are
common, and at 2kB each that would be enough to blow past it), so using
DSA_ALLOC_HUGE sounds good to me. I don't see a huge point in
backpatching, FWIW.
20000211
That's a lot of tables...
And most likely don't do that. If you want to play more in this area,
there is also the join-1M-tables-in-a-single-query game.
--
Michael
Hi,
On 2024-12-17 16:50:45 +0900, Michael Paquier wrote:
On Mon, Dec 16, 2024 at 04:18:26PM -0600, Nathan Bossart wrote:
On Mon, Dec 16, 2024 at 08:00:00AM +0100, Andreas 'ads' Scherbaum wrote:
Can confirm that the crash no longer happens when applying your patch.
The patch looks reasonable to me. I'll commit it soon unless someone
objects. I was surprised to learn that the DSA_ALLOC_HUGE flag is only
intended to catch faulty allocation requests [0].
No objections.
Most likely this issue gets by a large degree easier to reach now that
we can plug into the backend custom pgstats kinds. [...] I don't see a
huge point in backpatching, FWIW.
I don't see why we wouldn't want to backpatch? The number of objects here
isn't entirely unrealistic to reach with relations alone, and if you enable
e.g. function execution stats it can reasonably reach higher numbers more
quickly. And using DSA_ALLOC_HUGE in that place feels like a rather
low-risk change?
Greetings,
Andres Freund
On Tue, Dec 17, 2024 at 10:53:07AM -0500, Andres Freund wrote:
On 2024-12-17 16:50:45 +0900, Michael Paquier wrote:
I don't see a huge point in backpatching, FWIW.
I don't see why we wouldn't want to backpatch? The number of objects here
isn't entirely unrealistic to reach with relations alone, and if you enable
e.g. function execution stats it can reasonably reach higher numbers more
quickly. And use DSA_ALLOC_HUGE in that place feels like a rather low risk
change?
Agreed, this feels low-risk enough to back-patch to at least v15, where
statistics were moved to shared memory. But I don't see a strong reason to
avoid back-patching it to all supported versions, either.
--
nathan
On 2024-12-17 15:32:06 -0600, Nathan Bossart wrote:
Committed.
Thanks!
On 17/12/2024 22:32, Nathan Bossart wrote:
Committed.
Thanks, I see you backpatched it all the way to 13.
Will see how far back I can test this, will take a while.
Regards,
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On Tue, Dec 17, 2024 at 04:47:31PM -0500, Andres Freund wrote:
On 2024-12-17 15:32:06 -0600, Nathan Bossart wrote:
Committed.
Thanks!
Thanks, Nathan!
--
Michael
On Wed, Dec 18, 2024 at 2:42 AM Andreas 'ads' Scherbaum <ads@pgug.de> wrote:
On 17/12/2024 22:32, Nathan Bossart wrote:
Committed.
Thanks, I see you backpatched it all the way to 13.
Will see how far back I can test this, will take a while.
Was able to test HEAD in all branches back to 13, no crash seen.
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project