Preallocation changes in Postgresql 16

Started by Riku Iki | almost 2 years ago | 5 messages | general
#1 Riku Iki
riku.iki.x@gmail.com

Hi,

We have a PostgreSQL server that currently runs PostgreSQL 15 on compressed
btrfs.

I tried to migrate the DB to PostgreSQL 16 and found that the data is not being
compressed on the PostgreSQL 16 server. One possible reason btrfs won't
compress data is preallocation.

When running the "compsize" tool, I do see that PostgreSQL is preallocating
data and that it is not compressed (there is a separate "preallocated" entry
in the output).

I am wondering whether there were preallocation-related changes in PG16, and
whether it is possible to disable preallocation in PostgreSQL 16?

I posted this on StackExchange
<https://dba.stackexchange.com/questions/338906/preallocation-changes-in-postgresql-16>,
and someone pointed to this commit
<https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=00d1e02be24987180115e371abaeb84738257ae2>
as a possible reason for such behavior.

There is also a long discussion
<https://lore.kernel.org/all/6ae85272-3967-417e-bc9a-e2141a4c688a@gmx.com/T/>
on lore.kernel.org about exactly this issue.
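
For readers unfamiliar with the term, here is a minimal sketch of what "preallocation" means at the system-call level. It is an illustration only, not PostgreSQL code: the helper names preallocate() and file_usage() are hypothetical, and the 512-byte unit for st_blocks is an assumption that holds on Linux. posix_fallocate() reserves space for a byte range without writing any data into it, and this kind of reserved space is what the message above describes compsize listing as a separate entry.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes posix_fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `len` bytes at the start of the file
 * without writing data. Returns 0 on success or an error number
 * (posix_fallocate() returns the error directly, it does not set errno).
 */
static int
preallocate(int fd, off_t len)
{
    assert(len > 0);
    return posix_fallocate(fd, 0, len);
}

/*
 * Hypothetical helper: report the apparent file size and the number of
 * bytes actually allocated. Assumes st_blocks counts 512-byte units,
 * as it does on Linux. Returns 0 on success or an errno value.
 */
static int
file_usage(int fd, off_t *apparent, off_t *allocated)
{
    struct stat st;

    assert(apparent != NULL && allocated != NULL);
    if (fstat(fd, &st) != 0)
        return errno;

    *apparent = st.st_size;
    *allocated = (off_t) st.st_blocks * 512;
    return 0;
}
```

Calling preallocate() on an empty file and then file_usage() should show the apparent size growing to the requested length even though no data was written; how the allocated figure behaves depends on the filesystem.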

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Riku Iki (#1)
Re: Preallocation changes in Postgresql 16

On Fri, Apr 26, 2024 at 4:37 AM Riku Iki <riku.iki.x@gmail.com> wrote:

> I am wondering if there were preallocation related changes in PG16, and if it is possible to disable preallocation in PostgreSQL 16?

I have no opinion on the btrfs details, but I was wondering if someone
might show up with a system that doesn't like that change. Here is a
magic 8, tuned on "some filesystems":

    /*
     * If available and useful, use posix_fallocate() (via
     * FileFallocate()) to extend the relation. That's often more
     * efficient than using write(), as it commonly won't cause the kernel
     * to allocate page cache space for the extended pages.
     *
     * However, we don't use FileFallocate() for small extensions, as it
     * defeats delayed allocation on some filesystems. Not clear where
     * that decision should be made though? For now just use a cutoff of
     * 8, anything between 4 and 8 worked OK in some local testing.
     */
    if (numblocks > 8)

I wonder if it wants to be a GUC.
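
To make the trade-off in that comment concrete, here is a minimal sketch of the pattern it describes: large extensions go through posix_fallocate(), small ones are written out as zeros. This is not the PostgreSQL source. extend_file() and the tunable fallocate_min_blocks are hypothetical names, with the variable standing in for the hard-coded 8 to show roughly what a configurable cutoff could look like. A value of 0 is assumed here to mean "never preallocate".

```c
#define _POSIX_C_SOURCE 200809L  /* exposes posix_fallocate() and pwrite() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192          /* PostgreSQL's default block size */

/* Hypothetical tunable replacing the magic 8; 0 disables preallocation. */
static int fallocate_min_blocks = 8;

/*
 * Extend the file by nblocks zero-filled blocks starting at byte
 * offset `start`. Returns 0 on success or an error number.
 */
static int
extend_file(int fd, off_t start, int nblocks)
{
    static const char zeros[BLOCK_SIZE];

    assert(nblocks > 0);

    if (fallocate_min_blocks > 0 && nblocks > fallocate_min_blocks)
    {
        /* Large extension: reserve the space without writing data. */
        return posix_fallocate(fd, start, (off_t) nblocks * BLOCK_SIZE);
    }

    /* Small extension (or preallocation disabled): write real zeros. */
    for (int i = 0; i < nblocks; i++)
    {
        off_t  off = start + (off_t) i * BLOCK_SIZE;
        size_t done = 0;

        while (done < BLOCK_SIZE)
        {
            ssize_t n = pwrite(fd, zeros + done, BLOCK_SIZE - done,
                               off + (off_t) done);

            if (n < 0)
            {
                if (errno == EINTR)
                    continue;    /* interrupted, retry the same write */
                return errno;
            }
            if (n == 0)
                return ENOSPC;   /* no progress; treat as out of space */
            done += (size_t) n;
        }
    }
    return 0;
}
```

With fallocate_min_blocks set to 0, every extension takes the write path, which corresponds to the behavior being asked for in this thread.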

#3 Riku Iki
riku.iki.x@gmail.com
In reply to: Thomas Munro (#2)
Re: Preallocation changes in Postgresql 16

Thank you, I have such a system. I think my task would be to compile PG
from source (I need to learn how to do this) and see how it behaves with and
without that code block.


#4 Riku Iki
riku.iki.x@gmail.com
In reply to: Riku Iki (#3)
Re: Preallocation changes in Postgresql 16

I did the testing and confirmed that this was the issue.

I ran the following query:

create table t as select '1234567890' from generate_series(1, 1000000000);

I commented out the if (numblocks > 8) code block and got the following
results from the "compsize /dbdir/" command.

Before my changes:

Processed 1381 files, 90007 regular extents (90010 refs), 15 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL       97%          41G            42G          42G
none       100%          41G            41G          41G
zstd        14%         157M           1.0G         1.0G
prealloc   100%          16M            16M          16M

After the changes:

Processed 1381 files, 347328 regular extents (347331 refs), 15 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL        3%         1.4G            42G          42G
none       100%          80K            80K          80K
zstd         3%         1.4G            42G          42G

It is clearly visible that files created with fallocate are not compressed,
and disk usage is much larger (41G versus 1.4G).
Is there a way to file a feature request to make this parameter
user-configurable?


#5 Pierre Barre
pierre@barre.sh
In reply to: Riku Iki (#4)
Re: Preallocation changes in Postgresql 16

Hello,

It seems that I am running into this issue as well.
Is it likely that this would ever be a config option?

Best,
Pierre Barre
