full_page_writes on SSD?

Started by marcin mank, over 10 years ago, 9 messages, general

marcin.mank@gmail.com

over 10 years ago

I saw this:
http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the Postgres
page size equal to the SSD page size, do we still need full_page_writes?

Regards
Marcin Mańk

Kevin.Grittner@wicourts.gov

over 10 years ago

In reply to: marcin mank (#1)

Re: full_page_writes on SSD?

On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:

if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
size equal to the SSD page size, do we still need full_page_writes?

If an OS write of the PostgreSQL page size has no chance of being
partially persisted (a/k/a torn), I don't think full page writes
are needed. That seems likely to be true if pg page size matches
SSD sector size.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

andres@anarazel.de

over 10 years ago

In reply to: Kevin Grittner (#2)

Re: full_page_writes on SSD?

On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote:

On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:

if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
size equal to the SSD page size, do we still need full_page_writes?

If an OS write of the PostgreSQL page size has no chance of being
partially persisted (a/k/a torn), I don't think full page writes
are needed. That seems likely to be true if pg page size matches
SSD sector size.

At the very least it also needs to match the page size used by the OS
(4KB on x86).
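
As a rough illustration of the size condition discussed so far, here is a minimal sketch. The function names and the example sizes are assumptions made for illustration, not anything PostgreSQL provides. It only checks whether some layer has to split one PostgreSQL block into several units; it says nothing about whether a given device really writes a sector atomically.

```python
import math


def pieces_per_layer(pg_block_size, os_page_size, device_sector_size):
    """Return how many units each layer needs to hold one PostgreSQL block."""
    return {
        "os_pages": math.ceil(pg_block_size / os_page_size),
        "device_sectors": math.ceil(pg_block_size / device_sector_size),
    }


def may_tear(pg_block_size, os_page_size, device_sector_size):
    """True if any layer must split the block into more than one unit."""
    pieces = pieces_per_layer(pg_block_size, os_page_size, device_sector_size)
    return any(count > 1 for count in pieces.values())


# Default 8kB PostgreSQL block on a 4kB OS page: the block spans two OS pages.
print(may_tear(8192, 4096, 4096))   # True
# All three sizes equal: the block is handed down as a single unit.
print(may_tear(4096, 4096, 4096))   # False
```

If this returns False, the sizes at least do not rule out running without full_page_writes; it is a necessary condition under the assumptions above, not a guarantee.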

But be generally wary of turning off fpw's if you use replication. Not
having them often turns an asynchronously batched write workload into one
containing a lot of synchronous, single-threaded reads. Even with SSDs
that can very quickly lead to not being able to keep up with replay
anymore.


pierce@hogranch.com

over 10 years ago

In reply to: marcin mank (#1)

Re: full_page_writes on SSD?

On 11/24/2015 10:48 AM, Marcin Mańk wrote:

I saw this:
http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the
Postgres page size equal to the SSD page size, do we still need
full_page_writes?

An SSD's actual write block is much, much larger than that. They
emulate 512-byte or 4k sectors, but these are not actually written in sector
order. Rather, new writes are accumulated in a buffer on the drive, then
written out as a whole block, and a sector mapping table is maintained
by the drive.

--
john r pierce, recycling bits in santa cruz


tomas.vondra@2ndquadrant.com

over 10 years ago

In reply to: Andres Freund (#3)

Re: full_page_writes on SSD?

On 11/24/2015 08:14 PM, Andres Freund wrote:

On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote:

On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:

if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
size equal to the SSD page size, do we still need
full_page_writes?

If an OS write of the PostgreSQL page size has no chance of being
partially persisted (a/k/a torn), I don't think full page writes
are needed. That seems likely to be true if pg page size matches
SSD sector size.

At the very least it also needs to match the page size used by the
OS (4KB on x86).

Right. I find this possibility (when the OS and SSD page sizes match)
interesting, exactly because it might make the storage resilient to torn
pages.

But be generally wary of turning off fpw's if you use replication.
Not having them often turns an asynchronously batched write workload
into one containing a lot of synchronous, single-threaded reads.
Even with SSDs that can very quickly lead to not being able to keep
up with replay anymore.

I don't immediately see why that would happen? Can you elaborate?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


tomas.vondra@2ndquadrant.com

over 10 years ago

In reply to: John R Pierce (#4)

Re: full_page_writes on SSD?

On 11/24/2015 08:40 PM, John R Pierce wrote:

On 11/24/2015 10:48 AM, Marcin Mańk wrote:

I saw this:
http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the
Postgres page size equal to the SSD page size, do we still need
full_page_writes?

An SSD's actual write block is much, much larger than that. They
emulate 512-byte or 4k sectors, but these are not actually written in
sector order. Rather, new writes are accumulated in a buffer on the
drive, then written out as a whole block, and a sector mapping table
is maintained by the drive.

I don't see how that's related to full_page_writes?

It's true that SSDs optimize the writes in various ways, generally along
the lines you described, because they do work with "erase blocks"
(generally 256kB - 1MB chunks) and such.

But the internal structure of SSD has very little to do with FPW because
what matters is whether the on-drive write cache is volatile or not (SSD
can't really work without it).

What matters (when it comes to resiliency to torn pages) is the page
size at the OS level, because that's what's being handed over to the SSD.

Of course, there might be other benefits of further lowering page sizes
at the OS/database level (and AFAIK there are SSD drives that use pages
smaller than 4kB).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


NTPT@seznam.cz

over 10 years ago

In reply to: marcin mank (#1)

Re: full_page_writes on SSD?

Hi,

I looked a bit into how SSDs work and how they need to be aligned.

My conclusion is that in an ideal world we would have a general --ebs=xxx
switch in the various Linux tools to ensure alignment. Otherwise the
calculation has to be done by hand.

There are SSDs on the market with page sizes of 4 or 8 kB. But SSDs also
have another typical property: the EBS, or Erase Block Size. When the disk
writes to a single sector, the whole erase block must be read by the drive
electronics, modified, and written back to the drive.

Devices on the market come with various erase block sizes: 128, 256, 512,
1024, 1536, 2048 KiB, etc. In my case, a Samsung 850 EVO, there are 8 kB
pages and a 1536 KiB erase block.

So the first alignment problem: a partition should start on an erase block
boundary. An --ebs switch in the partitioning tools would be practical for
proper alignment. Otherwise, calculate by hand; in my case 1536 KiB = 3072
sectors of 512 bytes.
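
The hand calculation above can be sketched as follows. The helper names are hypothetical, and the 1536 KiB erase block is simply the figure quoted in this message, not a value verified against the drive's documentation.

```python
SECTOR_BYTES = 512  # logical sector size assumed in the message above


def sectors_per_erase_block(ebs_kib, sector_bytes=SECTOR_BYTES):
    """Number of logical sectors that make up one erase block."""
    return ebs_kib * 1024 // sector_bytes


def next_aligned_sector(sector, ebs_kib, sector_bytes=SECTOR_BYTES):
    """Round a start sector up to the next erase block boundary."""
    per_block = sectors_per_erase_block(ebs_kib, sector_bytes)
    return -(-sector // per_block) * per_block  # ceiling division


print(sectors_per_erase_block(1536))      # 3072
# A partition that would start at sector 2048 moves up to sector 3072.
print(next_aligned_sector(2048, 1536))    # 3072
```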

Things get complicated if you use mdadm RAID. Because the RAID superblock
is located at the beginning of the RAID device and does not fill a whole
erase block, it is practical to set --offset when creating the RAID so that
the real filesystem starts at the next erase block from the beginning of
the RAID device, and the underlying filesystem is aligned as well. So
--ebs=xxx in mdadm would be practical too.

And now ext4, with a block size of 4096. Because the SSD page size is 8 kB,
it is practical to set the stride (the smallest unit on which the
filesystem operates on one disk) to 2, to fill one SSD page. The stripe
size can be set to ebs/pagesize or to the whole EBS. It might also be
useful to use an ext4 --offset to the EBS as well.
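
A small sketch of the arithmetic behind those ext4 numbers, using the sizes quoted in this message (4096-byte filesystem blocks, 8 kB SSD pages, 1536 KiB erase block). The function and key names are made up for illustration, and whether these values actually help on a given drive is not something this calculation can show.

```python
def ext4_alignment(fs_block_bytes, ssd_page_bytes, ebs_kib):
    """Derive the candidate stride/stripe values from the three sizes."""
    ebs_bytes = ebs_kib * 1024
    return {
        # filesystem blocks needed to fill one SSD page
        "stride_blocks": ssd_page_bytes // fs_block_bytes,
        # "ebs/pagesize": SSD pages per erase block
        "ssd_pages_per_erase_block": ebs_bytes // ssd_page_bytes,
        # "whole ebs": filesystem blocks per erase block
        "fs_blocks_per_erase_block": ebs_bytes // fs_block_bytes,
    }


print(ext4_alignment(4096, 8192, 1536))
# {'stride_blocks': 2, 'ssd_pages_per_erase_block': 192, 'fs_blocks_per_erase_block': 384}
```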

This should align the partition, RAID, and filesystem. Correct me if I am
wrong.

Now it is the database storage engine's turn. I think that trying to write
on erase block boundaries, in erase-block-sized amounts of data, may have
some benefit, not in speed but in lower wear of the SSD as a whole.

---------- Original message ----------
From: Marcin Mańk <marcin.mank@gmail.com>
To: PostgreSQL <pgsql-general@postgresql.org>
Date: 24. 11. 2015 20:07:30
Subject: [GENERAL] full_page_writes on SSD?

I saw this: http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the Postgres
page size equal to the SSD page size, do we still need full_page_writes?

Regards

Marcin Mańk

FarjadFarid(ChkNet)

farjad.farid@checknetworks.com

over 10 years ago

In reply to: NTPT (#7)

Re: full_page_writes on SSD?

I use SSDs constantly for both my OS and my databases and have none of these problems.

However, I don't use an SSD for the OS's virtual memory.

From what I have read of this thread, there could also be a situation where the SSD has been overused and is hitting the limit of its auto-recovery.

It is well known that using SSDs for the OS's virtual memory causes them to wear out much more quickly.

To really test all of this, one needs to use a brand new SSD. Also ensure you are not using the OS's virtual memory on the same SSD as the DB and its log file.

You might also want to double-check the language of the OS and of the installed PostgreSQL, as these determine the final size of memory used to read and write.


Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Tomas Vondra (#5)

Re: full_page_writes on SSD?

On 11/25/15 5:38 AM, Tomas Vondra wrote:

But be generally wary of turning off fpw's if you use replication.
Not having them often turns an asynchronously batched write workload
into one containing a lot of synchronous, single-threaded reads.
Even with SSDs that can very quickly lead to not being able to keep
up with replay anymore.

I don't immediately see why that would happen? Can you elaborate?

If there are no FPI records in WAL, then recovery/replay has to read the
blocks from disk before it can apply the real WAL record.

Way back in the day, recovery would always do this... someone had the
bright idea around 8.0 to make use of the FPIs if they're present. IIRC
that resulted in order of magnitude improvements of recovery time in
many cases.

For SR, the effect might not be as large, if the slave is actively being
used, and if the queries hitting the slave tend to be grabbing the same
data that's being written on the master. In many environments I expect
that to be the case. But if it's not it wouldn't surprise me if it
became very easy to lag a slave as replay constantly waited for blocks
to come in.

If running with full_page_writes turned off becomes remotely common it'd
probably be worth finding a way to pre-issue read requests to the OS,
similar to what we do in some cases if effective_io_concurrency > 1.
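
The replay behaviour described above can be illustrated with a toy model. This is not PostgreSQL's recovery code; the record format (block id plus a flag saying whether the record carries a full-page image) and the function name are assumptions chosen to keep the sketch small.

```python
def replay(wal_records, cached_blocks=()):
    """Count the synchronous block reads needed to replay WAL records.

    Each record is a (block_id, has_fpi) pair.
    """
    cache = set(cached_blocks)
    reads = 0
    for block_id, has_fpi in wal_records:
        if block_id not in cache and not has_fpi:
            # No full-page image and the block is not in memory:
            # replay has to wait for a read before applying the change.
            reads += 1
        cache.add(block_id)
    return reads


# With full_page_writes on, the first change to each block carries an FPI.
with_fpw = [(1, True), (2, True), (1, False), (3, True)]
# The same changes with full_page_writes off: no record carries an FPI.
without_fpw = [(block_id, False) for block_id, _ in with_fpw]

print(replay(with_fpw))      # 0
print(replay(without_fpw))   # 3
```

In this model the reads disappear again if the blocks are already cached, for example replay(without_fpw, cached_blocks={1, 2, 3}) returns 0, which matches the point about queries on the standby touching the same data.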
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
