PostgreSQL win32 fragmentation issue
Hello,
I ran a simple test... I ran defrag on my drives. Then I initialized
pgbench with a -s 1000.
11% fragmentation. I dropped the bench database, and my fragmentation is 1%.
I know this isn't *our* fault :) but I am curious if there is anything
we can do about the way postgresql writes files to help limit fragmentation.
Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using).
Sincerely,
Joshua D. Drake
On Fri, Dec 01, 2006 at 10:58:44PM -0800, Joshua D. Drake wrote:
I know this isn't *our* fault :) but I am curious if there is anything
we can do about the way postgresql writes files to help limit fragmentation.Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using).
I thought fragmentation was something that disappeared with the FAT
filesystem. Isn't NTFS smart enought o avoid fragmentation in the first
place?
The way you avoid fragmentation is by preallocating larger blocks, but
most filesystems are smart enough to handle that somehwta
automatically.
BTW, do you know what 11% fragmentation means? Does that mean each file
is on average split in 9 pieces, because for a 1GB file, 9 pieces isn't
all that bad.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Fri, Dec 01, 2006 at 10:58:44PM -0800, Joshua D. Drake wrote:
I know this isn't *our* fault :) but I am curious if there is anything
we can do about the way postgresql writes files to help limit fragmentation.Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using).BTW, do you know what 11% fragmentation means? Does that mean each file
is on average split in 9 pieces, because for a 1GB file, 9 pieces isn't
all that bad.
Or more specifically do you have any idea what it means in terms of
performance? And why do you say that you *have to* defrag under
windows, isn't this only a performance issue and not necessarily a huge
one at that? Also, as a solution, I would think that CLUSTER might help.
Matt
On Sat, 2006-12-02 at 14:31 +0100, Martijn van Oosterhout wrote:
On Fri, Dec 01, 2006 at 10:58:44PM -0800, Joshua D. Drake wrote:
I know this isn't *our* fault :) but I am curious if there is anything
we can do about the way postgresql writes files to help limit fragmentation.Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using).I thought fragmentation was something that disappeared with the FAT
filesystem. Isn't NTFS smart enought o avoid fragmentation in the first
place?
Nope... you still have the good old defrag command (well button now).
BTW, do you know what 11% fragmentation means? Does that mean each file
is on average split in 9 pieces, because for a 1GB file, 9 pieces isn't
all that bad.
No, the report had Total Fragmentation, and File Fragmentation. I was
reporting on Total, which I assume is some aggregation.
My concern is that this is over a single bench run. I could imagine that
after a week or two weeks of stead PostgreSQL use, the IO would
gradually get worse and worse.
Joshua D. Drake
Have a nice day,
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote:
My concern is that this is over a single bench run. I could imagine that
after a week or two weeks of stead PostgreSQL use, the IO would
gradually get worse and worse.
Well, we mostly do random reads (seek) all the time anyway, so is this
really a concern?
Are you using NTFS or FAT? You didn't answer that question.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Or more specifically do you have any idea what it means in terms of
performance? And why do you say that you *have to* defrag under
windows, isn't this only a performance issue and not necessarily a huge
one at that? Also, as a solution, I would think that CLUSTER might help.
Yes it is a performance issue. And all performance issues if they can
not be resolved within the database or application, eventually become an
outage.
A CLUSTER may, but that puts us back to an outage. A CLUSTER is an
exclusive lock. Usually for long periods of time.
Joshua D. Drake
Matt
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Sat, 2006-12-02 at 13:34 -0300, Alvaro Herrera wrote:
Joshua D. Drake wrote:
My concern is that this is over a single bench run. I could imagine that
after a week or two weeks of stead PostgreSQL use, the IO would
gradually get worse and worse.Well, we mostly do random reads (seek) all the time anyway, so is this
really a concern?
Wouldn't it be for writes?
Are you using NTFS or FAT? You didn't answer that question.
NTFS of course :) (sorry).
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Alvaro Herrera wrote:
Joshua D. Drake wrote:
My concern is that this is over a single bench run. I could imagine that
after a week or two weeks of stead PostgreSQL use, the IO would
gradually get worse and worse.Well, we mostly do random reads (seek) all the time anyway, so is this
really a concern?Are you using NTFS or FAT? You didn't answer that question.
If it's FAT then you deserve what you get.
If you really need this, a Slony setup would probably work well. Shut down
replica, defrag, start up replica, wait for sync completion, switchover
and repeat. The new Slony release is said to support Windows.
cheers
andrew
Martijn van Oosterhout wrote:
On Fri, Dec 01, 2006 at 10:58:44PM -0800, Joshua D. Drake wrote:
I know this isn't *our* fault :) but I am curious if there is anything
we can do about the way postgresql writes files to help limit
fragmentation.Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using).BTW, do you know what 11% fragmentation means? Does that mean each file
is on average split in 9 pieces, because for a 1GB file, 9 pieces isn't
all that bad.
in our win32/ntfs environment, only 6 pgsql data-files are fragmented. but
they are heavily fragmented. fragmentiation ranges from 1369 fragments for a
14mb file to 4548 fragments for a 628mb one... the database is only 1 week
old.
- thomas
Well, we mostly do random reads (seek) all the time anyway, so is this
really a concern?Are you using NTFS or FAT? You didn't answer that question.
If it's FAT then you deserve what you get.
I am investigating this for the wider community. I do not, use
PostgreSQL Win32 for anything but testing, but guess what -- our silent
majority does. I have a customer with 5000 Win32 installations and they
are not the only ones with those types of deployments.
I am trying to get as much information as possible so that:
1. We can update the documentation to correctly reflect issues with
Win32.
2. We can hopefully become the dominant database on Win32.
If you really need this, a Slony setup would probably work well.
The idea of Slony with Windows is a bit terryfying considering the
target Admin audience and the performance hit one would take.
Shut down
replica, defrag, start up replica, wait for sync completion, switchover
and repeat. The new Slony release is said to support Windows.
O.k., doesn't this seem like killing a squirrel with a 50mm tank gun?
The fact that this was suggested as anything but a last resort isn't to
inspiring.
Sincerely,
Joshua D. Drake
cheers
andrew
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Joshua D. Drake wrote:
O.k., doesn't this seem like killing a squirrel with a 50mm tank gun?
The fact that this was suggested as anything but a last resort isn't to
inspiring.
Before you start ringing alarm bells, you need to instrument the actual
performance effect. So far all I have seen is an assumption that the
effect will be serious. Let's see some performance metrics that
demonstrate the problem you think might exist.
The Windows port has been out for nearly 2 years. The fact that we have
not seen complaints about this leads me to be somewhat skeptical. You
could be right, but I want hard evidence. (And why wouldn't this be a
problem for any DBMS running on Windows? There are just huge numbers of
24/7 Windows servers running SQLServer or Oracle.)
cheers
andrew
On Sat, 2006-12-02 at 11:05 -0600, Andrew Dunstan wrote:
Joshua D. Drake wrote:
O.k., doesn't this seem like killing a squirrel with a 50mm tank gun?
The fact that this was suggested as anything but a last resort isn't to
inspiring.Before you start ringing alarm bells, you need to instrument the actual
performance effect.
Alarm bells? I saw a potential issue, I brought it to the community.
That is all.
So far all I have seen is an assumption that the
effect will be serious. Let's see some performance metrics that
demonstrate the problem you think might exist.
It is not an assumption that defragmentation causes performance issues.
The assumption is that the above fact will cause problems with
PostgreSQL performance.
Yes, the PostgreSQL portion is an assumption which is why I have brought
it to the community for further analysis.
The Windows port has been out for nearly 2 years. The fact that we have
not seen complaints about this leads me to be somewhat skeptical.
Fair enough but I don't hold too much weight to this argument as even
the remote idea that anything but the slimmest <1% of our Wndows users
even read this list (let alone post to it) beyond a possible Google
search is far fetched.
You
could be right, but I want hard evidence.
Sure, which is why I opened the dialog.
(And why wouldn't this be a
problem for any DBMS running on Windows? There are just huge numbers of
24/7 Windows servers running SQLServer or Oracle.)
Well I don't know about SQLServer but Oracle uses a single file (at
least it does on Unix). I don't know if that would make a difference or
not.
Sincerely,
Joshua D. Drake
cheers
andrew
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On 12/2/06, Joshua D. Drake <jd@commandprompt.com> wrote:
Well I don't know about SQLServer but Oracle uses a single file (at
least it does on Unix). I don't know if that would make a difference or
not.
ditto sql server. I agree with Andrew though, let's determine there
to be negative performance impact before assuming the worst.
merlin
Joshua D. Drake wrote:
On Sat, 2006-12-02 at 11:05 -0600, Andrew Dunstan wrote:
Joshua D. Drake wrote:
O.k., doesn't this seem like killing a squirrel with a 50mm tank gun?
The fact that this was suggested as anything but a last resort isn'tto
inspiring.
Before you start ringing alarm bells, you need to instrument the actual
performance effect.Alarm bells? I saw a potential issue, I brought it to the community.
That is all.
You said:
"Essentially, this makes win32 impossible in a 24x7 environment (jokes
aside about Win32 in general) because we *have* to defrag on Windows and
Windows won't defrag open files (thus anything PostgreSQL is using)."
What I am asking for is evidence to back up that assertion. At the moment
it is, as Tom likes to say, "a fact not in evidence".
(I should have thought a single file DB would be MORE liable to
fragmentation, BTW).
cheers
andrew
(I should have thought a single file DB would be MORE liable to
fragmentation, BTW).
AFAIK what mssql does against fragmentation is: it preallocates its
tablespace in chunks of several mb/gb and then starts filling the free
allocated space until it has to reallocate more. the fragmentation for such
a single file (actually in most times 2 files for mssql) is practically
ignorable.
- thomas
On Sat, 2006-12-02 at 09:22 -0800, Joshua D. Drake wrote:
On Sat, 2006-12-02 at 11:05 -0600, Andrew Dunstan wrote:
Joshua D. Drake wrote:
O.k., doesn't this seem like killing a squirrel with a 50mm tank gun?
The fact that this was suggested as anything but a last resort isn't to
inspiring.Before you start ringing alarm bells, you need to instrument the actual
performance effect.Alarm bells? I saw a potential issue, I brought it to the community.
That is all.
That's understood and appreciated. In case no one else says it: thanks.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Starting point: 1% Fragmentation
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 1000
number of transactions actually processed: 50000/50000
tps = 46.001927 (including connections establishing)
tps = 46.045008 (excluding connections establishing)
Total Fragmentation: 2%
File Fragmentation: 5%
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 1000
number of transactions actually processed: 50000/50000
tps = 50.507082 (including connections establishing)
tps = 50.558191 (excluding connections establishing)
Total Fragmentation: 2%
File Fragmentation: 5%
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 45.976461 (including connections establishing)
tps = 46.026986 (excluding connections establishing)
Total Fragmentation: 2%
File Fragmentation: 5%
Initialize with scaling of 100
Total Fragmentation: 2%
File Fragmentation: 5%
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 1000
number of transactions actually processed: 50000/50000
tps = 43.054164 (including connections establishing)
tps = 43.090163 (excluding connections establishing)
Total Fragmentation: 2%
File Fragmentation: 5%
O.k. my guess here is that autovacuum is keeping up with the actual
pgbench runs and thus causing reusable tuples? Does that make sense?
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
I know this isn't *our* fault :) but I am curious if there is
anything we can do about the way postgresql writes files to help
limit fragmentation.Essentially, this makes win32 impossible in a 24x7 environment
(jokes aside about Win32 in general) because we *have* todefrag on
Windows and Windows won't defrag open files (thus
anything PostgreSQL is using).
BTW, do you know what 11% fragmentation means? Does that mean each
file is on average split in 9 pieces, because for a 1GB file, 9
pieces isn't all that bad.in our win32/ntfs environment, only 6 pgsql data-files are
fragmented. but they are heavily fragmented. fragmentiation
ranges from 1369 fragments for a 14mb file to 4548 fragments
for a 628mb one... the database is only 1 week old.
Which relations do these files represent?
//Magnus
in our win32/ntfs environment, only 6 pgsql data-files are
fragmented. but they are heavily fragmented. fragmentiation
ranges from 1369 fragments for a 14mb file to 4548 fragments
for a 628mb one... the database is only 1 week old.
Which relations do these files represent?
all 6 of them tables are tables. makes sense as reindex would recreate the
index file more or less at once and thus without much fragmentation...
- thomas
Andrew Dunstan wrote:
(I should have thought a single file DB would be MORE liable to
fragmentation, BTW).
The likes of Oracle/SQLServer preallocate the files(s) at database
creation time, so there will be little or no initial fragmentation.
However, they are typically allowed to grow to accommodate extra data -
this means that fragmentation over time *can* become a problem for these
guys too - but looks like we can suffer it from day 1.
Cheers
Mark