PostgreSQL over NFS?
I've been using PostgreSQL for a few years now in various projects, and
I've been very happy with its features and performance.
I have a client who wants to know how PostgreSQL (7) will perform over
NFS. The NFS client is Solaris, the server is a big network storage
appliance. The network is probably gigabit ethernet.
My first reaction is to tell them they're just crazy, and that's not how
a database is intended to work, and they should buy some fast, redundant
local storage. I thought I'd ask the designers and users here so I could
back up my recommendations.
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
It is not performance I would be concerned about, but reliability. NFS
has no state for reliability. I have to ask why they are using NFS
rather than putting it on a drive local to the machine. If they say
they want to share the data between two machines, that is even crazier.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
* Shaw Terwilliger <sterwill@sourcegear.com> [010209 15:12] wrote:
NFS == Not F****** Stable, don't do it. :) Any DBA will want to
hurt you when he hears about you running a production DB over NFS.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
It is not performance I would be concerned about, but reliability. NFS
has no state for reliability. I have to ask why they are using NFS
rather than putting it on a drive local to the machine. If they say
they want to share the data between two machines, that is even crazier.
They may want to put the data on a file server, so that it gets backed up,
and has the advantage of a hardware fault-tolerant RAID array. That has its
merits, but I don't think it would be that great for performance. We spent
the money to put another RAID array in the database machine, both for
reliability and speed. Once per night, the DB is dumped, and the dump is
copied to the file server, so that we have a second copy somewhere, and it's
included in the nightly tape backup.
steve
Shaw Terwilliger <sterwill@sourcegear.com> writes:
I have a client who wants to know how PostgreSQL (7) will perform over
NFS. The NFS client is Solaris, the server is a big network storage
appliance. The network is probably gigabit ethernet.
My first reaction is to tell them they're just crazy,
I agree.
Aside from the speed issues, there's a serious reliability problem;
AFAIR, NFS doesn't guarantee very much about write ordering, which
means a crash could leave you with corrupt data.
Seems a lot better to run the Postgres server on the machine where
the storage is.
regards, tom lane
Bruce Momjian wrote:
It is not performance I would be concerned about, but reliability. NFS
has no state for reliability. I have to ask why they are using NFS
rather than putting it on a drive local to the machine. If they say
they want to share the data between two machines, that is even crazier.
I did search through the archives, and saw the reliability thing mentioned
(no state means locks aren't gone when the database server dies).
Nothing has been deployed yet, and I'm pretty sure I can convince them to
go for local disks (this is a large deployment, and they've got the
budget for the system).
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
* Tom Lane <tgl@sss.pgh.pa.us> [010209 15:40] wrote:
Shaw Terwilliger <sterwill@sourcegear.com> writes:
I have a client who wants to know how PostgreSQL (7) will perform over
NFS. The NFS client is Solaris, the server is a big network storage
appliance. The network is probably gigabit ethernet.
My first reaction is to tell them they're just crazy,
I agree.
Aside from the speed issues, there's a serious reliability problem;
AFAIR, NFS doesn't guarantee very much about write ordering, which
means a crash could leave you with corrupt data.
Actually NFS has very strong write ordering semantics, it just has
terrible cache coherency. Meaning two machines accessing the same
file will most likely see different things if the file is updated
moderately.
For NFSv2 _all_ writes must be done synchronously. For NFSv3 there's
a separate commit RPC that needs to succeed before the local machine
returns from fsync.
There's also the issue that NFS requests are done async, meaning
that you won't get an error on write(2)/read(2) until close(2). I'm
not even sure if fsync(2) can or will return an error on an NFS
error; this can be disastrous for a database. You can get around
that by not running any async NFS kernel threads on the client
(nfsiod under *BSD), but your performance will go to pot if you
disable the async NFS subsystem.
Even then there's probably more black magic that can cause weird
things to happen.
Seems a lot better to run the Postgres server on the machine where
the storage is.
Agreed. :)
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
Alfred Perlstein wrote:
NFS == Not F****** Stable, don't do it. :) Any DBA will want to
hurt you when he hears about you running a production DB over NFS.
Well, I thought it wasn't a very safe thing to do. Their network
administrator (the guy who buys machines, schedules backups, etc.)
suggested looking into NFS storage. I think he just wanted to keep his
backup/restore plans simple (do it all on the network appliance) and
doesn't realize how a database server reads and writes to disks,
and how NFS gets in the way.
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
Alfred Perlstein wrote:
Actually NFS has very strong write ordering semantics, it just has
terrible cache coherency. Meaning two machines accessing the same
file will most likely see different things if the file is updated
moderately.
I'm not an NFS guru by any means, but I've noticed episodes of cache
incoherency from NFS-mounted home directories on two workstations.
Running two postgresql database servers on two hosts accessing the
same data space through NFS sounds like asking for corruption, and
was, luckily, never suggested.
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
On Fri, 9 Feb 2001, Shaw Terwilliger wrote:
we use mysql over nfs here. it works fine. the nfs server is a netapp.
it's mounted over a gigabit ethernet crossover cable to the linux machine
that runs mysqld. it's unlikely that gigabit ethernet would be much of a
bottleneck on any modern disks, plus netapp's wafl filesystem is basically
faster than anything out there right now, so the only overhead is the
tcp/ip (udp) layer, which probably isn't that expensive. running it over
a 100megabit line would probably slow things down some, but it all depends
on how hard you're going to be pushing it.
so the issue left is reliability. i've looked over some of the replies
already that talk about unreliability in general, async issues, and
locking issues. here's what you do to avoid all of that: use an nfs
implementation that works. we use linux 2.2.18 with mount and nfs-utils
packages from http://nfs.sourceforge.net/. we use nfsv3, not v2, and we
mount the filesystem with the sync option. problem solved.
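For concreteness, the mount described above might look something like this; the server name and paths are placeholders, not details from this thread:

```shell
# Hypothetical /etc/fstab entry for the setup sketched above: NFSv3,
# synchronous writes (so a write doesn't return until the server has
# accepted the data), and "hard" so the client retries forever on a
# server hiccup instead of handing the database an I/O error.
#
#   filer:/vol/pgdata  /var/lib/pgsql  nfs  vers=3,hard,sync  0  0
#
# Or as an explicit mount command:
mount -t nfs -o vers=3,hard,sync filer:/vol/pgdata /var/lib/pgsql
```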
i haven't tried this with postgresql, but we actually are planning on
switching to postgresql down the road, so hopefully it'll be ok. heh.
-tcl.
On Fri, 9 Feb 2001, tc lewis wrote:
oh, and of course, we only mount the data files on _one_ client. we do
_not_ run 2 mysqlds on the same data. that would probably be very bad
with mysql or postgresql.
-tcl.
Steve Wolfe wrote:
They may want to put the data on a file server, so that it gets backed up,
and has the advantage of a hardware fault-tolerant RAID array. That has its
merits, but I don't think it would be that great for performance. We spent
the money to put another RAID array in the database machine, both for
reliability and speed. Once per night, the DB is dumped, and the dump is
copied to the file server, so that we have a second copy somewhere, and it's
included in the nightly tape backup.
The feeling I got from speaking with their administrator was that they
wanted what you mentioned: easy backups on the appliance with the automatic
tape robot. In my application, if the PostgreSQL host goes down, all
customers have no service. Assuming RAID on the database server host,
and plenty of RAM and CPU cycles, the CPUs could still explode, or a
meteorite could find its way into the power supply (supplies?).
It will probably be necessary to have a backup host on hand (assume
identical hardware). Database connections to other system components
do not need to be maintained (they are pooled by a local process running
on the database server which speaks stateless things to other components).
If this host could run PostgreSQL out of the production environment,
but obtain periodic updates (once or twice daily would probably be fine),
it could be configured to assume the first host's IP address in case of
failure, and the system will chug along nicely.
Besides dumping and COPY'ing the data into the second server, is there
another form of easy one-way replication available? My schema has just
a few simple tables (six tables, a few rows each). But these tables
may hold a few million records each. I've estimated the total data size
to be somewhere around 10 GB (with WAGs for index sizes) for 5 million
users. I guess COPY would work.
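A crude one-way refresh along those lines could be as simple as piping a dump from one server into the other; the host and database names here are hypothetical:

```shell
# Hypothetical sketch: rebuild the standby's tables from the primary's
# current contents. A plain-format pg_dump emits the CREATE TABLE and
# COPY statements, so piping it into psql on the standby replays both.
# Assumes the standby's old copies of the tables are dropped first.
pg_dump -h primary-host mydb | psql -h standby-host mydb
```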
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
* Shaw Terwilliger <sterwill@sourcegear.com> [010209 16:18] wrote:
Alfred Perlstein wrote:
Actually NFS has very strong write ordering semantics, it just has
terrible cache coherency. Meaning two machines accessing the same
file will most likely see different things if the file is updated
moderately.
I'm not an NFS guru by any means, but I've noticed episodes of cache
incoherency from NFS-mounted home directories on two workstations.
Running two postgresql database servers on two hosts accessing the
same data space through NFS sounds like asking for corruption, and
was, luckily, never suggested.
My statement was also incorrect: NFS has loose write ordering
semantics, but it does have strong reliability, meaning that
fsync(2)/close(2) must sync all data to the backing NFS server or return
an error.
The part about cache incoherency is very true, the _only_ cache coherency
NFS offers is _only_ in the v3 spec, and is specifically called 'wcc'
'weak cache coherency' (afaik). So yes, you can expect different
NFS clients to get different inconsistent views on heavily modified
files unless they use advisory locking (which should fsync out locked
ranges before release).
Anyhow, if the idea is just to get a nice backup system, you could
do a pg_dump and write the output to an NFS mounted FS; there's probably
less that can go wrong with a large sequential write than heavy shared
read/write/seek.
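As a sketch of that idea (the mount point and database name are made up):

```shell
# Hypothetical: one big sequential write onto the NFS mount, instead of
# the database's random read/write/seek traffic going over NFS. The
# dump is a consistent snapshot even while the database is in use.
pg_dump mydb > /mnt/filer/backups/mydb-$(date +%Y%m%d).sql
```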
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
Shaw Terwilliger wrote:
Besides dumping and COPY'ing the data into the second server, is there
another form of easy one-way replication available? My schema has just
a few simple tables (six tables, a few rows each). But these tables
may hold a few million records each. I've estimated the total data size
to be somewhere around 10 GB (with WAGs for index sizes) for 5 million
users. I guess COPY would work.
Oops... up there, I meant to say my tables had a few columns each, not
rows. A few tables may hold a few million records. One big users table
holds an int id and a few TEXT fields, the other tables use keys into
the users table's id field and may store some strings.
--
Shaw Terwilliger <sterwill@sourcegear.com>
SourceGear Corporation
217.356.0105 x 641
I've heard of other people running databases on a Netapp Filer
I think this is generally ok because the netapp has a special
write cache in NVram and that's what makes it so fast.
On Fri, 9 Feb 2001, adb wrote:
I've heard of other people running databases on a Netapp Filer
I think this is generally ok because the netapp has a special
write cache in NVram and that's what makes it so fast.
yeah, basically a write returns almost instantaneously (or at least much
faster than a normal machine writing to disk) with a
netapp filer. they're very fast. and also very expensive.
-tcl.
The part about cache incoherency is very true, the _only_ cache coherency
NFS offers is _only_ in the v3 spec, and is specifically called 'wcc'
'weak cache coherency' (afaik). So yes, you can expect different
NFS clients to get different inconsistent views on heavily modified
files unless they use advisory locking (which should fsync out locked
ranges before release).
Anyhow, if the idea is just to get a nice backup system, you could
do a pg_dump and write the output to an NFS mounted FS; there's probably
less that can go wrong with a large sequential write than heavy shared
read/write/seek.
Or just rsh over to the database system and pipe the output back to the
'backup' machine.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, Feb 09, 2001 at 04:23:10PM -0800, Alfred Perlstein wrote:
Anyhow, if the idea is just to get a nice backup system, you could
do a pg_dump and write the output to a NFS mounted FS, there's probably
less that can go wrong with a large sequencial write than heavy shared
read/write/seek.
Still new to this but....
Can pg_dump be piped through gzip et al? Using a few CPU cycles to save
network bandwidth may be beneficial.
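It can (pg_dump writes plain-format output to stdout). The full pipeline would be roughly as in the comment below; `dbhost` and `mydb` are placeholders, and the uncommented line just sanity-checks that the compression stage round-trips losslessly:

```shell
# Hedged sketch of the pipeline (dbhost/mydb are hypothetical names):
#   ssh dbhost "pg_dump mydb | gzip" > mydb.sql.gz
# The compression stage itself is lossless and can be checked without
# a database:
echo "SELECT 1;" | gzip | gunzip -c
```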
mrc
--
Mike Castle Life is like a clock: You can work constantly
dalgoda@ix.netcom.com and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day. -- mrc
We are all of us living in the shadow of Manhattan. -- Watchmen
* adb <adb@Beast.COM> [010209 16:41] wrote:
I've heard of other people running databases on a Netapp Filer
I think this is generally ok because the netapp has a special
write cache in NVram and that's what makes it so fast.
That can't help with bugs in the NFS server or client code that can
cause weird stuff to happen (crashes, corruption, outages).
-Alfred
On Fri, Feb 09, 2001 at 07:42:12PM -0500, Bruce Momjian wrote:
Or just rsh over to the database system and pipe the output back to the
'backup' machine.
Can one even log into a netapp to do this kind of thing? I always thought
they were pretty much a black box file server, but I've never played with
one.
mrc
--
Mike Castle Life is like a clock: You can work constantly
dalgoda@ix.netcom.com and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day. -- mrc
We are all of us living in the shadow of Manhattan. -- Watchmen