SSDs with Postgresql?
The speed benefits of SSDs as benchmarked would seem incredible. Can anybody
comment on SSD benefits and problems in real life use?
I maintain some 100 databases on 3 servers, with 32 GB of RAM each and an
extremely rich, complex schema. (300+ normalized tables)
I was wondering if anybody here could comment on the benefits of SSD in
similar, high-demand rich schema situations?
-Ben
--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
On 04/13/11 9:19 PM, Benjamin Smith wrote:
The speed benefits of SSDs as benchmarked would seem incredible. Can
anybody comment on SSD benefits and problems in real life use?
I maintain some 100 databases on 3 servers, with 32 GB of RAM each and
an extremely rich, complex schema. (300+ normalized tables)
I was wondering if anybody here could comment on the benefits of SSD
in similar, high-demand rich schema situations?
Consumer grade MLC SSDs will crash and burn in short order under a
heavy transactional workload characterized by sustained small-block
random writes.
The enterprise grade SLC SSDs will perform very nicely, but they are
very, very expensive, and are found in high end enterprise database servers
like Oracle's Exadata machines.
On Thu, April 14, 2011 06:19, Benjamin Smith wrote:
The speed benefits of SSDs as benchmarked would seem incredible. Can anybody
comment on SSD benefits and problems in real life use?
I maintain some 100 databases on 3 servers, with 32 GB of RAM each and an
extremely rich, complex schema. (300+ normalized tables)
I was wondering if anybody here could comment on the benefits of SSD in
similar, high-demand rich schema situations?
Even if you only use SSDs for your indexes, the gains are staggering. We use
them on several servers, one of which is extremely busy (xid wraparound stuff)
and the performance gains are game-changing.
There is no going back. Hint: don't use cheap SSDs - cough up and use Intel.
Cheers
Henry
On 14/04/2011 4:35 PM, Henry C. wrote:
There is no going back. Hint: don't use cheap SSDs - cough up and use Intel.
The server-grade SLC stuff with a supercap, I hope, not the scary
consumer-oriented MLC "pray you weren't writing anything during
power-loss" devices?
--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
On 04/14/11 1:35 AM, Henry C. wrote:
Hint: don't use cheap SSDs - cough up and use Intel.
Aren't most of the Intel SSDs still MLC, and don't they still have
performance and reliability issues with sustained small-block random
writes such as are generated by database servers? The enterprise grade
SLC SSD drives are things like the STEC ZeusIOPS and Seagate Pulsar, and
the majority of them end up in EMC and other big-iron SAN systems.
have a look at
http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td4268261.html
It looks like those are "safe" to use with a db, and aren't that expensive.
On 14/04/2011 10:54, John R Pierce wrote:
On 04/14/11 1:35 AM, Henry C. wrote:
Hint: don't use cheap SSDs - cough up and use Intel.
Aren't most of the Intel SSDs still MLC, and don't they still have
performance and reliability issues with sustained small-block random
writes such as are generated by database servers? The enterprise grade
SLC SSD drives are things like the STEC ZeusIOPS and Seagate Pulsar, and
the majority of them end up in EMC and other big-iron SAN systems.
I think Henry is referring to Intel's X25-E line. They are SLC,
enterprise grade.
Quite expensive though, ~700 euros for the 64GB version.
We have one of them in a production server (light load), it works very
well so far.
Performance gain versus WD Raptor RAID array is huge. I never tried to
quantify it.
Arnaud
On Thu, April 14, 2011 10:51, Craig Ringer wrote:
On 14/04/2011 4:35 PM, Henry C. wrote:
There is no going back. Hint: don't use cheap SSDs - cough up and use
Intel.
The server-grade SLC stuff with a supercap, I hope, not the scary
consumer-oriented MLC "pray you weren't writing anything during power-loss"
devices?
That's what a UPS and genset are for. Who writes critical stuff to *any*
drive without power backup?
You have a valid point about using SLC if that's what you need though.
However, MLC works just fine provided you stick them into RAID1. In fact, we
use a bunch of them in RAID0 on top of RAID1.
In our environment (clusters) it's all about using cheap consumer-grade
commodity hardware with lots of redundancy to cater for the inevitable
failures. The trade-off is huge: performance with low cost.
We've been using MLC intel drives since they came out and have never had a
failure. Other SSDs we've tried have failed, and so have hard drives. The
point though, is that there are tremendous performance gains to be had with
commodity h/w if you factor in failure rates and make *sure* you have
redundancy.
h
On Thu, April 14, 2011 11:30, Leonardo Francalanci wrote:
have a look at
http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td4268261.html
It looks like those are "safe" to use with a db, and aren't that expensive.
The new SSDs look great. From our experience, we trust SSDs (even MLC) far
more than mechanical hard drives.
I believe this perception that SSDs are less "safe" than failure-prone
mechanical hard drives will eventually change.
In the meantime, we've embraced them and the advantages are compelling.
h
On 14/04/2011 11:40, Henry C. wrote:
You have a valid point about using SLC if that's what you need though.
However, MLC works just fine provided you stick them into RAID1. In fact, we
use a bunch of them in RAID0 on top of RAID1.
AFAIK, you won't have TRIM support on RAID-arrayed SSDs.
That might change soon, but I think that RAID boards supporting TRIM are
still a work in progress.
Arnaud
On Thu, 14 Apr 2011 11:46:12 +0200, Henry C. wrote:
On Thu, April 14, 2011 11:30, Leonardo Francalanci wrote:
have a look at
http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td4268261.html
It looks like those are "safe" to use with a db, and aren't that expensive.
The new SSDs look great. From our experience, we trust SSDs (even MLC) far
more than mechanical hard drives.
I believe this perception that SSDs are less "safe" than failure-prone
mechanical hard drives will eventually change.
In the meantime, we've embraced them and the advantages are compelling.
h
One thing you should care about is so-called write endurance - the number
of writes one memory region can take before it is destroyed. If your SSD
controller does not do wear leveling (transparent allocation), you may
destroy the drive really fast, because every write of a given "block"
lands in the same memory segment; clog/xlog cells could fail after
10k-100k writes. But if your SSD does wear leveling, the internal
controller counts the writes to each memory cell and, as a cell nears the
end of its lifetime, "associates" the block with a different cell. With
wear leveling you generally need not worry about journaling file systems,
or about storing logs or other frequently updated data there. You can
estimate the lifetime of your SSD with:
WritesToDestroyCells = write_endurance * disk_size
AvgLifeTime = WritesToDestroyCells / writes_per_sec
Those are high numbers: even for a simple disk rated at 10,000 cycles with
60 GB capacity, you would need to write 600 TB of data to the SSD (not
strictly true, since you can't write a single byte, only full blocks). Of
course, to extend the lifetime of an SSD you should use a file system
cache, or an SSD with its own cache, and turn off FS journaling.
Regards,
Radek
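Radek's lifetime estimate above can be sketched as a quick calculation. The
numbers here (10,000 program/erase cycles, a 60 GB drive, 5 MB/s of
sustained writes) are illustrative assumptions, not measurements of any
particular drive:

```python
# Rough SSD lifetime estimate from the formula above:
#   WritesToDestroyCells = write_endurance * disk_size
#   AvgLifeTime          = WritesToDestroyCells / writes_per_sec

def ssd_lifetime_seconds(write_endurance_cycles, disk_size_bytes,
                         writes_bytes_per_sec):
    """Seconds until the whole drive's rated endurance is consumed,
    assuming perfect wear leveling."""
    writes_to_destroy = write_endurance_cycles * disk_size_bytes
    return writes_to_destroy / writes_bytes_per_sec

GB = 1024 ** 3
TB = 1024 ** 4

total = 10_000 * 60 * GB               # total bytes before wear-out
print(total / TB)                      # -> 585.9375  (the "~600 TB" above)

secs = ssd_lifetime_seconds(10_000, 60 * GB, 5 * 1024 * 1024)
years = secs / (365 * 24 * 3600)
print(round(years, 1))                 # -> 3.9 years at a constant 5 MB/s
```

As the thread notes, real lifetimes depend on write amplification and how
full the drive is, so treat this as an upper bound, not a guarantee.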
I believe this perception that SSDs are less "safe" than failure-prone
mechanical hard drives will eventually change.
By "safe" I mean they won't corrupt data if the machine crashes.
On 14/04/2011 5:40 PM, Henry C. wrote:
The server-grade SLC stuff with a supercap, I hope, not the scary
consumer-oriented MLC "pray you weren't writing anything during power-loss"
devices?
That's what a UPS and genset are for. Who writes critical stuff to *any*
drive without power backup?
Even a server with redundant PSUs on a UPS backed by generators can go
down hard and unexpectedly. I'd be extremely nervous unless I could
afford to lose data since the last backup, or unless I had a really
trustworthy replication setup going.
Of course, it's wise to have one or both of those conditions be true
anyway, because no redundant storage system will save you from file
system corruption caused by an OS bug, data corruption caused by a Pg
bug, or a "DELETE FROM critical_table;" by a careless superuser. So I
guess it doesn't cost you more than the risk of some downtime to use
potentially corruption-prone non-supercap MLC, and it's probably worth
it for the performance in your clustered environment.
All I meant with my post was to raise the concern that the OP needs to
be aware of the untrustworthy nature of even the low-end Intel SSDs.
They're usable, you just have to compensate for their deficiencies.
You have a valid point about using SLC if that's what you need though.
However, MLC works just fine provided you stick them into RAID1. In fact, we
use a bunch of them in RAID0 on top of RAID1.
RAID won't help you if they all drop their caches if the power supply
throws a wobbly. That said, it's certainly good for the lifetime issues
especially if the units are upgraded or rotated out in phases.
--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
On Thu, Apr 14, 2011 at 12:19 AM, Benjamin Smith
<lists@benjamindsmith.com> wrote:
I was wondering if anybody here could comment on the benefits of SSD in
similar, high-demand rich schema situations?
For the last several months, I've been using Texas Memory Systems RamSan-620
drives on my main DB servers. Having near-zero seek times has been a
tremendous boon to our performance, and the drives will have pretty much
paid for themselves within the next couple of months. I.e., the "throw
hardware at it" solution worked really well :)
At 06:07 PM 4/14/2011, Radosław Smogura wrote:
One thing you should care about is so-called write endurance - the number
of writes one memory region can take before it is destroyed. If your SSD
controller does not do wear leveling (transparent allocation), you may
destroy the drive really fast, because every write of a given "block"
lands in the same memory segment; clog/xlog cells could fail after
10k-100k writes. But if your SSD does wear leveling, the internal
controller counts the writes to each memory cell and, as a cell nears the
end of its lifetime, "associates" the block with a different cell. With
wear leveling you generally need not worry about journaling file systems,
or about storing logs or other frequently updated data there. You can
estimate the lifetime of your SSD with:
WritesToDestroyCells = write_endurance * disk_size
AvgLifeTime = WritesToDestroyCells / writes_per_sec
Those are high numbers: even for a simple disk rated at 10,000 cycles with
60 GB capacity, you would need to write 600 TB of data to the SSD (not
strictly true, since you can't write a single byte, only full blocks). Of
course, to extend the lifetime of an SSD you should use a file system
cache, or an SSD with its own cache, and turn off FS journaling.
I'm not an expert on SSDs, but I believe modern SSDs are supposed to
automatically spread writes across the entire disk where possible - even
to the extent of moving already-written data.
So if the drives are full or near full, the tradeoff is between lower
performance (because the controller has to keep moving data around) and
shorter lifespan (one area gets overused).
If the drives are mostly empty the SSD's controller has an easier job - it
doesn't have to move as much data around.
Regards,
Link.
After a glowing review at AnandTech (including DB benchmarks!) I decided to
spring for an OCZ Vertex 3 Pro 120 for evaluation purposes. It cost about
$300 with shipping, etc., and at this point, I won't be putting any
critical data on it.
Considering that I sprang for 96 GB of ECC RAM last spring for around $5000,
even if I put the OCZ drives in pairs w/RAID1, I'd still come out well ahead
if it allows me to put off buying more servers for a year or two.
-Ben
On Thursday, April 14, 2011 02:30:06 AM Leonardo Francalanci wrote:
have a look at
http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td4268261.html
It looks like those are "safe" to use with a db, and aren't that expensive.
On Thu, Apr 14, 2011 at 3:40 AM, Henry C. <henka@cityweb.co.za> wrote:
On Thu, April 14, 2011 10:51, Craig Ringer wrote:
On 14/04/2011 4:35 PM, Henry C. wrote:
There is no going back. Hint: don't use cheap SSDs - cough up and use
Intel.
The server-grade SLC stuff with a supercap, I hope, not the scary
consumer-oriented MLC "pray you weren't writing anything during power-loss"
devices?
That's what a UPS and genset are for. Who writes critical stuff to *any*
drive without power backup?
Because power supply systems with UPS never fail.
(hint, I've seen them fail, more than once)
On Thu, Apr 14, 2011 at 12:27:34PM -0600, Scott Marlowe wrote:
That's what a UPS and genset are for. Who writes critical stuff to *any*
drive without power backup?
Because power supply systems with UPS never fail.
Right, there's obviously a trade-off here. Some of this has to do
with how much your data is worth vs. how much the speed is worth.
There's also the issue of whether you can stand to lose a few rows,
and whether you can stand to lose them for a short time. For
instance, collecting user comments might be a matter of great value,
but if you write them to more than one system, you might not care
whether one of the systems fails briefly. In that case, maybe big
redundancy of cheap disks with power backup is good enough to meet the
price:value ratio. On stock trades worth maybe millions of dollars,
not so much: you miss your teeny window of opportunity to do a trade
and suddenly you're out in the street wearing a barrel.
I can think of lots of different points to be along that continuum,
and surely nobody is suggesting that there is one right answer for
everything.
A
--
Andrew Sullivan
ajs@crankycanuck.ca
Henry C. wrote:
I believe this perception that SSDs are less "safe" than failure-prone
mechanical hard drives will eventually change.
Only because the manufacturers are starting to care about write
durability enough to include the right hardware for it. Many of them
are less safe right now on some common database tasks. Intel's gen 1
and gen 2 drives are garbage for database use. I've had customers lose
terabytes of data due to them. Yes, every system can fail, but these
*will* fail and corrupt your database the first time there's a serious
power problem of some sort. And the idea that a UPS is sufficient to
protect against that ever happening is wildly optimistic.
See http://wiki.postgresql.org/wiki/Reliable_Writes for more background
here, and links to reading on the older Intel drives. I summarized the
situation with their newer 320 series drives at
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
Those finally get the write flushing right. But the random seek IOPS are
far lower than you might expect on read/write workloads. My own tests and
other sources have all come up with around 3500 IOPS as a real-world
expectation for the larger sizes of these drives. Also, it
is cheap flash, so durability in a server environment won't be great.
Don't put your WAL on them if you have a high transaction rate. Put
some indexes there instead.
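To put that ~3500 IOPS figure in perspective, here's a back-of-the-envelope
conversion to random-I/O throughput at PostgreSQL's stock 8 kB page size
(the IOPS number is the one quoted above; everything else is just
arithmetic):

```python
# Convert a random-I/O rating into effective random throughput at
# PostgreSQL's default 8 kB page size (BLCKSZ).

BLOCK_SIZE = 8 * 1024          # bytes, stock PostgreSQL page size
iops = 3500                    # real-world figure cited above

bytes_per_sec = iops * BLOCK_SIZE
print(bytes_per_sec / (1024 * 1024))   # -> 27.34375 MB/s of random I/O
```

Modest as a sequential number, but spread across thousands of random page
reads per second it's far beyond what a spinning disk can deliver.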
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Thu, Apr 14, 2011 at 1:14 PM, Greg Smith <greg@2ndquadrant.com> wrote:
And the idea that a UPS is sufficient to protect against that ever
happening is wildly optimistic.
Note that the real danger in relying on a UPS is that most power
conditioning / UPS setups tend to fail in total, not in parts. The
two times I've seen it happen, the whole grid shut down completely for
a few hours. The first time we had Oracle, Ingres, Sybase,
SQL-Server, etc. etc. database servers across the company corrupted.
DAYS of recovery time, and since they all failed at once, the machines
in replication got corrupted as well. Restoring production dbs from
backups took days.
The only machine to survive was the corporate intranet running pgsql
on twin 15k SCSI drives with a proven reliable battery backed
controller on it. It was mine. This was a company that lost
something like $10k a minute for downtime. And the downtime was
measured not in seconds, minutes or hours, but days because everyone
had said the same thing, "The UPS and power conditioners make power
plug pull survivability a non issue." When the only machine with an
uncorrupted database is the corporate intranet server the 24/7
production guys look pretty stupid. They also suddenly decided to
start doing power plug pull tests on all database servers.
To make matters worse, the kind of system that NEEDS the higher
throughput from SSDs is likely the worst kind of system to suffer
downtime due to corruption. OTOH, restores from backups should run
pretty fast. :)