Using AWS ephemeral SSD storage for production database workload?
Hi everyone,
As you may know, EBS volumes, though durable, are very costly when you
need provisioned IOPS. In contrast, AWS instance-attached ephemeral SSD
storage is very fast but isn't durable.
I have come across some ideas on the Internet where people hinted at
running production PostgreSQL workloads on AWS ephemeral SSD storage.
Generally, this involves shipping WAL logs continuously to S3 and keeping
an async read replica in another AWS availability zone. The worst-case
scenario in such a deployment is a data loss of a few seconds. But beyond
this the details are sketchy.
Have you come across such a deployment? What are some best practices that
need to be followed to pull this through without significant data loss?
Even though WAL logs are being shipped to S3, in case of loss of both the
instances, the restore time is going to be substantial for databases of a
few hundred GBs.
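For concreteness, the continuous WAL shipping described above is usually wired up through PostgreSQL's archiving hook. A minimal sketch, assuming WAL-E with its credentials under /etc/wal-e.d/env (both the tool choice and the path are illustrative):

```ini
# postgresql.conf on the primary (illustrative values)
wal_level = replica
archive_mode = on
# push each completed WAL segment to S3 via WAL-E
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
```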
Just to be clear, I am not planning anything like this anytime soon :-)
But I am curious about the trade-offs of such a deployment. Any concrete
information on this topic is much appreciated.
Regards,
Pritam.
On 01/29/2018 05:41 PM, Pritam Barhate wrote:
Hi everyone,
As you may know, EBS volumes though durable are very costly when you
need provisioned IOPS. As opposed to this AWS instance attached
ephemeral SSD is very fast but isn't durable.
I have come across some ideas on the Internet where people hinted at
running production PostgreSQL workloads on AWS ephemeral SSD
storage. Generally, this involves shipping WAL logs continuously to
S3 and keeping an async read replica in another AWS availability
zone. Worst case scenario in such deployment is data loss of a few
seconds. But beyond this the details are sketchy.
Both log shipping and async replication are ancient features, and should
be well understood. What exactly is unclear?
Have you come across such a deployment? What are some best practices
that need to be followed to pull this through without significant
data loss? Even though WAL logs are being shipped to S3, in case of
loss of both the instances, the restore time is going to be substantial
for databases of a few hundred GBs.
Pretty much everyone who is serious about HA is running such a cluster.
If they can't afford any data loss, they use synchronous replicas instead.
That's a basic latency-durability trade-off.
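To illustrate that trade-off concretely: promoting an async standby to a synchronous one is a configuration change on the primary. A minimal sketch (the standby name is hypothetical and must match the standby's application_name):

```ini
# postgresql.conf on the primary (illustrative)
synchronous_standby_names = 'standby1'
synchronous_commit = on   # commits wait for the synchronous standby to flush WAL
```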
Just to be clear, I am not planning anything like this, anytime soon
:-) But I am curious about trade-offs of such a deployment. Any
concrete information in this aspect is well appreciated.
Pretty much everyone is using such an architecture (primary + streaming
replicas) nowadays, so it's a reasonably well-understood scenario. But
it's really unclear what kind of information you expect to get, or how
much time you have spent reading about this.
There is quite a bit of information in the official docs, although maybe
a bit too low level - it certainly gives you the building blocks instead
of a complete solution. There are also books, like [1] for example.
And finally there are tools that help with managing such clusters, like
for example [2]. Not only is it a rather bad idea to implement this on
your own (bugs, unnecessary effort), but the tools also show how to do
things.
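As a sketch of what such a tool expects, a minimal repmgr node configuration might look like this (all values are illustrative, and the exact parameters vary by repmgr version):

```ini
# /etc/repmgr.conf (illustrative values)
node_id = 1
node_name = 'node1'
conninfo = 'host=10.0.0.1 user=repmgr dbname=repmgr'
data_directory = '/var/lib/postgresql/10/main'
```

The cluster is then assembled with commands along the lines of `repmgr primary register` on the primary and `repmgr standby clone` / `repmgr standby register` on each replica (repmgr 4 syntax).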
[1]: https://www.packtpub.com/big-data-and-business-intelligence/postgresql-replication-second-edition
[2]: https://repmgr.org/
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Both log shipping and async replication are ancient features, and should
be well understood. What exactly is unclear?
I know about these and I know how to operate them as well. The only part I
am concerned about is the ephemeral storage: the risk appetite around it,
and the steps people take to ensure no "serious" data is lost when both
the primary and the standby are lost (very unlikely when they are in
different AZs, but still possible). I was just wondering whether there is
any secret sauce (like some wisdom that comes only from operating a
real-world deployment) to it. Even Heroku seems to be using PIOPS (
https://devcenter.heroku.com/articles/heroku-postgres-production-tier-technical-characterization)
and these guys created WAL-E. Anyway, I did learn some new things from
Manuel's response.
In short, I am just trying to learn from other people's experience.
Thanks for all the information.
Pritam.
On Mon, Jan 29, 2018 at 11:02 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Mon, 29 Jan 2018 23:27:32 +0530
Pritam Barhate <pritambarhate@gmail.com> wrote:
In short, I am just trying to learn from other people's experience.
This is identical to solutions that use tmpfs on Linux for
database storage, or to dealing with a fully failed storage
system. Think about what you'd do if a RAID controller
fried and botched your entire array at once. You'll feel
just the same way if a box using ephemeral storage goes
down.
Your application needs to handle restarting transactions
and either a reverse proxy/load-balancer or client-side
switchover.
Depending on your tolerance for data loss you might need
three servers up, one as a secondary failover if the primary
fails, so that you (pretty much) always have two servers up
to maintain the data. The last server only has to last long
enough for a restart and recovery, so it might not have to
be very heavy duty; its main purpose is to keep the database
alive long enough to recover the "real" server.
Q: Why not just use RDS?
It'll be simpler.
--
Steven Lembark 1505 National Ave
Workhorse Computing Rockford, IL 61103
lembark@wrkhors.com +1 888 359 3508
I have come across some ideas on the Internet
where people hinted at running production PostgreSQL workloads
on AWS ephemeral SSD storage.
I think people were more serious about that before AWS introduced
PIOPS. I wouldn't do this unless I had streaming replication to a
standby, plus regular backups (e.g. with WAL-E). Actually that's what
I use even with regular EBS volumes, and it's not hard to set up. The
standby gives you fast failover/recovery, and WAL-E gives you an extra
layer of insurance.
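The standby side of the setup Paul describes can be sketched as a recovery configuration that streams from the primary and falls back to fetching archived WAL from S3 (hostnames and the WAL-E env dir are illustrative; this is the pre-PostgreSQL-12 recovery.conf syntax):

```ini
# recovery.conf on the standby (illustrative values)
standby_mode = 'on'
primary_conninfo = 'host=primary.internal user=replicator application_name=standby1'
# fall back to the S3 archive via WAL-E if streaming falls behind
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
```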
Another worry about ephemeral storage is that you can't add more, and
the amount you get depends on the instance type. Also it seems like
modern instances don't come with as much ephemeral storage as they
used to, although maybe that impression is mistaken.
I agree that PIOPS is still expensive though. Some people get around
that by running a RAID0 array of gp2 EBS volumes. The conversation at
https://news.ycombinator.com/item?id=13842044 has some details. I've
set it up for one client, and it wasn't too bad. It's been running
fine for 6 months or so.
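The aggregate effect of striping is easy to estimate from gp2's published scaling rule. A back-of-the-envelope sketch using the 2018-era limits (3 IOPS per GB baseline, a 100-IOPS floor, a 10,000-IOPS per-volume ceiling); actual throughput also depends on the instance's EBS bandwidth:

```python
# Rough aggregate IOPS for a RAID0 stripe of gp2 EBS volumes,
# assuming the 2018-era gp2 limits described above.

def gp2_baseline_iops(size_gb):
    """Baseline IOPS for a single gp2 volume of the given size."""
    return min(max(3 * size_gb, 100), 10_000)

def raid0_baseline_iops(volumes, size_gb):
    """Aggregate baseline IOPS across a RAID0 stripe set."""
    return volumes * gp2_baseline_iops(size_gb)

print(gp2_baseline_iops(500))       # one 500 GB volume: 1500
print(raid0_baseline_iops(4, 500))  # four striped 500 GB volumes: 6000
```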
Paul
Thank you, Paul and Steven, for the information.
@Paul: Thanks for the link. Planning to read through most of fdr's comments
on Hacker News.
@Steven:
Q: Why not just use RDS?
It'll be simpler.
Already using it for multiple deployments. I am primarily a programmer,
but now I want to get out of the RDS convenience zone.
Regards,
Pritam.
G'day all,
We have been doing this in production for about five years; the client is
aware of the trade-off between speed, cost and availability. (By this I
mean, if it goes down for a few minutes, it's no big concern to them.) We
had around 2 million users, but very small payloads.
We take full database backups every six hours, log-ship to S3, and run
multiple hot streaming servers for a high level of user query activity.
Obviously, the risk was always there of an unplanned shutdown, but we had
very good performance on a very cheap setup. m3.medium / Ubuntu, anyone!
Disclaimer - we don't do this any more for our bigger production systems
... for a number of reasons. We ran out of space on the SSD; our write
queries are generally small and not complex, so we went to EBS-backed
storage without PIOPS, and haven't really noticed any major problems with
performance. I felt that the risk and hassle associated with needing to
upgrade the operating system etc. made it easier to just use EBS. This
also means we can vertically scale our DB server easily and securely.
We still have hot streaming replication, backups and log shipping.
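The "full backups every six hours" part of a setup like Ben's can be driven by a single cron entry (paths are illustrative; WAL shipping is configured separately via archive_command):

```
# /etc/crontab: full base backup every six hours (illustrative paths)
0 */6 * * * postgres pg_basebackup -D /backups/base-$(date +\%Y\%m\%d\%H) -Ft -z -X fetch
```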
hth
cheers
Ben
On 30 January 2018 at 04:36, Pritam Barhate <pritambarhate@gmail.com> wrote:
--
Dr Ben Madin
Managing Director
m : +61 448 887 220
e : ben@ausvet.com.au
5 Shuffrey Street, Fremantle
Western Australia
on the web: www.ausvet.com.au
Why not use EBS storage, but don't use provisioned-IOPS SSDs (io1) for the
EBS volume? Just use the default storage type (gp2) and live with the 3000
IOPS peak for 30 minutes that it allows. You'd be amazed at just how much
I/O can be handled within the default IOPS allowance, though bear in mind
that you accrue IOPS credits at a rate that is proportional to storage
amount once you've started to eat into your quota, so the performance of
someone using general-purpose SSDs (gp2) with 2 terabytes of storage will
be different from someone using 100 GB of storage. But I recently moved
several databases to gp2 storage and saved a ton of money doing so (we
were paying for 5000 IOPS and using 5 at peak, other than brief bursts to
a couple hundred when backing up and restoring). I've done numerous
backups and restores on those hosts since then, have had no trouble
keeping up, and have never come close to the 3k theoretical max, even
briefly. Replication doesn't appear to be bothered, either.
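The "3000 IOPS peak for 30 minutes" figure follows from gp2's published burst-bucket model. A back-of-the-envelope sketch using the 2018-era numbers (a 5.4-million-credit bucket, refilled at the baseline rate of 3 IOPS per GB with a 100-IOPS floor):

```python
# gp2 burst-bucket arithmetic, using the 2018-era published numbers.

CREDIT_BUCKET = 5_400_000  # initial (and maximum) I/O credits per volume

def baseline_iops(size_gb):
    """Baseline IOPS (also the credit refill rate) for a gp2 volume."""
    return min(max(3 * size_gb, 100), 10_000)

def burst_minutes(size_gb, burst_iops=3000):
    """Minutes a volume can sustain burst_iops before credits run out."""
    drain_per_sec = burst_iops - baseline_iops(size_gb)
    if drain_per_sec <= 0:
        return float("inf")  # baseline already covers the burst rate
    return CREDIT_BUCKET / drain_per_sec / 60.0

print(round(burst_minutes(100)))  # a 100 GB volume sustains 3000 IOPS ~33 min
print(burst_minutes(1000))        # a 1 TB volume's baseline is already 3000
```

This also shows why a 2 TB gp2 volume behaves so differently from a 100 GB one: at 1 TB and above the baseline alone reaches the 3000-IOPS burst ceiling, so the credit bucket never drains.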
Going to ephemeral storage seems unnecessarily problem-prone when
instances die, and I'm not even sure it is an option in RDS or on recent
EC2 instance types, which require EBS volumes even for the boot volume.
But EBS with general-purpose storage isn't much more expensive than
ephemeral.
On Mon, Jan 29, 2018 at 08:42 Pritam Barhate <pritambarhate@gmail.com> wrote:
On Jan 29, 2018, at 8:05 PM, Sam Gendler <sgendler@ideasculptor.com> wrote:
One reason would be that gp2 volumes cap out at 160MB/s. We have a bunch of databases on gp2 (it works great) but that throughput cap can bite you if you’re not expecting it.
Thanks Ben and Sam for sharing your experience.
On Jan 30, 2018 8:52 AM, "Ben Chobot" <bench@silentmedia.com> wrote: