scaling postgres

Started by Chris Withersover 10 years ago11 messagesgeneral
Jump to latest
#1Chris Withers
chris@simplistix.co.uk

Hi All,

I'm curious if there are recommendations for scaling postgres to what,
for me, seems like "a lot" of data...

The app in question currently writes around 1.5 billion rows into a
table before rolling them up into tables that have a few million roll up
rows each. That 1.5 billion row table is emptied and refilled each day,
so we're talking about quite high write as well as quite high read.
Where can I find could examples/docs of how to scale postgres for this
kind of data load? What sort of hardware would I be looking to spec?

Okay, now this app may well eventually want to progress to storing those
1.5 billion rows per day. Is that feasible with postgres? If not, what
storage and processing solutions would people recommend for that kind of
data load?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Jony Cohen
jony.cohenjo@gmail.com
In reply to: Chris Withers (#1)
Re: scaling postgres

Hi,
I'd recommend looking into 2 solutions here, depending if you want to keep
the data or not and how "fault tolerant" is the app (could you loose data
from the last hour?)
If you could loose some data and you don't intend keeping the whole data
set - I'd look at using RAM as your storage - it's the fastest place to
store data and you can easily get servers with lot's of RAM these days.
If you can't loose data then you'll need to use disks - depending on how
big each row is, compute your expected read/write throughput and go
shopping :)
for 1kb rows you get 60GB per hour = 16MB per sec - simple disks can handle
this.
for 10kb rows you get 160MB/s - now you need better disks :)

SSD disks are cheep these days but they don't like repeated writes/deletes
so it might cause problems down the line (hence my first RAM recommendation)

as for keeping the raw data - you could easily do it if you use partitions,
if you have daily partitions inheriting from a master table you can quickly
access the last day (or even several days)
but do take note that a full table scan takes time and you'll need to
manage writing to the right partition yourself (not that hard - simply
insert to <tablename>_<date>)

if you can write the data daily, keeping it will not add any real load (a
little on parse times for queries that access the master table)

Just my 2 cents,
Regards,
- Jony

On Mon, Aug 3, 2015 at 9:53 AM, Chris Withers <chris@simplistix.co.uk>
wrote:

Show quoted text

Hi All,

I'm curious if there are recommendations for scaling postgres to what, for
me, seems like "a lot" of data...

The app in question currently writes around 1.5 billion rows into a table
before rolling them up into tables that have a few million roll up rows
each. That 1.5 billion row table is emptied and refilled each day, so we're
talking about quite high write as well as quite high read. Where can I find
could examples/docs of how to scale postgres for this kind of data load?
What sort of hardware would I be looking to spec?

Okay, now this app may well eventually want to progress to storing those
1.5 billion rows per day. Is that feasible with postgres? If not, what
storage and processing solutions would people recommend for that kind of
data load?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Chris Withers
chris@simplistix.co.uk
In reply to: Jony Cohen (#2)
Re: scaling postgres

On 03/08/2015 08:15, Jony Cohen wrote:

SSD disks are cheep these days but they don't like repeated
writes/deletes so it might cause problems down the line (hence my
first RAM recommendation)

as for keeping the raw data - you could easily do it if you use
partitions, if you have daily partitions inheriting from a master
table you can quickly access the last day (or even several days)
but do take note that a full table scan takes time and you'll need to
manage writing to the right partition yourself (not that hard - simply
insert to <tablename>_<date>)

if you can write the data daily, keeping it will not add any real load
(a little on parse times for queries that access the master table)

Interesting, you seem a lot less fussed by these numbers than I am,
which is good to hear!

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one machine?
What are the options then?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Seref Arikan
serefarikan@kurumsalteknoloji.com
In reply to: Chris Withers (#3)
Re: scaling postgres

On Mon, Aug 3, 2015 at 8:22 AM, Chris Withers <chris@simplistix.co.uk>
wrote:

On 03/08/2015 08:15, Jony Cohen wrote:

SSD disks are cheep these days but they don't like repeated
writes/deletes so it might cause problems down the line (hence my first RAM
recommendation)

as for keeping the raw data - you could easily do it if you use
partitions, if you have daily partitions inheriting from a master table you
can quickly access the last day (or even several days)
but do take note that a full table scan takes time and you'll need to
manage writing to the right partition yourself (not that hard - simply
insert to <tablename>_<date>)

if you can write the data daily, keeping it will not add any real load (a
little on parse times for queries that access the master table)

Interesting, you seem a lot less fussed by these numbers than I am, which
is good to hear!

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one machine?
What are the options then?

I think it is hard to come up with blanket responses to generic questions
such as "What happens when the computational load no longer fits on one
machine?"
I'd say consider a scaling strategy that may be able to make use of your
domain model first, if that is possible of course.

I work in healthcare and patient centric records let me consider multiple
servers for lots and lots of patients. The engineering team from instagram
has been sharing their experience with postgres, which is possible due to
their domain model.

So my humble suggestion is: start from the simplest scenario, ask yourself
if you can use multiple, independent postgres installations, if your domain
model and its data model allow you to do this. If not, you may still have
other options, but it all depends on your access patterns etc.

All the best
Seref

Show quoted text

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Chris Withers
chris@simplistix.co.uk
In reply to: Chris Withers (#1)
Re: scaling postgres - can child tables be in a different tablespace?

On 03/08/2015 08:40, Jony Cohen wrote:

Servers now days reach very impressive write speeds and at rather low
prices - it's simpler to split the write to 2 tables on different
tablespaces/devices than 2 servers.

This raises an interesting question: can a child table be in a different
tablespace to its parent and other children of that parent?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Chris Withers
chris@simplistix.co.uk
In reply to: Seref Arikan (#4)
Re: scaling postgres

On 03/08/2015 08:34, Seref Arikan wrote:

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one
machine? What are the options then?

I think it is hard to come up with blanket responses to generic
questions such as "What happens when the computational load no longer
fits on one machine?"

Of course, I guess I'm looking out for the pain points that people
commonly hit with chunky postgres installs...

I work in healthcare and patient centric records let me consider
multiple servers for lots and lots of patients. The engineering team
from instagram has been sharing their experience with postgres, which is
possible due to their domain model.

I'll have a google, but do you have the Instagram links you're thinking of?

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Seref Arikan
serefarikan@kurumsalteknoloji.com
In reply to: Chris Withers (#6)
Re: scaling postgres

On Tue, Aug 4, 2015 at 9:06 AM, Chris Withers <chris@simplistix.co.uk>
wrote:

Show quoted text

On 03/08/2015 08:34, Seref Arikan wrote:

At what point does postgres stop scaling?
What happens when the computational load no longer fits on one
machine? What are the options then?

I think it is hard to come up with blanket responses to generic
questions such as "What happens when the computational load no longer
fits on one machine?"

Of course, I guess I'm looking out for the pain points that people
commonly hit with chunky postgres installs...

I work in healthcare and patient centric records let me consider

multiple servers for lots and lots of patients. The engineering team
from instagram has been sharing their experience with postgres, which is
possible due to their domain model.

I'll have a google, but do you have the Instagram links you're thinking of?

Nope, sorry, Google is your friend :)

cheers,

Chris

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Chris Withers
chris@simplistix.co.uk
In reply to: Seref Arikan (#7)
Re: scaling postgres

On 04/08/2015 09:11, Seref Arikan wrote:

I work in healthcare and patient centric records let me consider

multiple servers for lots and lots of patients. The
engineering team
from instagram has been sharing their experience with
postgres, which is
possible due to their domain model.

I'll have a google, but do you have the Instagram links you're
thinking of?

Nope, sorry, Google is your friend :)

For the benefit of the archives:

http://instagram-engineering.tumblr.com/post/40781627982/handling-growth-with-postgres-5-tips-from

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

cheers,

Chris

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Chris Withers (#5)
Re: scaling postgres - can child tables be in a different tablespace?

Chris Withers wrote:

This raises an interesting question: can a child table be in a different
tablespace to its parent and other children of that parent?

Yes.

Inheritance is a logical concept and is independent of physical placement.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Melvin Davidson
melvin6925@gmail.com
In reply to: Laurenz Albe (#9)
Re: scaling postgres - can child tables be in a different tablespace?

As additional advice, to get the best performance, you will want all your
tablespaces to be on separate spindles/disks.
EG: disk1/tblspc1
disk2/tblspc2
disk3/tblspc3
...
disk99/tblspc99

On Tue, Aug 4, 2015 at 5:31 AM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:

Chris Withers wrote:

This raises an interesting question: can a child table be in a different
tablespace to its parent and other children of that parent?

Yes.

Inheritance is a logical concept and is independent of physical placement.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#11John R Pierce
pierce@hogranch.com
In reply to: Melvin Davidson (#10)
Re: scaling postgres - can child tables be in a different tablespace?

On 8/4/2015 6:14 AM, Melvin Davidson wrote:

As additional advice, to get the best performance, you will want all
your tablespaces to be on separate spindles/disks.
EG: disk1/tblspc1
disk2/tblspc2
disk3/tblspc3
...
disk99/tblspc99

actually, I find to get best performance most often, stripe all the
disks together and put everything on the same big volume, that way all
IO is evenly distributed. otherwise you'll find too much IO on some
tables/partitions, and not enoguh on others, so most of the disks are
idle most of the time.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general