Database Scalability
Hi All,
We are using Amazon RDS PostgreSQL.
We have some multi-tenant solutions which are separating the tenant data in
Postgresql mainly in the following manner.
1. Using different schemas
2. Using different tables for different tenants.
I was just wondering what would be the best way to scale in case no. of
schemas or no. of tables limit increase and probably cross the limits of
PostgreSQL.
I understand that no. of schemas are unlimited in DB as per
/messages/by-id/24478.1109618520@sss.pgh.pa.us
and some limits on tables are mentioned here
https://www.postgresql.org/docs/12/limits.html
In case of table separated tenancy some tables might have more data than
other tables. In the case of the table itself growing in size I can look
for partitioning the data.
In the case of limits being reached in terms of tables and schemas (if any
) I understand that database sharding or horizontal scaling over multiple
database instances could be one solution. It will also help in having
more database connections/resources for an ever increasing number of
tenants . Obviously with the headache of defining a balanced sharding
strategy and maintaining it.
Thanks and Best Regards,
Saurav
Saurav Sarkar wrote on 11/29/21 10:13 PM:
Hi All,
We have some multi-tenant solutions which are separating the tenant
data in Postgresql mainly in the following manner.1. Using different schemas
2. Using different tables for different tenants.
Without more details it's impossible to give you a detailed answer, so,
in general.... if you are breaking out your client data by schema, you
will likely be fine. We have used this method with great success to
scale our customer workload - each customer gets their own schema, which
we can then rebalance between databases as those clients drive more load
over time. For those tables that we want partitioned, we simply
partition them in every schema. That's arguably inefficient but we find
the schema consistency to be an overall win.
(For context, when I say we have used this method with great success, we
have over 13k customers, almost a PB of data, peak around 1.5M iops, and
it's all painless to operate.)
Hi Ben,
Thanks a lot for your reply.
So are all the schemas on one DB or are distributed/sharded across multiple
DBs ?
Best Regards,
Saurav
On Tue, Nov 30, 2021 at 11:43 PM Ben Chobot <bench@silentmedia.com> wrote:
Show quoted text
Saurav Sarkar wrote on 11/29/21 10:13 PM:
Hi All,
We have some multi-tenant solutions which are separating the tenant data
in Postgresql mainly in the following manner.1. Using different schemas
2. Using different tables for different tenants.Without more details it's impossible to give you a detailed answer, so, in
general.... if you are breaking out your client data by schema, you will
likely be fine. We have used this method with great success to scale our
customer workload - each customer gets their own schema, which we can then
rebalance between databases as those clients drive more load over time. For
those tables that we want partitioned, we simply partition them in every
schema. That's arguably inefficient but we find the schema consistency to
be an overall win.(For context, when I say we have used this method with great success, we
have over 13k customers, almost a PB of data, peak around 1.5M iops, and
it's all painless to operate.)
On 11/30/21 22:08, Saurav Sarkar wrote:
Hi Ben,
Thanks a lot for your reply.
So are all the schemas on one DB or are distributed/sharded across
multiple DBs ?Best Regards,
Saurav
To my knowledge PostgreSQL doesn't support sharding, which is well and
good because sharding is mostly useless, at least in my opinion.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
Store a connection string for each tenant or the metadata to build one on demand. Then each tenant is in its own schema on whatever database instance the connection string points at. Then it doesn’t really matter how you spread your tenants across one database or many; just do whatever works best and adjust as needed.
~ Mike
Saurav Sarkar wrote on 11/30/21 7:08 PM:
So are all the schemas on one DB or are distributed/sharded across
multiple DBs ?
In our use case, every db entirely homes one or more schemas. Some dbs
host many schemas for small customers, some dbs host a handful of
schemas for medium customers, and some dbs are dedicated to a single
large customer. If the customer is very large, we get a bigger db than
normal for them. So theoretically we still have a problem of vertical
scale but in practice it's rarely an issue and when it is, we can
usually split that single customer into multiple, at the application layer.
Mladen Gogala wrote on 11/30/21 7:52 PM:
To my knowledge PostgreSQL doesn't support sharding, which is well and
good because sharding is mostly useless, at least in my opinion.
OK I'll bite.
Not only does PostgreSQL natively support table partitioning (which is
absolutely a form of sharding), there multiple well-regarded extensions
that can help with sharding, all of which are orthogonal to how you can
configure your application to use Postgres in the first place. So to say
Postgres doesn't support sharding is.... misleading, at best.
Also, the general concept of sharding to move your scaling challenges
from vertical ones to horizontal ones has multiple self-evident
advantages. If your work history has all happened to fit on a single
server, then bully for you, but not everybody has it so easy.
To my knowledge PostgreSQL doesn't support sharding, which is well and
good because sharding is mostly useless, at least in my opinion.
Not only does PostgreSQL natively support table partitioning (which is
absolutely a form of sharding), there multiple well-regarded extensions
that can help with sharding, all of which are orthogonal to how you can
configure your application to use Postgres in the first place. So to say
Postgres doesn't support sharding is.... misleading, at best.
Also, the general concept of sharding to move your scaling challenges
from vertical ones to horizontal ones has multiple self-evident
advantages. If your work history has all happened to fit on a single
server, then bully for you, but not everybody has it so easy.
It supports partitioning out of the box - not sharding where different tables reside on different machines!
CitusData and TimescaleDB provide sharding as extensions - both of which appear useful for TimeSeries data. There was PostgresXL which was a general sharding (multi-machine) solution that appears to have died.
SQLP!