Table partitioning for cloud service?
I’m working on my first cloud service, which will be backed by a PostgreSQL database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally” get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data, but would also increase maintenance effort: if I needed to add a field, I would have to add it to every table. On the other hand, keeping everything in the same set of tables would mean having to be VERY careful with my SQL to make sure no customer could access another’s data.
How is this typically done?
---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com
Home of EZPunch and Lyrics Presenter
On 5/21/20 8:29 AM, Israel Brewster wrote:
At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)
Or set up a separate database for each in the cluster.
--
Adrian Klaver
adrian.klaver@aklaver.com
On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
I think that is called a schema:)
Ok, that’s probably an option. Although it looks, from a cursory perusal, like for that to work I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.
Or set up a separate database for each in the cluster.
Same as above - no permissions.
At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid sysadmin stuff as much as possible and focus on the software side, but obviously some sysadmin work is required.
On 5/21/20 8:53 AM, Israel Brewster wrote:
At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. [...]
You can't use this?:
https://devcenter.heroku.com/articles/heroku-postgresql-credentials
On Thu, 21 May 2020 at 11:53, Israel Brewster <israel@brewstersoft.com> wrote:
I may need to move to something more robust, if having completely separate databases is the best option. [...]
There's a whole lot of "that depends" to this.
If there is not much data shared across customers, then it's a pretty good
answer to create a database for each one. This is especially good if they
are only occasionally connected.
If there is a LOT of shared data, then "life gets more complicated."
It's a decently well-documented problem out there; I just searched for "multitenant database design", which turned up a number of decent (not-Postgres-specific) answers:
https://www.google.com/search?client=firefox-b-d&q=multitenant+database+design
Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or
connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added, you need "CREATE SCHEMA" to establish the tenancy and "CREATE TABLE" for each table in that tenancy, and connections use "SET search_path = tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need
to specify the tenant)
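A minimal sketch of the schema-per-tenant pattern; the tenant name "acme" and the "invoices" table are hypothetical examples, not from the thread:

```sql
-- Create the tenant's schema and its copy of each application table.
CREATE SCHEMA acme;
CREATE TABLE acme.invoices (
    id    bigserial PRIMARY KEY,
    total numeric(12,2) NOT NULL
);

-- At connection time, point the session at that tenant's schema so
-- unqualified table names resolve to the right tenant's tables.
SET search_path = acme;
SELECT count(*) FROM invoices;  -- resolves to acme.invoices
```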
An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and use a Foreign Data Wrapper to have each tenant's database access it: https://www.postgresql.org/docs/12/postgres-fdw.html
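A rough sketch of that FDW setup, run in each tenant database; the server name, host, credentials, and the "accounts" table are placeholders:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe the database that holds the shared data.
CREATE SERVER shared_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shared-host', dbname 'shared_db');

CREATE USER MAPPING FOR CURRENT_USER SERVER shared_srv
    OPTIONS (user 'app', password 'secret');

-- Expose a shared table so it can be queried as if it were local.
CREATE FOREIGN TABLE shared_accounts (
    id   bigint,
    name text
) SERVER shared_srv
  OPTIONS (schema_name 'public', table_name 'accounts');
```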
There are lots of tradeoffs involved in each case; each of the above
patterns has merits and demerits particularly as the number of tenants
scales, as well as when you discover there are both tiny and large tenants
with differing requirements.
You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants? (Some
approaches are easier to cope with than others)
If you don't have a fair bit of technical expertise locally, then
sophisticated choices will cause you problems that you won't be able to
solve.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On May 21, 2020, at 7:57 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
You can't use this?:
https://devcenter.heroku.com/articles/heroku-postgresql-credentials
Wasn’t aware of that. I *did* mention this is my first cloud project. Done plenty of DB/web/application development, but not cloud/multi-customer. Thanks for the pointer.
An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and use a Foreign Data Wrapper to have each tenant's database access it: https://www.postgresql.org/docs/12/postgres-fdw.html
For my application I went the schema-per-tenant route, but I have a need for a single login which will work for all tenants you've been given access to. Not all tenants are required to be on the same database host, so I broke that piece out into its own database and used postgres_fdw to make it seem local to each tenant.
So I've had first-hand experience with this for the past ~5 years, but this approach has serious tradeoffs. Queries that need to access the remote table can just fall on their face sometimes. You will also need to deal with practically every connection spawning one or more new connections, which stay open and consume resources, the first time a query is issued that accesses foreign data.
As an optimization I just worked on for my database earlier this week, I decided to logically replicate that table from my main authentication database into each cluster, and I replaced all references to the FDW in read-only queries with the logically replicated table. All write queries still hit the FDW as before.
This was acceptable for my use case, and it drastically improved performance for some queries where I had previously had to use a CTE to force materialization, due to the nature of going over the FDW for that data.
It's a very cool tool, just be careful about how it can impact performance
if you don't measure for your specific use case.
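The logical-replication optimization described above could be set up roughly like this; the publication, subscription, table, and connection details are all placeholder names, not the author's actual setup:

```sql
-- On the central authentication database: publish the shared table.
CREATE PUBLICATION auth_pub FOR TABLE app_user;

-- On each tenant cluster: create a matching table (the definition must
-- match the published one) and subscribe to the publication.
CREATE TABLE app_user (
    id    bigint PRIMARY KEY,
    email text NOT NULL
);
CREATE SUBSCRIPTION auth_sub
    CONNECTION 'host=auth-host dbname=auth user=replicator'
    PUBLICATION auth_pub;

-- Read-only queries can now use the local app_user replica;
-- writes still go through the FDW as before.
```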
On Thu, May 21, 2020 at 11:41 AM Adam Brusselback <adambrusselback@gmail.com> wrote:
I decided to logically replicate that table from my main authentication database into each cluster [...]
Perhaps you considered this, but if you had not wanted to deal with the administration side of replication, and the centralized data is changed infrequently and from only one application/source, then updating the source and then refreshing a materialized view on each local db that pulls in the foreign data could be a good option. There is some chance of stale data, since the refresh must be triggered, but for some use cases it may be the simplest setup.
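A minimal sketch of that alternative; "shared_accounts" is assumed to be a foreign table already defined via postgres_fdw, and the view name is hypothetical:

```sql
-- Keep a local snapshot of the centrally managed foreign table.
CREATE MATERIALIZED VIEW local_accounts AS
    SELECT id, name FROM shared_accounts;

-- CONCURRENTLY needs a unique index on the view, but then a refresh
-- does not block readers.
CREATE UNIQUE INDEX ON local_accounts (id);

-- Trigger this after the central copy changes.
REFRESH MATERIALIZED VIEW CONCURRENTLY local_accounts;
```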
Hi,
On Thu, 21 May 2020 at 18:14, Christopher Browne <cbbrowne@gmail.com> wrote:
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)
The database/schema-per-tenant solution can be tedious when you want to modify something in the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation, but with a slight twist. One of the biggest issues of this solution is that if you forget to add the tenant_id to the where clause, you are going to reveal one tenant's data to another.
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant, and they can access their own data in the underlying table through these views. Now if I forget to address the right tenant in my client code (it still happens sometimes) and try to directly access the base tables, I get a strongly worded reminder from the server.
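A minimal sketch of that setup; the table, view, and role names are hypothetical, not from the author's system:

```sql
-- Base table, not directly accessible to the application role.
CREATE TABLE invoices (
    id        bigserial PRIMARY KEY,
    tenant_id integer NOT NULL,
    total     numeric(12,2) NOT NULL
);
REVOKE ALL ON invoices FROM app_role;

-- One generated view per tenant, pinned to that tenant's rows.
-- WITH CHECK OPTION also rejects writes that target another tenant.
CREATE VIEW tenant42_invoices AS
    SELECT id, tenant_id, total FROM invoices WHERE tenant_id = 42
    WITH CHECK OPTION;
GRANT SELECT, INSERT, UPDATE, DELETE ON tenant42_invoices TO app_role;
```

Any query from app_role that touches invoices directly fails with a permission error, which is the "strongly worded reminder" described above.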
Regards,
Sándor
On May 21, 2020, at 12:12 PM, Sándor Daku <daku.sandor@gmail.com> wrote:
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant, and they can access their own data through these views. [...]
Nice solution! I think I may go to something like that once I upgrade to a cloud solution that lets me add multiple users to the DB (the free tier of Heroku does not). In the meantime, while I just have the single customer, I can fake it easily enough.
Are there any shortcuts for referencing the proper views, or do you just append/prepend something to every table reference in your SQL? One nice thing about the database/schema approach is that I can just specify the search_path (or database) in the connection command, and then all the table references remain the same for all tenants. It also helps avoid the situation you mentioned where you forget to address the right tenant, since you only have to do it in one place. Of course, as you said, it can be tedious when you want to modify the structure. I’ll have to think about that a bit more.
The database/schema per tenant solution can be tedious when you want to modify something on the structure and you have numerous tenants. [...]
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant. [...]
1. If you have 50 tables and say 100 tenants, we are talking about 5,000 views. I am not sure whether that is any more elegant than having 100 schemas.
2. In your approach you can't do a phased DDL upgrade. It is typical to do rolling upgrades in multi-tenant databases, starting with the least risky tenant.
On 5/21/20 1:23 PM, Israel Brewster wrote:
Of course, as you said, it can be tedious when you want to modify the structure. I’ll have to think about that a bit more. [...]
If you want to take the tedium out of it, take a look at Sqitch. Then all you have to do is create the change once and deploy it to the targets.
------ Original message ------
From: Ravi Krishna
Date: Thu, 21 May 2020 18:50
To:
Cc: PostgreSQL Mailing Lists
Subject: Re: Table partitioning for cloud service?
Be careful to plan your backup/recovery strategy. How do you plan to recover one customer from backup without interfering with the others?
What will be your disaster recovery strategy? Have you considered replication?
Large databases take longer to back up.
Keeping schemas in sync can be easily solved.
Regards,
Edson