Experience with many schemas vs many databases

Started by undisclosed userover 16 years ago14 messagesgeneral
Jump to latest
#1undisclosed user
lovetodrinkpepsi@gmail.com

Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the
app used a db per user (on MyIsam)....the solution did not fair so well in
reliability and performance. I have been increasingly interested in Postgres
lately.

Currently, I have about 30-35k users/databases. The general table layout is
the same....only the data is different. I don't need to share data across
databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that
happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups,
vacuum? Weird issues?

Any incite is greatly appreciated.

Thanks.
Frank

#2John R Pierce
pierce@hogranch.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

undisclosed user wrote:

Hello everyone,

I have hit a wall on completing a solution I am working on.
Originally, the app used a db per user (on MyIsam)....the solution did
not fair so well in reliability and performance. I have been
increasingly interested in Postgres lately.

Currently, I have about 30-35k users/databases. The general table
layout is the same....only the data is different. I don't need to
share data across databases. Very similar to a multi-tenant design.

35000 users with separate databases isn't going to scale well on ANY
conventional system I'm familiar with

#3Johan Nel
johan.nel@xsinet.co.za
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

undisclosed user wrote:

I have hit a wall on completing a solution I am working on. Originally,
the app used a db per user (on MyIsam)....the solution did not fair so
well in reliability and performance. I have been increasingly interested
in Postgres lately.

Currently, I have about 30-35k users/databases. The general table layout
is the same....only the data is different. I don't need to share data
across databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that
happen when the postgres is used this way?

As John indicated, not any traditional environment that will handle that
well..

2. Is the schema method better? Performance, maintainability, backups,
vacuum? Weird issues?

I would rather use schemas to logically group tables together. Insert a
user_id column in the tables and ensure each user can only see the rows he
has access to via query design to limit user access. Something in the
line of:

CREATE OR REPLACE VIEW SomeTableQuery AS
SELECT * FROM SomeTable WHERE user_id = current_user;

Where SomeTable has a column user_id that defaults to current_user.

Johan Nel
Pretoria, South Africa.

#4Jorge Godoy
jgodoy@gmail.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

Frank,

I had the same questioning a while ago and another thing that made me think
was the amount of data per user.

In the end, I decided on using a single DB and single schema and add a
clause to split everything by each customer (customer_id).

I then added an index on that column and my code became simpler and fast
enough.

This also allowed me to work with some other aggregates that provided very
useful "global" statistics.

--
Jorge Godoy <jgodoy@gmail.com>

On Sun, Nov 15, 2009 at 04:28, undisclosed user
<lovetodrinkpepsi@gmail.com>wrote:

Show quoted text

Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the
app used a db per user (on MyIsam)....the solution did not fair so well in
reliability and performance. I have been increasingly interested in Postgres
lately.

Currently, I have about 30-35k users/databases. The general table layout is
the same....only the data is different. I don't need to share data across
databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that
happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups,
vacuum? Weird issues?

Any incite is greatly appreciated.

Thanks.
Frank

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

undisclosed user <lovetodrinkpepsi@gmail.com> writes:

I have hit a wall on completing a solution I am working on. Originally, the
app used a db per user (on MyIsam)....the solution did not fair so well in
reliability and performance. I have been increasingly interested in Postgres
lately.

Currently, I have about 30-35k users/databases. The general table layout is
the same....only the data is different. I don't need to share data across
databases. Very similar to a multi-tenant design.

Use multiple schemas, not multiple databases. If you had it working in
mysql then what you were using was more nearly schemas than databases
anyway --- it's unfortunate that the two systems use the same word
"database" for what are really different structures.

regards, tom lane

#6Merlin Moncure
mmoncure@gmail.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

On Sun, Nov 15, 2009 at 1:28 AM, undisclosed user
<lovetodrinkpepsi@gmail.com> wrote:

Hello everyone,
I have hit a wall on completing a solution I am working on. Originally, the
app used a db per user (on MyIsam)....the solution did not fair so well in
reliability and performance. I have been increasingly interested in Postgres
lately.
Currently, I have about 30-35k users/databases. The general table layout is
the same....only the data is different. I don't need to share data across
databases. Very similar to a multi-tenant design.
Here are a few questions I have:
1. Could postgres support this many DBs? Are there any weird things that
happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups,
vacuum? Weird issues?

Use schema. Here's a pro tip: if you have any sql or pl/pgsql
functions you can use the same function body across all the schema as
long as you discard the plans when you want to move from schema to
schema.

I'm curious if those suggesting there is a practical upper limit of
the number of schema postgres can handle have any hard information to
back that up...

merlin

#7undisclosed user
lovetodrinkpepsi@gmail.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

If I were to switch to a single DB/single schema format shared among all
users , how can I backup each user individually?

Frank

On Sat, Nov 14, 2009 at 10:28 PM, undisclosed user <
lovetodrinkpepsi@gmail.com> wrote:

Show quoted text

Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the
app used a db per user (on MyIsam)....the solution did not fair so well in
reliability and performance. I have been increasingly interested in Postgres
lately.

Currently, I have about 30-35k users/databases. The general table layout is
the same....only the data is different. I don't need to share data across
databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that
happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups,
vacuum? Weird issues?

Any incite is greatly appreciated.

Thanks.
Frank

#8Scott Marlowe
scott.marlowe@gmail.com
In reply to: Merlin Moncure (#6)
Re: Experience with many schemas vs many databases

On Sun, Nov 15, 2009 at 11:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Use schema.  Here's a pro tip: if you have any sql or pl/pgsql
functions you can use the same function body across all the schema as
long as you discard the plans when you want to move from schema to
schema.

I too vote for schemas.

I'm curious if those suggesting there is a practical upper limit of
the number of schema postgres can handle have any hard information to
back that up...

The real limit is performance of certain things over the catalogs, not
the number of schemas, but how many objects are in the db seem to
impact me more, and that's only with slony. Everything else runs fine
with ~40k objects in my db.

#9John R Pierce
pierce@hogranch.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

undisclosed user wrote:

Currently, I have about 30-35k users/databases. The general table
layout is the same....only the data is different. I don't need to
share data across databases. Very similar to a multi-tenant design.

Do these users make their own arbitrary SQL queries? Or is all the
coding canned, and they are simply running applications? in the latter
case, I would definitely suggest using a single database and schema, and
one set of tables and having CustomerID be a field that you index in
these tables.

#10John R Pierce
pierce@hogranch.com
In reply to: undisclosed user (#7)
Re: Experience with many schemas vs many databases

undisclosed user wrote:

If I were to switch to a single DB/single schema format shared among
all users , how can I backup each user individually?

depending on how many tables, etc, I suppose you could use a seperate
series of SELECT statements ...
but if this is a requirement, it certainly puts constraints on how you
organize your data. without a much deeper knowlege of your
application, data, and requirements, its kind of hard to give any sort
of recommendations. you mentioned myISAM, so I gather this data isn't
at all transactional, nor is relational integrity a priority.

#11undisclosed user
lovetodrinkpepsi@gmail.com
In reply to: John R Pierce (#10)
Re: Experience with many schemas vs many databases

The app is very similar to wordpress MU. Each user has the same schema but
different data. The app uses the same codebase for every user. Users do not
have direct access to data. Currently, the DB is 90% r / 10% w and about
80GB MyISAM. Most of the queries are simple (75%)...the rest are joins
(25%). I am using myisam but I have too many concurrency and table crash
issues... Mysql Fulltext search is horrible and causes a lot of
lockups....tsearch2 seems like a good solution for us.

Basically, I want:
1. Good concurrency / decent performance
2. Data integrity
3. Fast Search
4. Ability to backup per user

Backing up data by user is required for my solution. A lot of times, users
screw up and they want to rollback to a previous state.

If I were to do a database per user, the backup/restore would be very
straight-forward. I believe backup/restore procedure is similar for schemas
(let me know if I am wrong here)? If I were to do a single schema/database,
is it possible to get data per user and back it up? Select user rows, copy
to a temp table/db, backup?

Thanks,
Frank

On Sun, Nov 15, 2009 at 1:11 PM, John R Pierce <pierce@hogranch.com> wrote:

Show quoted text

undisclosed user wrote:

If I were to switch to a single DB/single schema format shared among all
users , how can I backup each user individually?

depending on how many tables, etc, I suppose you could use a seperate
series of SELECT statements ...
but if this is a requirement, it certainly puts constraints on how you
organize your data. without a much deeper knowlege of your application,
data, and requirements, its kind of hard to give any sort of
recommendations. you mentioned myISAM, so I gather this data isn't at all
transactional, nor is relational integrity a priority.

#12Loyal
lbarbertexas@gmail.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

On Nov 15, 1:07 pm, lovetodrinkpe...@gmail.com (undisclosed user)
wrote:

If I were to switch to a single DB/single schema format shared among all
users , how can I backup each user individually?

Frank
<snip>

I would love to understand why that would be a requirement. I would
much
prefer backing up one database/schema. If I needed to restore a
single user
I then have options such as restoring to my test database then
extracting
that single user's data, though it is still unclear to me why that
would be a
requirement. If it is so single users can get a backup their own
data, I would
do it using a generic script or function. Since your users would see
their own
views only that should work fine.

Loyal

#13Peter Hunsberger
peter.hunsberger@gmail.com
In reply to: undisclosed user (#11)
Re: Experience with many schemas vs many databases

On Sun, Nov 15, 2009 at 3:45 PM, undisclosed user
<lovetodrinkpepsi@gmail.com> wrote:

Backing up data by user is required for my solution. A lot of times, users
screw up and they want to rollback to a previous state.

In that case, being able to revert the state of an application should
be part of the application (and database) design and not rely on
database: history tables, etc...

--
Peter Hunsberger

#14Loyal
lbarbertexas@gmail.com
In reply to: undisclosed user (#1)
Re: Experience with many schemas vs many databases

On Nov 15, 3:45 pm, lovetodrinkpe...@gmail.com (undisclosed user)
wrote:

The app is very similar to wordpress MU. Each user has the same schema but
different data.
<snip>

4. Ability to backup per user

Backing up data by user is required for my solution. A lot of times, users
screw up and they want to rollback to a previous state.
<snip>

I would still lean toward single schema, but that is just me. To make
that work, I would need the following capabilities:
1. Ability to backup the entire database
2. Ability to restore the entire database elsewhere
3. Ability to restore a single user

1. The software exists as you know. BTW, the software also exists
from PG if you wish to make one user per schema. You have to have the
space to back it up. 80 GB might come out much larger in fact I am
certain it would.
2. This requires that you have a test database you can restore to with
enough room for the full data set plus the external file for the
restore. The software is already available from PG.
3. This one depends on how automated you need the solution. If it has
to be highly automated (read repeatable and less prone to error) that
will require you to write the code to extract the data for a single
user from the backup, delete the data for the single user, and restore
the backup for the single user.
3a. If the occurrence can have less automation, you can use a tool to
do most of the heavy lifting for you. I use phpPgAdmin. It can do
the selects you need then allow you to do the deletes. It can extract
the data for the user from the backup either in copy or distinct
insert statements. Finally, if you want to get a start on the code,
phpPgAdmin will in essence do that for you. When you do a query, it
will usually give you an option to "edit" the SQL. This gives you
copy/paste access to the code the tool has written.

Loyal