Smaller multiple tables or one large table?
Hi All,
I am on Postgres 9.0. I don't know the answer to what should be a fairly
straightforward question. I have several static tables which are very
large (on the order of 14 million rows and about 10GB). They are all
linked together through foreign keys and indexed on the columns that are
queried most often. While they are more or less static, update operations
do occur. This is not a particularly fast machine: it has 2 cores and 8GB
of RAM, so I am not expecting queries against these tables to be very
fast, but I am wondering, in a structural sense, whether I should divide
the tables into 1-million-row tables using constraints and a view. The
potential speedup could be quite large if PostgreSQL split a query into n
table chunks running on k cores and then aggregated all of the data for
display or further operation. Is there any documentation on making
PostgreSQL do this, and is it worth it?
Also, is there a benefit to having one large table or many small tables as
far as indexes go?
Thanks,
~Ben
On 06/15/12 11:34 AM, Benedict Holland wrote:
I am on Postgres 9.0. I don't know the answer to what should be a
fairly straightforward question. I have several static tables which
are very large (on the order of 14 million rows and about 10GB).
They are all linked together through foreign keys and indexed on the
columns that are queried most often. While they are more or less
static, update operations do occur. This is not a particularly fast
machine: it has 2 cores and 8GB of RAM, so I am not expecting queries
against these tables to be very fast, but I am wondering, in a
structural sense, whether I should divide the tables into
1-million-row tables using constraints and a view. The potential
speedup could be quite large if PostgreSQL split a query into n table
chunks running on k cores and then aggregated all of the data for
display or further operation. Is there any documentation on making
PostgreSQL do this, and is it worth it?
Postgres won't do that; one query is one process. Your application
could conceivably run multiple threads, each with a separate Postgres
connection, and execute multiple queries in parallel, but it would have
to do any aggregation of the results itself.
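For example, splitting a scan by primary-key range by hand might look like this (the table, its `id` column, and the range boundaries are hypothetical; each query would run on its own connection, and the application would combine the two partial results itself):

```sql
-- Connection 1: aggregate the first half of the key space.
SELECT count(*), sum(amount) FROM bigtable WHERE id BETWEEN 1 AND 7000000;

-- Connection 2: the second half, run concurrently on a separate connection.
SELECT count(*), sum(amount) FROM bigtable WHERE id BETWEEN 7000001 AND 14000000;

-- The application then adds the two counts and the two sums together.
```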
Also, is there a benefit to having one large table or many small tables
as far as indexes go?
Small tables only help if you can query the specific table you 'know'
has your data. For instance, if you have time-based data and you put a
month in each table, and you know that a given query only needs to look
at the current month, you can query just that one month's table.
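A minimal sketch of that month-per-table idea, with hypothetical table and column names:

```sql
-- One table per month; the CHECK constraint documents what each holds.
CREATE TABLE measurements_2012_05 (
    logdate date NOT NULL
        CHECK (logdate >= DATE '2012-05-01' AND logdate < DATE '2012-06-01'),
    value   numeric
);
CREATE TABLE measurements_2012_06 (
    logdate date NOT NULL
        CHECK (logdate >= DATE '2012-06-01' AND logdate < DATE '2012-07-01'),
    value   numeric
);

-- A query known to concern only the current month touches just that table.
SELECT avg(value) FROM measurements_2012_06;
```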
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Will the processes know that I have n tables which are constrained in their
definition on primary keys? I am thinking of a table constraint specifying
that the primary key on each table falls within some boundary. That way the
single process could spawn one thread per table and leave the thread
management to the OS. Assuming it is well behaved, this should use every
ounce of resource I throw at it: instead of sequentially scanning one large
table, it would sequentially scan 1 of n short tables in parallel with k
other tables. The results would have to be aggregated, but with a large
enough table the aggregation would pale in comparison to the run time of a
query split across several smaller tables.
The tables would have to be specified with a table PK constraint falling
between two ranges. A view would then be created to manage all of the small
tables, with triggers handling insert and update operations. Selects would
have to be view-specific, but that is really cheap compared to updates. This
should have the additional benefit of only hitting a specific table (or
tables) with an update.
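As a rough sketch of the scheme described above (all names hypothetical), the range-constrained tables and a read view over them might look like:

```sql
-- Each piece carries a CHECK constraint bounding its primary key range.
CREATE TABLE bigtable_1 (
    id   integer PRIMARY KEY CHECK (id BETWEEN 1 AND 1000000),
    data text
);
CREATE TABLE bigtable_2 (
    id   integer PRIMARY KEY CHECK (id BETWEEN 1000001 AND 2000000),
    data text
);

-- A view presents the pieces as one logical table for reads.
CREATE VIEW bigtable AS
    SELECT * FROM bigtable_1
    UNION ALL
    SELECT * FROM bigtable_2;
```

Note that on 9.0 a view like this is not directly writable: INSTEAD OF triggers on views only arrived in 9.1, so inserts and updates would have to be routed with rules or by writing to the member tables directly.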
Basically, I don't see how this particular configuration breaks, and I
wonder whether PostgreSQL already has the ability to do this, as it seems
very useful for managing very large data sets.
Thanks,
~Ben
On Fri, Jun 15, 2012 at 2:42 PM, John R Pierce <pierce@hogranch.com> wrote:
Hi Benedict,
On 15/06/12 20:58, Benedict Holland wrote:
The tables would have to be specified with a table PK constraint
falling between two ranges. A view would then be created to manage all
of the small tables, with triggers handling insert and update
operations. Selects would have to be view-specific, but that is really
cheap compared to updates. This should have the additional benefit of
only hitting a specific table (or tables) with an update.
Basically, I don't see how this particular configuration breaks, and I
wonder whether PostgreSQL already has the ability to do this, as it
seems very useful for managing very large data sets.
What you are looking for is called 'partitioning' (horizontal
partitioning). I suggest that you read this chapter:
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
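Following that chapter, the usual approach on 9.x is table inheritance plus constraint exclusion (the table and range boundaries below are hypothetical):

```sql
-- Parent table holds no rows itself; children inherit its columns.
CREATE TABLE bigtable (
    id   integer NOT NULL,
    data text
);

-- Child tables carry non-overlapping CHECK constraints on the key.
CREATE TABLE bigtable_1 (CHECK (id BETWEEN 1 AND 1000000)) INHERITS (bigtable);
CREATE TABLE bigtable_2 (CHECK (id BETWEEN 1000001 AND 2000000)) INHERITS (bigtable);

-- A trigger on the parent routes inserts to the right child.
CREATE OR REPLACE FUNCTION bigtable_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.id <= 1000000 THEN
        INSERT INTO bigtable_1 VALUES (NEW.*);
    ELSE
        INSERT INTO bigtable_2 VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bigtable_insert_trg
    BEFORE INSERT ON bigtable
    FOR EACH ROW EXECUTE PROCEDURE bigtable_insert();

-- With constraint exclusion enabled, the planner skips children whose
-- CHECK constraint rules them out of a query's WHERE clause.
SET constraint_exclusion = on;
SELECT * FROM bigtable WHERE id = 42;  -- only bigtable_1 is scanned
```

Note that even with partitioning, a single query still runs in one backend process; the win is in skipping irrelevant partitions, not in parallelism.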
Cheers,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
Hi all,
I am curious whether there is a significant speedup from doing this if most
of the queries run against the tables are table-wide. I won't drop the
data, and the data won't really grow. Do I get better speedup with one
large table and large indexes, or with many small tables and many small
indexes?
Thanks,
~Ben
On Sat, Jun 16, 2012 at 2:13 AM, Gabriele Bartolini <
gabriele.bartolini@2ndquadrant.it> wrote: