Partitioning V schema

Started by Dave Potts · over 12 years ago · 7 messages · pgsql-general
#1Dave Potts
dave.potts@pinan.co.uk

Hi List

I am looking for some general advice about the best way of splitting a
large data table. I have 2 different choices: partitioning or different
schemas.

The data table records the houses that can be included in a city, so
there is a large number of records.

I am wondering whether, if I decided to partition the table, update/access
speed might be faster than just declaring a different schema per city.

Under partitioning each table would appear to be smaller, so I should
get an increase in speed, but the database still has to do some sort of
indexing.

If I used different schemas, that resolves data protection issues, but
doing a backup might become a nightmare.

In general, which is the faster access method?

regards

Dave.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Agustin Larreinegabe
alarreine@gmail.com
In reply to: Dave Potts (#1)
Re: Partitioning V schema

If I were you I would use partitioning. In my experience, partitioning is
easier and transparent: I just have to set it up, and then I refer to just
one table and I'm done.
About speed: if you set "constraint_exclusion" to partition, Postgres
will examine constraints only for inheritance child tables and
UNION ALL subqueries, which will improve the performance of your queries.

http://www.postgresql.org/docs/9.2/static/ddl-partitioning.html
http://www.postgresql.org/docs/9.2/static/runtime-config-query.html
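As a minimal sketch of the inheritance-based partitioning described above (table and city names here are hypothetical, chosen only for illustration):

```sql
-- Parent table; queries go against this one.
CREATE TABLE houses (
    id      bigserial,
    city    text NOT NULL,
    address text
);

-- One child table per city, each with a CHECK constraint
-- the planner can use for constraint exclusion.
CREATE TABLE houses_london (
    CHECK (city = 'london')
) INHERITS (houses);

CREATE TABLE houses_paris (
    CHECK (city = 'paris')
) INHERITS (houses);

-- Let the planner skip children whose CHECK constraint
-- rules them out of the query.
SET constraint_exclusion = partition;

-- With exclusion working, this scan should touch only houses_london.
SELECT count(*) FROM houses WHERE city = 'london';
```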


--
Gracias
-----------------
Agustín Larreinegabe

#3Julian
tempura@internode.on.net
In reply to: Dave Potts (#1)
Re: Partitioning V schema

Hi Dave,
How many rows of data are we talking about here, and how much data (GiB)?

Are you able to provide the table definition? (Can normalisation
partition off some of this data?)

Have you considered dedicated options for the lookup data, tuning the
database appropriately, and keeping that single table?

With Postgres we have schemas, so they provide some convenience and
design options.

So look at it in terms of how your query will look:

SELECT schema.table.column FROM schema.table;

vs

SELECT schema_table.column FROM schema_table;

Not much in it?

However, I tend to go with partitions when they need to be generated on
demand, dynamically and automatically (which probably isn't the case
here). Schemas have other uses: they provide a level of security (GRANT)
and are useful in design when partitioning off blocks of related datasets
completely.
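A sketch of that schema-per-city alternative (schema, table, and role names are hypothetical); the GRANT is where the security angle comes in, which plain partitions don't give you per-city:

```sql
-- One schema per city, each with its own houses table.
CREATE SCHEMA london;
CREATE TABLE london.houses (
    id      bigserial,
    address text
);

-- Per-city access control via schema-level grants.
GRANT USAGE ON SCHEMA london TO london_staff;
GRANT SELECT ON london.houses TO london_staff;

-- Queries then name the schema (or rely on search_path):
SELECT address FROM london.houses;
```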

Regards,
Julian


#4Jeff Janes
jeff.janes@gmail.com
In reply to: Dave Potts (#1)
Re: Partitioning V schema

On Thu, Sep 19, 2013 at 12:02 AM, Dave Potts <dave.potts@pinan.co.uk> wrote:

Hi List

I am looking for some general advice about the best way of splitting a
large data table. I have 2 different choices: partitioning or different
schemas.

I don't think there is much of a choice there. If you put them in
different schemas, then you are inherently partitioning the data. It is
just a question of how you name your partitions, which is more of a
naming issue than a performance issue.

The data table records the houses that can be included in a city, so
there is a large number of records.

I am wondering whether, if I decided to partition the table, update/access
speed might be faster than just declaring a different schema per city.

If you partition based on city, then there should be no meaningful
difference. If you partition based on something else, you would have to
describe what it is partitioned on, and what your access patterns are like.
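For city-based partitioning via inheritance, inserts are typically routed to the right child with a trigger on the parent; a sketch of that pattern follows, assuming hypothetical child tables houses_london and houses_paris that INHERIT from a parent houses table with a city column:

```sql
-- Route each inserted row to the child table matching its city.
CREATE OR REPLACE FUNCTION houses_insert_router()
RETURNS trigger AS $$
BEGIN
    IF NEW.city = 'london' THEN
        INSERT INTO houses_london VALUES (NEW.*);
    ELSIF NEW.city = 'paris' THEN
        INSERT INTO houses_paris VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for city %', NEW.city;
    END IF;
    RETURN NULL;  -- row is stored in the child; skip the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER houses_insert
    BEFORE INSERT ON houses
    FOR EACH ROW EXECUTE PROCEDURE houses_insert_router();
```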

Cheers,

Jeff

#5Gregory Haase
haaseg@onefreevoice.com
In reply to: Jeff Janes (#4)
Re: Partitioning V schema

I would look at how PostGIS handles the TIGER census data for
guidance. It's a similarly massive data set.

Greg Haase

#6Julian
tempura@internode.on.net
In reply to: Gregory Haase (#5)
Re: Partitioning V schema

On 21/09/13 02:51, Gregory Haase wrote:

I would look towards how PostGis handles the Tiger census data for
guidance. It's a similar, massive data set.

Greg Haase

I'm not sure why Postgres wouldn't handle it fine.
The question is at what point third-party "imported" datasets, required
for local lookup, need their own dedicated solution (database and/or
server), especially when we are talking about a large amount of data
that comes with its own unique global identifiers. Something like
ISO 3166, being small, would do well in its own database, but is not
large enough to require a dedicated server.

This is the question I put to the OP; I would be interested to know.

Regards,
Julian


#7Luca Ferrari
fluca1978@infinito.it
In reply to: Julian (#3)
Re: Partitioning V schema

On Fri, Sep 20, 2013 at 4:38 AM, Julian <tempura@internode.on.net> wrote:

However, I tend to go with partitions when required to be generated on
demand dynamically and automatically (which probably isn't the case
here). SCHEMAs have other uses, provide a level of security (GRANT) and
useful in design when partitioning off blocks of related datasets
completely.

I would go with a partition in this case for the same reason: schemas
are much more a logical division of data, due to different grants,
search paths, users, and so on. A partition is more a data-level split,
so it makes sense when you want the data to stay in a single logical
place but split across multiple tables for performance reasons.

Luca
