Sizing of LARGE databases.

Started by Michael Miyabara-McCaskey about 25 years ago · 8 messages · general
#1 Michael Miyabara-McCaskey
mykarz@miyabara.com

Hello there,

I have a DB that I am attempting to buy "the right hardware" for.

This DB today is about 25GB total, with typical tables in the area of 3GB in
size, and up to about 20 million records per table for the big tables.

Is there any set of recommendations for sizing a DB using PostgreSQL? (I
have all the docs but have not found anything of use.)

This DB itself will be mostly reads, for a decision-support system. However,
there will be a lot more data added in the future, and the DB could easily
grow to about 150GB in size, with up to about 60 million records for the
large tables.

My current setup is Linux RH7 w/PostgreSQL 7.0.2, and my intention is to
build a large array of Linux clusters, but I also cannot find any
documentation on how, if at all, PostgreSQL handles a clustered environment.
(I did see something in a HOWTO on www.kernel.org/LDP that said you could
have multiple "databases" and use a view to stitch together multiple tables
from different DBs, but I have noticed that the doc is no longer there, and
people who have posted about how to do this on these forums have been told
this method does not work.)

Any thoughts would be greatly appreciated.

Michael Miyabara-McCaskey
Email: mykarz@miyabara.com
Web: http://www.miyabara.com/mykarz/
Mobile: +1 408 504 9014

#2 Justin Clift
justin@postgresql.org
In reply to: Michael Miyabara-McCaskey (#1)
Re: Sizing of LARGE databases.

Hi Michael,

I have recently been working on a database which is presently 14 GB in
size and will grow to nearly 1/2 terabyte in the next 12 months. This
database is mostly large, static, two-column tables needing fast searches.

So far what I've done is split the table into various subtables - for
example the entries starting with 'AAA' go in one table, the entries
starting with 'AAB' go in another table, etc. After this is done I then
index the individual tables.
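
In rough SQL terms, the scheme looks something like this (the table and
column names here are placeholders for illustration, not the real schema):

-- One subtable per key prefix, each with the same two-column layout.
CREATE TABLE lookup_aaa (key text, value text);
CREATE TABLE lookup_aab (key text, value text);

-- Index each subtable individually so searches stay fast.
CREATE INDEX lookup_aaa_key_idx ON lookup_aaa (key);
CREATE INDEX lookup_aab_key_idx ON lookup_aab (key);

-- The application then routes each search to the subtable that matches
-- the leading characters of the key, e.g.:
SELECT value FROM lookup_aab WHERE key = 'AABST';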

As each table and each index is its own file, I then stop the
PostgreSQL daemon and move the individual files & indexes onto separate
RAID drives, then create soft links from the
<postgres>/data/base/<database> directory to the files.

i.e.

/opt/postgres/data/base/<database>/AAA -> /drives/AAA/AAA
/opt/postgres/data/base/<database>/AAA_idx -> /drives/AAA/AAA_idx
/opt/postgres/data/base/<database>/AAB -> /drives/AAB/AAB
/opt/postgres/data/base/<database>/AAB_idx -> /drives/AAB/AAB_idx

and so on.

The biggest limitation I have found with this is that the
/opt/postgres/data/pg_log file seems to need to log (write) bunches of
data, even when just doing searches (reads) on indexes on other tables.
No matter how fast all the other disk subsystems are, the speed of the
disk system the pg_log file is on creates an 'artificial' upper
throughput limit.

My recommendation would be (for mostly static data just doing look-ups)
to split things into many logical tables and move these tables onto
separate RAID subsystems. Then put the pg_log file onto the fastest
disk subsystem you can buy. I haven't yet moved the pg_log file to a
different disk than the main PostgreSQL installation and created a soft
link to it (this is the next step), but hopefully it won't be a problem.

The system in question is PostgreSQL 7.0.3 running on Solaris 7.

Regards and best wishes,

Justin Clift
Database Administrator

#3 Francisco Reyes
fran@reyes.somos.net
In reply to: Justin Clift (#2)
Re: [NOVICE] Re: Sizing of LARGE databases.

On Wed, 31 Jan 2001, Justin Clift wrote:

> The biggest limitation I have found with this is that the
> /opt/postgres/data/pg_log file seems to need to log (write) bunches of
> data, even when just doing searches (reads) on indexes on other tables.

Is there a way to turn off this logging?
How about setting up a virtual disk big enough to hold 5 to 10 minutes'
worth of logs and then moving/deleting the log before that time (if it is
possible to delete/move the log)?

#4 Michael Miyabara-McCaskey
mykarz@miyabara.com
In reply to: Justin Clift (#2)
RE: Sizing of LARGE databases.

Justin,

Thank you for the response.

Out of curiosity, what kind of memory usage are you experiencing with the
current setup (14GB)?

And have you figured out what your upgrade path may be? For instance, I
have not been able to find any evidence on how to make PostgreSQL scale
(clustered nodes, parallel fail-over, high availability, etc.)

How has your site decided to handle this?

-Michael

#5 Francisco Reyes
fran@reyes.somos.net
In reply to: Michael Miyabara-McCaskey (#4)
Re: [NOVICE] RE: Sizing of LARGE databases.

On Thu, 1 Feb 2001, Michael Miyabara-McCaskey wrote:

> And have you figured out what your upgrade path may be? For instance, I
> have not been able to find any evidence on how to make PostgreSQL scale
> (clustered nodes, parallel fail-over, high availability, etc.)

I believe 7.1 will have some level of replication, although it seems 7.2
will be the "replication" release.

I don't know how failover would work, but one possibility may be to use a
program like Understudy, which monitors an IP address and, if a machine
dies, sends requests to the secondary machine. That, combined with
replication from the database, may help prevent downtime and may also be
used to increase performance by doing round-robin with Understudy.

#6 Michael Miyabara-McCaskey
mykarz@miyabara.com
In reply to: Francisco Reyes (#5)
RE: [NOVICE] RE: Sizing of LARGE databases.

Francisco,

Excellent idea.

Thanks for the info. Is this what you are doing now? And if so, since the
current version does not appear to have replication, have you found a
workaround?

-Michael

#7 Francisco Reyes
fran@reyes.somos.net
In reply to: Michael Miyabara-McCaskey (#6)
RE: [NOVICE] RE: Sizing of LARGE databases.

On Fri, 2 Feb 2001, Michael Miyabara-McCaskey wrote:

> Francisco,
>
> Excellent idea.
>
> Thanks for the info. Is this what you are doing now? And if so, since the
> current version does not appear to have replication, have you found a
> workaround?
> -Michael

I am new to PostgreSQL so I haven't tried that yet.
As I understand it, 7.1 is not far from being released. Maybe weeks, maybe
a month or two.

I am right now in the process of doing some initial testing. In particular
I am going to be testing the best way to do bulk uploads.

Do you have any experience with this?
I created a test table in FoxPro which I then dumped to a tab-delimited
file. I am going to try 10,000,000 records.

I plan to test (see the sketch after this list):
- COPY FROM
- inserts
- mass updates by first deleting the existing rows, then inserting the rest
- mass updates by updating the existing rows and inserting the rest
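
Roughly, the statements I have in mind look like this (the table names,
columns, and file path are all placeholders, and the staging table is a
hypothetical holding area for the incoming FoxPro rows):

-- Bulk load from the tab-delimited dump (tab is COPY's default delimiter).
COPY big_table FROM '/tmp/foxpro_dump.txt';

-- Mass update, variant 1: delete the rows being replaced, then insert
-- everything from the staging table.
DELETE FROM big_table WHERE id IN (SELECT id FROM staging);
INSERT INTO big_table SELECT * FROM staging;

-- Mass update, variant 2: update the rows that already exist, then insert
-- only the rows that are new.
UPDATE big_table SET value = s.value FROM staging s WHERE big_table.id = s.id;
INSERT INTO big_table
    SELECT * FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM big_table m WHERE m.id = s.id);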

Speed of imports/updates is important to me because I am going to create a
reporting server as my first project. I will need to do either nightly
dumps of the whole thing, or complete loads plus daily updates from the
FoxPro tables.

#8 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Francisco Reyes (#3)
Re: Re: [NOVICE] Re: Sizing of LARGE databases.

On Wed, 31 Jan 2001, Justin Clift wrote:

> The biggest limitation I have found with this is that the
> /opt/postgres/data/pg_log file seems to need to log (write) bunches of
> data, even when just doing searches (reads) on indexes on other tables.

This is not true.

1) We do not write "bunches of data" into pg_log. We write just 2 bits
per transaction.

2) A simple read-only SELECT does not write to pg_log at all.
--
Tatsuo Ishii