Optimizing a read-only database

Started by François Battailalmost 11 years ago14 messagesgeneral

francois.battail@sipibox.fr

almost 11 years ago

Dear List,

I would like to know if somebody is aware of tricks for optimizing
PostgreSQL settings for a read-only database.
I have a big read-only database (> 1.10^9 records splitted into ~ 10
tables) using GiST and Btree indexes, no foreign keys on tables at all.

I believe that not doing locks on pages could save some time as there's
a lot of simultaneaous readers, but so far I've found nothing about this
specific case in official documentation...

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Andreas Kretschmer

andreas@a-kretschmer.de

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

François Battail <francois.battail@sipibox.fr> hat am 18. Mai 2015 um 16:07
geschrieben:

Dear List,

I would like to know if somebody is aware of tricks for optimizing
PostgreSQL settings for a read-only database.

you can set fillfactor to 100

alter table ... set (fillfactor = 100), see
http://www.postgresql.org/docs/9.4/static/sql-altertable.html and
http://www.postgresql.org/docs/9.4/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS.

And you can disable VACUUM.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

François Battail

francois.battail@sipibox.fr

almost 11 years ago

In reply to: Andreas Kretschmer (#2)

Re: Optimizing a read-only database

Le 18/05/2015 16:20, Andreas Kretschmer a écrit :

Thank you Andreas,

you can set fillfactor to 100

Yes, but it's already the default value according to documentation.

And you can disable VACUUM.

Already done ;-)

I was more dreaming of something like "disable read write locks or
mutexes" when accessing the database in read-only mode, but sadly this
case seems unhandled.

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

William Dunn

dunnwjr@gmail.com

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

Hello François,

- With read-only work loads you can make shared_buffers very large, like
40% of RAM available to the database. Usually you would keep it lower
because in a write heavy workload large shared_buffers causes checkpoints
to have huge IO, but since you are not making changes in shared_buffers
this will not happen.

- You can also increase checkpoint_timeout to a very large value to
prevent checkpoints, since you don't need them. WAL level can be minimal as
well.

- You can also run a CLUSTER command on one of your indexes to group
data that is frequently accessed together into the same segment of disk so
you can get more of it in a single IO operation.

- You can also run the VACUUM FULL command during off-hours to get your
tables vacuumed and statistics up-to-date. It's usually too much overhead
to be worthwhile but since you are not doing updates you only have to do it
once then don't need to worry about autovacuum being aggressive enough.

- I don't think that removing locks will provide any benefit if your
queries are truly read-only since ordinary read-only transactions do not
require any locks

That's all I can think of right now!
Will

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Mon, May 18, 2015 at 10:07 AM, François Battail <
francois.battail@sipibox.fr> wrote:

Show quoted text

Dear List,

I would like to know if somebody is aware of tricks for optimizing
PostgreSQL settings for a read-only database.
I have a big read-only database (> 1.10^9 records splitted into ~ 10
tables) using GiST and Btree indexes, no foreign keys on tables at all.

I believe that not doing locks on pages could save some time as there's a
lot of simultaneaous readers, but so far I've found nothing about this
specific case in official documentation...

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

François Battail

francois.battail@sipibox.fr

almost 11 years ago

In reply to: William Dunn (#4)

Re: Optimizing a read-only database

Le 18/05/2015 16:38, William Dunn a écrit :

Thank you William,

* With read-only work loads you can make shared_buffers very large,
like 40% of RAM available to the database. Usually you would keep it
lower because in a write heavy workload large shared_buffers causes
checkpoints to have huge IO, but since you are not making changes in
shared_buffers this will not happen.

Yes, good idea.

* You can also increase checkpoint_timeout to a very large value to
prevent checkpoints, since you don't need them. WAL level can be
minimal as well.

Already set to 5 min with 50 segments and 0.9 completion target (but
used also for the bulk loading). But of course I will set it to 1 hour
when in read only mode.

* You can also run a CLUSTER command on one of your indexes to group
data that is frequently accessed together into the same segment of
disk so you can get more of it in a single IO operation.

Hum... I was planning to put indexes and data on different disks (SSD) /
controller to maximize bandwith use, am I wrong?

* You can also run the VACUUM FULL command during off-hours to get
your tables vacuumed and statistics up-to-date. It's usually too
much overhead to be worthwhile but since you are not doing updates
you only have to do it once then don't need to worry about
autovacuum being aggressive enough.

Vacuum is done at the end of the import and then set to off.

* I don't think that removing locks will provide any benefit if your
queries are truly read-only since ordinary read-only transactions do
not require any locks

At least a read write lock should be needed, but you're right: better
take a look at the source code to be sure.

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

William Dunn

dunnwjr@gmail.com

almost 11 years ago

In reply to: François Battail (#5)

Re: Optimizing a read-only database

On Mon, May 18, 2015 at 10:54 AM, François Battail <
francois.battail@sipibox.fr> wrote:

Le 18/05/2015 16:38, William Dunn a écrit :

* You can also run a CLUSTER command on one of your indexes to group

data that is frequently accessed together into the same segment of
disk so you can get more of it in a single IO operation.

Hum... I was planning to put indexes and data on different disks (SSD) /
controller to maximize bandwith use, am I wrong?

Hello François - the CLUSTER command doesn't have to do with where your
indexes are. What the CLUSTER command does is physically sort the table
data based on the index (Doc:
http://www.postgresql.org/docs/devel/static/sql-cluster.html). So for
example if you have a column called 'region_code' with an index and CLUSTER
the table by that index all the rows for 'region_code'=15 will be located
next to each other on disk and can be read in the same IO operation. The
usual disadvantage of CLUSTER is that it does not maintain itself, but
since your data is read-only that doesn't matter. And yes you can still
have the index on an SSD and the source table on slower storage.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com>*

On Mon, May 18, 2015 at 10:54 AM, François Battail <
francois.battail@sipibox.fr> wrote:

Show quoted text

Le 18/05/2015 16:38, William Dunn a écrit :

Thank you William,

* With read-only work loads you can make shared_buffers very large,

like 40% of RAM available to the database. Usually you would keep it
lower because in a write heavy workload large shared_buffers causes
checkpoints to have huge IO, but since you are not making changes in
shared_buffers this will not happen.

Yes, good idea.

* You can also increase checkpoint_timeout to a very large value to

prevent checkpoints, since you don't need them. WAL level can be
minimal as well.

Already set to 5 min with 50 segments and 0.9 completion target (but used
also for the bulk loading). But of course I will set it to 1 hour when in
read only mode.

* You can also run a CLUSTER command on one of your indexes to group

data that is frequently accessed together into the same segment of
disk so you can get more of it in a single IO operation.

Hum... I was planning to put indexes and data on different disks (SSD) /
controller to maximize bandwith use, am I wrong?

* You can also run the VACUUM FULL command during off-hours to get

your tables vacuumed and statistics up-to-date. It's usually too
much overhead to be worthwhile but since you are not doing updates
you only have to do it once then don't need to worry about
autovacuum being aggressive enough.

Vacuum is done at the end of the import and then set to off.

* I don't think that removing locks will provide any benefit if your

queries are truly read-only since ordinary read-only transactions do
not require any locks

At least a read write lock should be needed, but you're right: better take
a look at the source code to be sure.

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

François Battail

francois.battail@sipibox.fr

almost 11 years ago

In reply to: William Dunn (#6)

Re: Optimizing a read-only database

Le 18/05/2015 17:20, William Dunn a écrit :

Hello William,

Hello François - the CLUSTER command doesn't have to do with where your
indexes are. What the CLUSTER command does is physically sort the table
data based on the index (Doc:
http://www.postgresql.org/docs/devel/static/sql-cluster.html). So for
example if you have a column called 'region_code' with an index and
CLUSTER the table by that index all the rows for 'region_code'=15 will
be located next to each other on disk and can be read in the same IO
operation. The usual disadvantage of CLUSTER is that it does not
maintain itself, but since your data is read-only that doesn't matter.
And yes you can still have the index on an SSD and the source table on
slower storage.

My bad, got it. May be interesting but as I have a lot of indexes it
will be hard to test and to choose the best candidate. No idea of how it
can affect EWKB data indexed by a GiST (PostGIS) index, but it's
something to try just to know.

Thanks a lot, may be I will be able to do something with that!

Best regards

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Harald Fuchs

hari.fuchs@gmail.com

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

Franï¿œois Battail <francois.battail@sipibox.fr> writes:

My bad, got it. May be interesting but as I have a lot of indexes it
will be hard to test and to choose the best candidate. No idea of how
it can affect EWKB data indexed by a GiST (PostGIS) index, but it's
something to try just to know.

You could also raise the statistics target and re-analyze. This will
take some time, but then your query plans might be better.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Sameer Thakur

samthakur74@gmail.com

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

Hello,
You could disable fsync as write reliability is not relevant
regards
Sameer

--
View this message in context: http://postgresql.nabble.com/Optimizing-a-read-only-database-tp5849746p5850103.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

Laurenz Albe

laurenz.albe@cybertec.at

almost 11 years ago

In reply to: Sameer Thakur (#9)

Re: Optimizing a read-only database

Sameer Thakur wrote:

You could disable fsync as write reliability is not relevant

That is bad advice.
If there are no writes, fsync won't hurt anyway.
Never disable fsync for anything but test systems.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11

Sameer Thakur

samthakur74@gmail.com

almost 11 years ago

In reply to: François Battail (#3)

Re: Optimizing a read-only database

Hello,

I was more dreaming of something like "disable read write locks or
mutexes" when accessing the database in read-only mode, but sadly this
case seems unhandled.

You could use transactions in read only mode. They do not generate
XID's,which reduces the
need to do VACUUM to protect against XID wraparound.

Ref: http://postgresql.nabble.com/read-only-transactions-td3209290.html

--
View this message in context: http://postgresql.nabble.com/Optimizing-a-read-only-database-tp5849746p5850107.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12

Sameer Thakur

samthakur74@gmail.com

almost 11 years ago

In reply to: Laurenz Albe (#10)

Re: Optimizing a read-only database

Hello

That is bad advice.
If there are no writes, fsync won't hurt anyway.
Never disable fsync for anything but test systems.

Yep. Its a bad way to speed up writes. Not relevant to this context and bad
anyway
regards
Sameer

--
View this message in context: http://postgresql.nabble.com/Optimizing-a-read-only-database-tp5849746p5850108.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13

Sameer Thakur

samthakur74@gmail.com

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

Hello,

In addition to what has already been suggested

1. Use

VACUUM FREEZE ANALYZE;

Otherwise you will still have some trickle of write-activity going on,
not always efficiently, despite being in read-only mode. It's because
of what's referred to as Hint Bits:
http://wiki.postgresql.org/wiki/Hint_Bits

2. Low value for maintenance_work_mem

Ref:http://postgresql.nabble.com/How-to-configure-a-read-only-database-server-td4311924.html

regards
Sameer

--
View this message in context: http://postgresql.nabble.com/Optimizing-a-read-only-database-tp5849746p5850109.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14

François Battail

francois.battail@sipibox.fr

almost 11 years ago

In reply to: François Battail (#1)

Re: Optimizing a read-only database

Dear List,

Thank you for all for your advices, even if there's not a direct and
magical solution, I've now some paths to try.

I really enjoy the PostgreSQL community.

Wish you a nice day/night, best regards.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general