PostgreSQL Process memory architecture

Started by Ben Zeev, Lioralmost 13 years ago38 messageshackers
Jump to latest
#1Ben Zeev, Lior
lior.ben-zeev@hp.com

Hi,

I have a question regarding the memory consumption per process in PostgreSQL 9.2

Does each PostgreSQL process allocating in its own memory (Not shared memory) a cache of all the database catalog which it access during the SQL execution?
I mean does each process holds all the catalog indexes data which it accessed, all the catalog index statistics etc' accessed

If yes is there a way to avoid this behavior?

(I asked Josh Berkus from PGExperts and he said that each process holds memory for sorts, hashes, temp tables, vaccum, etc')

Thanks,
Lior

#2Atri Sharma
atri.jiit@gmail.com
In reply to: Ben Zeev, Lior (#1)
Re: PostgreSQL Process memory architecture

Does each PostgreSQL process allocating in its own memory (Not shared
memory) a cache of all the database catalog which it access during the SQL
execution?

I mean does each process holds all the catalog indexes data which it
accessed, all the catalog index statistics etc’ accessed

AFAIK, the shared disk buffers are the only part shared between the processes.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Atri Sharma
atri.jiit@gmail.com
In reply to: Ben Zeev, Lior (#1)
Re: PostgreSQL Process memory architecture

On Mon, May 27, 2013 at 3:41 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Hi Atri,

Thanks for your answer!
Do you have idea what may be the reason that PostgreSQL process consume more memory when there are more partial indexes on the DB table?

Well, I am not too sure, but indexes always take up more space, so if
your backend has a lot of indexes, it will cause the process to
consume more memory.

Indexes should be used with care, as too many indexes can cause a
memory overhead,which can cause performance degradations.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Atri Sharma (#3)
Re: PostgreSQL Process memory architecture

Thanks Atri!

Do you know why PostgreSQL store the indexes in memory per process and not in the shared memory?
Is there a way to prevent it store the indexes data per process, and force it storing it in the shared memory?

Lior

-----Original Message-----
From: Atri Sharma [mailto:atri.jiit@gmail.com]
Sent: Monday, May 27, 2013 13:19
To: Ben Zeev, Lior; Pg Hackers
Subject: Re: [HACKERS] PostgreSQL Process memory architecture

On Mon, May 27, 2013 at 3:41 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Hi Atri,

Thanks for your answer!
Do you have idea what may be the reason that PostgreSQL process consume more memory when there are more partial indexes on the DB table?

Well, I am not too sure, but indexes always take up more space, so if your backend has a lot of indexes, it will cause the process to consume more memory.

Indexes should be used with care, as too many indexes can cause a memory overhead,which can cause performance degradations.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Atri Sharma
atri.jiit@gmail.com
In reply to: Ben Zeev, Lior (#4)
Re: PostgreSQL Process memory architecture

On Mon, May 27, 2013 at 3:55 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Thanks Atri!

Do you know why PostgreSQL store the indexes in memory per process and not in the shared memory?
Is there a way to prevent it store the indexes data per process, and force it storing it in the shared memory?

Ok, sorry for a bit of a confusion here.

I am assuming that the multiple processes are accessing the same database.

What happens essentially is that each index is stored as a separate
file in the data directory of the database in the directory of the
cluster in which your database belongs.

So,indexes are essentially stored the same way as tables, in form of
files which are accessed in 8K blocks.

If your index is big/you have too many indexes in your database, it
should affect *all* backends accessing that specific database.

So,my point is that,there is no question of indexes being stored in
shared memory or individually. You can treat indexes the same as your
tables,from the point of view of physical storage.

For more details,you can see

http://www.postgresql.org/docs/8.1/static/storage.html

Regards,

Atri
--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ben Zeev, Lior (#4)
Re: PostgreSQL Process memory architecture

On Mon, May 27, 2013 at 7:25 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Thanks Atri!

Do you know why PostgreSQL store the indexes in memory per process and not in the shared memory?
Is there a way to prevent it store the indexes data per process, and force it storing it in the shared memory?

An index is built in backend process's local memory, but, when
accessing, index pages are stored in shared memory. That is, for
example, when an index scan is performed, index pages are brought into
shared memory and accessed from there.

--
Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Atri Sharma
atri.jiit@gmail.com
In reply to: Amit Langote (#6)
Re: PostgreSQL Process memory architecture

An index is built in backend process's local memory, but, when
accessing, index pages are stored in shared memory. That is, for
example, when an index scan is performed, index pages are brought into
shared memory and accessed from there.

Yes, brought into the shared disk buffers and read,just like tables are read.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#1)
Re: PostgreSQL Process memory architecture

Lior,

* Ben Zeev, Lior (lior.ben-zeev@hp.com) wrote:

Does each PostgreSQL process allocating in its own memory (Not shared memory) a cache of all the database catalog which it access during the SQL execution?

PG will look up and cache the catalog information regarding all of the
relations involved, yes. In *accessing* those relations, PG will pull
needed blocks into shared buffers. PG will use backend-local memory to
process through the data (generally on a per-tuple basis).

I mean does each process holds all the catalog indexes data which it accessed, all the catalog index statistics etc' accessed

Catalog information (eg: information in pg_class) is kept, but the
*data* will only be pulled through shared buffers and then processed.
Anything in shared buffers (eg: the data in the tables or indexes) will
be cleaned up as new blocks are needed which push out old ones.

If yes is there a way to avoid this behavior?

Catalog information is only cached- if the information isn't being used
then it should get purged out in favor of new data which is needed. Can
you explain a bit more exactly what the issue is..?

(I asked Josh Berkus from PGExperts and he said that each process holds memory for sorts, hashes, temp tables, vaccum, etc')

Correct, most backend local usage of memory is for running queries and
doing what is required in those queries. Regarding temp tables, you can
control how much memory is used for those with the temp_buffers
parameter.

Thanks,

Stephen

#9Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#2)
Re: PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

Does each PostgreSQL process allocating in its own memory (Not shared
memory) a cache of all the database catalog which it access during the SQL
execution?

This information is pulled into a backend-local cache, but it should
only be cached while it's needed and then purged out to allow for new
data coming in. It would be great if we could understand what the issue
is that you're seeing.

I mean does each process holds all the catalog indexes data which it
accessed, all the catalog index statistics etc’ accessed

Each backend shouldn't try to hold all the data, if there is pressure
for that memory.

AFAIK, the shared disk buffers are the only part shared between the processes.

There's a bit of other information shared, but disk buffers are
certainly the bulk of it.

Thanks,

Stephen

#10Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#3)
Re: PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

On Mon, May 27, 2013 at 3:41 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Do you have idea what may be the reason that PostgreSQL process consume more memory when there are more partial indexes on the DB table?

It might use a bit more, but it shouldn't be excessive.. What, exactly,
are you seeing and would it be possible for you to provide a repeatable
test case with a small-ish set of data?

Well, I am not too sure, but indexes always take up more space, so if
your backend has a lot of indexes, it will cause the process to
consume more memory.

Indexes require additional disk space, certainly. Having a lot of
indexes, by itself, shouldn't seriously increase memory usage.

Indexes should be used with care, as too many indexes can cause a
memory overhead,which can cause performance degradations.

This is not generally a reason to avoid indexes. Indexes require more
disk space and must be kept up to date, making them expensive to
maintain due to increased disk i/o. Building an index uses as much
memory as it's allowed to- it uses maintenance_work_mem to limit itself.

Thanks,

Stephen

#11Atri Sharma
atri.jiit@gmail.com
In reply to: Stephen Frost (#9)
Re: PostgreSQL Process memory architecture

AFAIK, the shared disk buffers are the only part shared between the processes.

There's a bit of other information shared, but disk buffers are
certainly the bulk of it.

The other information being locks?

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#4)
Re: PostgreSQL Process memory architecture

Lior,

* Ben Zeev, Lior (lior.ben-zeev@hp.com) wrote:

Do you know why PostgreSQL store the indexes in memory per process and not in the shared memory?

The disk blocks from an index are not stored per-process, they are kept
in shared memory. When building an index, PG can only use one process
and so there isn't any point having that be in shared memory.

Thanks,

Stephen

#13Atri Sharma
atri.jiit@gmail.com
In reply to: Stephen Frost (#10)
Re: PostgreSQL Process memory architecture

This is not generally a reason to avoid indexes. Indexes require more
disk space and must be kept up to date, making them expensive to
maintain due to increased disk i/o. Building an index uses as much
memory as it's allowed to- it uses maintenance_work_mem to limit itself.

Yes, too many indexes wont hurt much.BTW,wont making too many indexes
on columns that probably dont have as many values as to deserve
them(so,essentially,indiscriminately making indexes) hurt the
performance/memory usage?

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#5)
Re: PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

If your index is big/you have too many indexes in your database, it
should affect *all* backends accessing that specific database.

More indexes will require more disk space, certainly, but tablespaces
can be used to seperate databases, or tables, or indexes on to different
partitions on the host server.

So,my point is that,there is no question of indexes being stored in
shared memory or individually. You can treat indexes the same as your
tables,from the point of view of physical storage.

Correct.

For more details,you can see

http://www.postgresql.org/docs/8.1/static/storage.html

A better place to look would be the documentation for the release of PG
which you are on, or the latest release otherwise, which is:

http://www.postgresql.org/docs/9.2/static/storage.html

Thanks,

Stephen

#15Atri Sharma
atri.jiit@gmail.com
In reply to: Stephen Frost (#14)
Re: PostgreSQL Process memory architecture

A better place to look would be the documentation for the release of PG
which you are on, or the latest release otherwise, which is:

http://www.postgresql.org/docs/9.2/static/storage.html

Oops,yes,sorry about that.

Thanks a ton for pointing that out.

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Atri Sharma (#11)
Re: PostgreSQL Process memory architecture

On Mon, May 27, 2013 at 9:16 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

AFAIK, the shared disk buffers are the only part shared between the processes.

There's a bit of other information shared, but disk buffers are
certainly the bulk of it.

The other information being locks?

CreateSharedMemoryAndSemaphores() (src/backend/storage/ipc/ipci.c)
seems to be the place where we can see what all things reside in
shared memory, since at the beginning of the function, you can see
size being computed for shared memory to hold all the things that need
to be in shared memory.

--
Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#11)
Re: PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

There's a bit of other information shared, but disk buffers are
certainly the bulk of it.

The other information being locks?

Depends, but yes. Per-row locks are actually in the disk cache portion
of shared buffers, but heavyweight locks have their own area.

Thanks,

Stephen

#18Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#13)
Re: PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

Yes, too many indexes wont hurt much.BTW,wont making too many indexes
on columns that probably dont have as many values as to deserve
them(so,essentially,indiscriminately making indexes) hurt the
performance/memory usage?

I'd expect the performance issue would be from planner time more than
memory usage- but if there is a serious memory usage issue here, then
it'd be valuable to have a test case showing what's happening. We may
not be releasing the sys cache in some cases or otherwise have a bug in
this area.

Thanks,

Stephen

#19Atri Sharma
atri.jiit@gmail.com
In reply to: Stephen Frost (#18)
Re: PostgreSQL Process memory architecture

I'd expect the performance issue would be from planner time more than
memory usage- but if there is a serious memory usage issue here, then
it'd be valuable to have a test case showing what's happening. We may
not be releasing the sys cache in some cases or otherwise have a bug in
this area.

Right, this does sound interesting. Thanks a ton!

Regards,

Atri

--
Regards,

Atri
l'apprenant

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Stephen Frost (#10)
Re: PostgreSQL Process memory architecture

Hi Stephen,

The case which I'm seeing is that I have an empty table without any rows,
Create table test (
Num Integer,
C1 character varying(512),
C2 character varying(512),
C3 character varying(512));

I create several partial indexes on this table:

Create index(index_1_c1) on test(c1) where Num=1;
Create index(index_2_c1) on test(c1) where Num=2;
Create index(index_1_c2) on test(c1) where Num=1;
Create index(index_2_c2) on test(c1) where Num=2;
...

This doesn't consume much memory on the PostgreSQL backend process,
But if I create 500 indexes It consume several MB of memory.

If I have 10 tables with 500 indexes each PostgreSql backend process consume 20MB,
If I have 100 tables with 500 indexes each PostgreSQL backend process consume 200MB

All tables are empty without data.

If have Connection pool of 100 connections then All this processes consume 100*200MB = 20GB of memory

What is the reason to consume so much memory for empty indexes?

Thanks,
Lior

-----Original Message-----
From: Stephen Frost [mailto:sfrost@snowman.net]
Sent: Monday, May 27, 2013 15:16
To: Atri Sharma
Cc: Ben Zeev, Lior; Pg Hackers
Subject: Re: [HACKERS] PostgreSQL Process memory architecture

* Atri Sharma (atri.jiit@gmail.com) wrote:

On Mon, May 27, 2013 at 3:41 PM, Ben Zeev, Lior <lior.ben-zeev@hp.com> wrote:

Do you have idea what may be the reason that PostgreSQL process consume more memory when there are more partial indexes on the DB table?

It might use a bit more, but it shouldn't be excessive.. What, exactly, are you seeing and would it be possible for you to provide a repeatable test case with a small-ish set of data?

Well, I am not too sure, but indexes always take up more space, so if
your backend has a lot of indexes, it will cause the process to
consume more memory.

Indexes require additional disk space, certainly. Having a lot of indexes, by itself, shouldn't seriously increase memory usage.

Indexes should be used with care, as too many indexes can cause a
memory overhead,which can cause performance degradations.

This is not generally a reason to avoid indexes. Indexes require more disk space and must be kept up to date, making them expensive to maintain due to increased disk i/o. Building an index uses as much memory as it's allowed to- it uses maintenance_work_mem to limit itself.

Thanks,

Stephen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Atri Sharma
atri.jiit@gmail.com
In reply to: Ben Zeev, Lior (#20)
#22Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Atri Sharma (#21)
#23Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#20)
#24Stephen Frost
sfrost@snowman.net
In reply to: Atri Sharma (#21)
#25Hannu Krosing
hannu@tm.ee
In reply to: Ben Zeev, Lior (#4)
#26Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Stephen Frost (#23)
#27Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#26)
#28Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Stephen Frost (#27)
#29Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#26)
#30Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Stephen Frost (#29)
#31Stephen Frost
sfrost@snowman.net
In reply to: Ben Zeev, Lior (#28)
#32Atri Sharma
atri.jiit@gmail.com
In reply to: Stephen Frost (#29)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Atri Sharma (#32)
#34Merlin Moncure
mmoncure@gmail.com
In reply to: Stephen Frost (#18)
#35Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Stephen Frost (#31)
#36Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Atri Sharma (#32)
#37Merlin Moncure
mmoncure@gmail.com
In reply to: Ben Zeev, Lior (#36)
#38Ben Zeev, Lior
lior.ben-zeev@hp.com
In reply to: Merlin Moncure (#37)