PostgreSQL (9.3 and 9.6) eats all memory when using many tables

Started by hubert depesz lubaczewski · 13 messages · pgsql-bugs

Hi,
so, we are running 9.3.10 in production, but I tested it on 9.6, and the
problem seems to be there too, though to a much lesser extent.

In our database we have ~ 70000 tables (~ 180 tables in ~ 400 schemas).

So far we have used the application in such a way that each connection could
use only tables from a single schema.

But then we switched to a situation where a single connection (very
long-lived, as it's reused thanks to pgbouncer) can effectively query tables
from all schemas.

And this caused memory usage to explode, to the point that our server
wasn't able to handle it (64 GB of memory gone).

I checked logs and other stuff, and came up with a synthetic test, using
Perl:

=======================================================
#!/usr/bin/env perl

use strict;
use DBI;
use Data::Dumper;

my $dbh = DBI->connect("dbi:Pg:dbname=depesz");
my $backend_pid = $dbh->selectall_arrayref("select pg_backend_pid()")->[0]->[0];
system("ps nh uww -p $backend_pid");

my $t = [ map { $_->[0] } @{ $dbh->selectall_arrayref("SELECT oid::regclass from pg_class where relkind = 'r'") } ];
my $len = scalar @{ $t };

my $i = 0;
while (1) {
    my $use_table = $t->[ $i % $len ];
    my $limit     = 1 + ( $i % 3 );

    $dbh->prepare("select ${i}::int4 as a, ?::int4 as a, * from $use_table limit $limit")->execute( $i );
    $i++;
    if ( 0 == $i % 1000 ) {
        print "$i:";
        system("ps nh uww -p $backend_pid");
    }
}
=======================================================

This effectively does:
select 1, 1, * from <table> limit <1..3>
for each table.

On 9.3, after ~ 35000 tables, the PG process grew by 1 GB, and it wasn't
shared buffers, as smaps showed that the memory was anonymous.

In 9.6, after 35000 tables it grew by ~ 71 MB, which is much better, but still not really optimal.

The same thing happened when I was *not* using prepared statements on the server side.

Basically it looks like PostgreSQL "caches" query plans? parsed elements? for
queries, but doesn't put any kind of limit on the size of this cache. Which
means that if our app uses LOTS of different queries, memory usage will grow
over time.

This, plus the fact that this cache is not shared, means that with non-trivial
numbers of tables and non-trivial numbers of backends, it will use all of the
server's memory as soon as enough different queries/tables get used.
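To see the scale of the problem, here is some back-of-envelope arithmetic (illustrative only: the ~32 kB per-table figure is the one measured later in this thread, and the backend count is a made-up example):

```python
# Back-of-envelope: per-backend metadata cache growth with many tables.
kb_per_table = 32        # observed cache growth per table (see measurements below)
tables = 70_000          # tables in the database
backends = 30            # hypothetical number of long-lived backends

per_backend_gb = kb_per_table * tables / 1024**2
total_gb = per_backend_gb * backends
print(f"~{per_backend_gb:.1f} GB per backend, ~{total_gb:.0f} GB for {backends} backends")
```

With these numbers, a few dozen long-lived pgbouncer server connections are enough to consume a 64 GB machine.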

What can we do about it, aside from having fewer tables and moving to
a newer Pg?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: hubert depesz lubaczewski (#1)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Thu, Jun 9, 2016 at 8:46 AM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:

Hi,
so, we are running 9.3.10 in production, but I tested it on 9.6, and the
problem seems to be there too, though to a much lesser extent.

In our database we have ~ 70000 tables (~ 180 tables in ~ 400 schemas).

So far we have used the application in such a way that each connection could
use only tables from a single schema.

But then we switched to a situation where a single connection (very
long-lived, as it's reused thanks to pgbouncer) can effectively query tables
from all schemas.

And this caused memory usage to explode, to the point that our server
wasn't able to handle it (64 GB of memory gone).

You should probably use pgbouncer's server_lifetime to force
connections to be discarded and recreated every now and then. That
parameter seems to exist specifically for dealing with this kind of
problem.
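A minimal pgbouncer.ini sketch of that suggestion (the parameter name is pgbouncer's; the 3600-second value here is just an example, not a recommendation from this thread):

```ini
[pgbouncer]
; Close a server connection after it has been alive this many seconds,
; so each backend's cache bloat is bounded in time.
server_lifetime = 3600
```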

....

This effectively does:
select 1, 1, * from <table> limit <1..3>
for each table.

Is all of that necessary? Can't you reproduce the problem just as
well with just "select count(*) from <table>;" ?

On 9.3, after ~ 35000 tables, the PG process grew by 1 GB, and it wasn't
shared buffers, as smaps showed that the memory was anonymous.

In 9.6, after 35000 tables it grew by ~ 71 MB, which is much better, but still not really optimal.

The same thing happened when I was *not* using prepared statements on the server side.

Basically it looks like PostgreSQL "caches" query plans? parsed elements?

It is caching metadata for every table and index touched by the backend.

Cheers,

Jeff


#3 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Jeff Janes (#2)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Sun, Jun 12, 2016 at 01:50:48PM -0700, Jeff Janes wrote:

You should probably use pgbouncer's server_lifetime to force
connections to be discarded and recreated every now and then. That
parameter seems to exist specifically for dealing with this kind of
problem.

While I know I can deal with it with server_lifetime, I still think it's
a problem in Pg - the amount of memory used for this cache should be
limitable/configurable.

Is all of that necessary? Can't you reproduce the problem just as
well with just "select count(*) from <table>;" ?

No.

I'd rather not count(*), as some of these tables are large-ish, but I did:
select * from <table> limit 1
and the results are:

1 | =$ ./bad.pl
2 | 109 24536 0.0 0.0 6821072 6312 ? Ss 09:32 0:00 postgres: depesz-rw dbname 127.0.0.1(45788) idle
3 | 74002 tables; press enter to continue:
4 |
5 | 1000: (87524 kB anon) 109 24536 65.5 0.8 6905176 530844 ? Ss 09:32 0:01 postgres: depesz-rw dbname 127.0.0.1(45788) idle
6 | 2000: (115648 kB anon) 109 24536 90.0 1.0 6937684 667540 ? Ss 09:32 0:01 postgres: depesz-rw dbname 127.0.0.1(45788) idle
7 | 3000: (143460 kB anon) 109 24536 77.3 1.2 6962660 794880 ? Ss 09:32 0:02 postgres: depesz-rw dbname 127.0.0.1(45788) idle
8 | 4000: (170640 kB anon) 109 24536 94.3 1.4 6995196 905052 ? Ss 09:32 0:02 postgres: depesz-rw dbname 127.0.0.1(45788) idle
9 | 5000: (199388 kB anon) 109 24536 58.1 1.6 7020896 1028180 ? Ss 09:32 0:03 postgres: depesz-rw dbname 127.0.0.1(45788) idle
10 | 6000: (226852 kB anon) 109 24536 47.4 1.8 7045140 1159604 ? Ss 09:32 0:04 postgres: depesz-rw dbname 127.0.0.1(45788) idle
11 | 7000: (254836 kB anon) 109 24536 38.9 2.0 7076732 1300960 ? Ss 09:32 0:05 postgres: depesz-rw dbname 127.0.0.1(45788) idle
12 | 8000: (286072 kB anon) 109 24536 37.5 2.2 7103824 1435416 ? Ss 09:32 0:06 postgres: depesz-rw dbname 127.0.0.1(45788) idle
13 | 9000: (312956 kB anon) 109 24536 34.1 2.4 7139348 1545560 ? Ss 09:32 0:06 postgres: depesz-rw dbname 127.0.0.1(45788) idle
14 | 10000: (339100 kB anon) 109 24536 33.0 2.6 7162500 1646176 ? Ss 09:32 0:07 postgres: depesz-rw dbname 127.0.0.1(45788) idle
15 | 11000: (365104 kB anon) 109 24536 32.2 2.7 7185596 1742468 ? Ss 09:32 0:08 postgres: depesz-rw dbname 127.0.0.1(45788) idle
16 | 12000: (391628 kB anon) 109 24536 31.6 2.9 7218820 1838912 ? Ss 09:32 0:09 postgres: depesz-rw dbname 127.0.0.1(45788) idle
17 | 13000: (424096 kB anon) 109 24536 31.4 3.1 7251908 1959756 ? Ss 09:32 0:10 postgres: depesz-rw dbname 127.0.0.1(45788) idle
18 | 14000: (458424 kB anon) 109 24536 30.6 3.3 7277756 2083952 ? Ss 09:32 0:11 postgres: depesz-rw dbname 127.0.0.1(45788) idle

Line #2 shows the output of ps nh uww -p <backend_pid> before the start of work.
There are, in total, 74002 tables; I iterate over the list of them,
and for each, I run the select I mentioned.

Every 1000 tables, I get stats: the ps output, and (in the parent) the sum of
the "Anonymous:" lines from /proc/<backend_pid>/smaps.

As you can see, we're getting ~ 32 kB of cache per table.
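Summing those "Anonymous:" lines is a few lines of code; a sketch of the measurement in Python (works on Linux, where /proc/<pid>/smaps exists; this is not the original Perl):

```python
import os

def anon_kb(pid):
    """Sum all 'Anonymous:' fields (values are in kB) in /proc/<pid>/smaps."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Anonymous:"):
                total += int(line.split()[1])  # e.g. "Anonymous:  123 kB"
    return total

# Measure this process itself, the way the test script measures the backend:
print(anon_kb(os.getpid()), "kB anonymous")
```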

While I do appreciate caching of metadata, it is causing serious
problems, which we will alleviate with server_lifetime, but I would much
prefer a setting like:

internal_cache_limit = 256MB

or something similar.

Best regards,

depesz


#4 Peter Eisentraut
peter_e@gmx.net
In reply to: hubert depesz lubaczewski (#1)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On 6/9/16 11:46 AM, hubert depesz lubaczewski wrote:

Basically it looks like PostgreSQL "caches" query plans? parsed elements? for
queries, but doesn't put any kind of limit on the size of this cache. Which
means that if our app uses LOTS of different queries, memory usage will grow
over time.

If you compile with -DSHOW_MEMORY_STATS, it will print out memory
allocation after every command, so you might be able to see where the
memory is going.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#5 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Peter Eisentraut (#4)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 08:56:56AM -0400, Peter Eisentraut wrote:

On 6/9/16 11:46 AM, hubert depesz lubaczewski wrote:

Basically it looks that postgresql "caches" query plans? parsed elements? for
queries, but doesn't put any kind of limit to size of this cache. Which means
that if our app is using LOTS of different queries, the memory usage will grow
in time.

If you compile with -DSHOW_MEMORY_STATS, it will print out memory allocation
after every command, so you might be able to see where the memory is going.

I compiled it and am running it now, but it is much slower. And the output
is huge.

After ~5k queries, it looks like:
TopMemoryContext: 5892000 total in 701 blocks; 18344 free (68 chunks); 5873656 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 32768 total in 3 blocks; 6216 free (5 chunks); 26552 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 4186112 total in 9 blocks; 1374144 free (31 chunks); 2811968 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
Relcache by OID: 2088960 total in 8 blocks; 1008496 free (16 chunks); 1080464 used
CacheMemoryContext: 182443144 total in 422 blocks; 2623776 free (1 chunks); 179819368 used
MdSmgr: 1040384 total in 7 blocks; 192512 free (0 chunks); 847872 used
ident parser context: 3072 total in 2 blocks; 1416 free (1 chunks); 1656 used
hba parser context: 130048 total in 7 blocks; 42496 free (2 chunks); 87552 used
LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (6 chunks); 32 used

I'll try to get to the end of the test (~70k tables), and then show the
same info, plus some statistics about "CacheMemoryContext" if it helps
(I'd rather not show *all* of it, though :)

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: hubert depesz lubaczewski (#3)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

hubert depesz lubaczewski <depesz@depesz.com> writes:

While I do appreciate caching of metadata, it is causing serious
problems, which we will alleviate with server_lifetime, but I would much
prefer a setting like:
internal_cache_limit = 256MB

Be careful what you ask for, you might get it.

There used to be exactly such a limit in the catcache logic, which we
ripped out because it caused far more performance problems than it fixed.
See
/messages/by-id/5141.1150327541@sss.pgh.pa.us

While we have no direct experience with limiting the plancache size,
I'd expect a pretty similar issue there: a limit will either do nothing
except impose substantial bookkeeping overhead (if it's more than the
number of plans in your working set) or it will result in a performance
disaster from cache thrashing (if it's less). You can only avoid falling
off the performance cliff if your workload has *very* strong locality of
reference, and that tends not to be the case.
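The cliff is easy to demonstrate with a toy model (an illustrative sketch, not PostgreSQL's actual cache code): a strict LRU cache whose capacity is even one entry smaller than a round-robin working set gets a 0% hit rate.

```python
from collections import OrderedDict

class LRUCache:
    """Toy strict-LRU cache that counts hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)      # mark as most recently used
            return
        self.misses += 1
        if len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # evict the least recently used
        self.data[key] = True

# Working set of 100 "plans", cache one entry too small, round-robin access:
cache = LRUCache(capacity=99)
for _ in range(10):
    for key in range(100):
        cache.access(key)
print(cache.hits, cache.misses)  # every single access misses
```

With round-robin access, the entry evicted to make room is always exactly the next one needed, so the cache does pure bookkeeping work for zero benefit.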

Another problem, if the bloat is being driven by explicit PREPARE commands
as you illustrate here, is that the backend doesn't get to silently
discard prepared statements. We could reduce the amount of memory per
prepared statement by dropping the plan tree, but we'd still have to keep
the parse tree, so there's still bloat.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Tom Lane (#6)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 09:36:32AM -0400, Tom Lane wrote:

Another problem, if the bloat is being driven by explicit PREPARE commands
as you illustrate here, is that the backend doesn't get to silently

I did the same test with
$dbh->{pg_server_prepare} = 0;
(in pg logs, it changes the logged lines from:
prepare <unnamed>:...
bind <unnamed>:...
execute <unnamed>:...
into:
statement:...)

Afterwards the memory problems seemed to be the same.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


#8 Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#6)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 6:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another problem, if the bloat is being driven by explicit PREPARE commands
as you illustrate here, is that the backend doesn't get to silently
discard prepared statements.

In the perl script he showed, the handle returned by $dbh->prepare is
not stored anywhere, so it should automatically get garbage collected
at the end of the Perl statement. The garbage collection should
trigger a destructor to send a "DEALLOCATE".

Cheers,

Jeff


#9 hubert depesz lubaczewski
depesz@depesz.com
In reply to: hubert depesz lubaczewski (#5)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 03:22:17PM +0200, hubert depesz lubaczewski wrote:

On Mon, Jun 13, 2016 at 08:56:56AM -0400, Peter Eisentraut wrote:

On 6/9/16 11:46 AM, hubert depesz lubaczewski wrote:

Basically it looks that postgresql "caches" query plans? parsed elements? for
queries, but doesn't put any kind of limit to size of this cache. Which means
that if our app is using LOTS of different queries, the memory usage will grow
in time.

If you compile with -DSHOW_MEMORY_STATS, it will print out memory allocation
after every command, so you might be able to see where the memory is going.

Compiled, am running it now, but it is much slower now. And the output
is huge.

After ~5k queries, it looks like:
[memory context dump snipped - same as in #5]

I'll try to get to the end of the test (~70k tables), and then show the
same info, plus some statistics about "CacheMemoryContext" if it helps
(I'd rather not show *all* of it, though :)

OK, I couldn't really wait for it all to finish.

Ran it for ~ 44000 queries.
Sum on anonymous memory from smaps is 1337912 kB.

Main information looks like:

TopMemoryContext: 28073888 total in 3408 blocks; 18264 free (71 chunks); 28055624 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 65536 total in 4 blocks; 34464 free (5 chunks); 31072 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 16769024 total in 11 blocks; 2725888 free (38 chunks); 14043136 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 8160 free (1 chunks); 32 used
Relcache by OID: 8380416 total in 10 blocks; 2996848 free (21 chunks); 5383568 used
CacheMemoryContext: 734127048 total in 2791 blocks; 5143808 free (7 chunks); 728983240 used
<101491 lines removed>
MdSmgr: 8380416 total in 10 blocks; 4141120 free (0 chunks); 4239296 used
ident parser context: 3072 total in 2 blocks; 1416 free (1 chunks); 1656 used
hba parser context: 130048 total in 7 blocks; 42496 free (2 chunks); 87552 used
LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (5 chunks); 32 used

The 101491 lines that I removed looked like:
index_context_module_progressions_on_context_module_id: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
Removing the index/table name and making a summary, I got:
34859 1024 total in 1 blocks; 152 free (0 chunks); 872 used
31877 1024 total in 1 blocks; 200 free (0 chunks); 824 used
10294 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
5791 1024 total in 1 blocks; 64 free (0 chunks); 960 used
3221 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
3165 3072 total in 2 blocks; 2008 free (2 chunks); 1064 used
3146 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
2240 3072 total in 2 blocks; 1960 free (1 chunks); 1112 used
1791 3072 total in 2 blocks; 1872 free (0 chunks); 1200 used
934 3072 total in 2 blocks; 1408 free (1 chunks); 1664 used
793 3072 total in 2 blocks; 1824 free (0 chunks); 1248 used
304 3072 total in 2 blocks; 1488 free (1 chunks); 1584 used
224 3072 total in 2 blocks; 1528 free (1 chunks); 1544 used
218 3072 total in 2 blocks; 1488 free (0 chunks); 1584 used
217 3072 total in 2 blocks; 1536 free (0 chunks); 1536 used
188 3072 total in 2 blocks; 736 free (1 chunks); 2336 used
164 1024 total in 1 blocks; 40 free (0 chunks); 984 used
158 3072 total in 2 blocks; 1216 free (0 chunks); 1856 used
156 3072 total in 2 blocks; 1328 free (0 chunks); 1744 used
128 3072 total in 2 blocks; 1576 free (1 chunks); 1496 used
121 3072 total in 2 blocks; 1648 free (1 chunks); 1424 used
121 3072 total in 2 blocks; 1296 free (0 chunks); 1776 used
118 7168 total in 3 blocks; 3064 free (1 chunks); 4104 used
115 3072 total in 2 blocks; 1632 free (0 chunks); 1440 used
115 3072 total in 2 blocks; 1336 free (1 chunks); 1736 used
110 3072 total in 2 blocks; 1792 free (0 chunks); 1280 used
108 3072 total in 2 blocks; 928 free (0 chunks); 2144 used
107 3072 total in 2 blocks; 792 free (1 chunks); 2280 used
105 3072 total in 2 blocks; 1184 free (1 chunks); 1888 used
94 3072 total in 2 blocks; 1192 free (1 chunks); 1880 used
91 3072 total in 2 blocks; 1032 free (1 chunks); 2040 used
79 3072 total in 2 blocks; 1736 free (0 chunks); 1336 used
64 3072 total in 2 blocks; 1080 free (1 chunks); 1992 used
49 3072 total in 2 blocks; 1440 free (1 chunks); 1632 used
48 3072 total in 2 blocks; 2008 free (1 chunks); 1064 used
42 3072 total in 2 blocks; 1240 free (1 chunks); 1832 used
39 3072 total in 2 blocks; 1784 free (0 chunks); 1288 used
38 3072 total in 2 blocks; 1136 free (1 chunks); 1936 used
31 3072 total in 2 blocks; 1264 free (0 chunks); 1808 used
8 3072 total in 2 blocks; 784 free (1 chunks); 2288 used
8 3072 total in 2 blocks; 1696 free (1 chunks); 1376 used
4 3072 total in 2 blocks; 1744 free (0 chunks); 1328 used
3 3072 total in 2 blocks; 1680 free (0 chunks); 1392 used
3 3072 total in 2 blocks; 1384 free (1 chunks); 1688 used
2 3072 total in 2 blocks; 1376 free (0 chunks); 1696 used

The first number is how many lines end with the given information.
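That kind of summary (strip the context name, count identical remainders) can be sketched in a few lines of Python; the sample lines here are hypothetical stand-ins for the real dump:

```python
from collections import Counter

# Hypothetical sample dump lines, each "<context name>: <allocation stats>".
lines = [
    "idx_foo: 1024 total in 1 blocks; 200 free (0 chunks); 824 used",
    "idx_bar: 1024 total in 1 blocks; 200 free (0 chunks); 824 used",
    "tbl_baz: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used",
]

# Drop the per-relation name, keeping only the statistics part.
stats = [line.split(": ", 1)[1] for line in lines]

# Count identical remainders, like `sort | uniq -c | sort -rn`.
for stat, count in Counter(stats).most_common():
    print(count, stat)
```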

Does it help in any way?

depesz


#10 Jeff Janes
jeff.janes@gmail.com
In reply to: hubert depesz lubaczewski (#3)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 2:39 AM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:

On Sun, Jun 12, 2016 at 01:50:48PM -0700, Jeff Janes wrote:

Line #2 shows the output of ps nh uww -p <backend_pid> before the start of work.
There are, in total, 74002 tables; I iterate over the list of them,
and for each, I run the select I mentioned.

Every 1000 tables, I get stats: the ps output, and (in the parent) the sum of
the "Anonymous:" lines from /proc/<backend_pid>/smaps.

As you can see, we're getting ~ 32 kB of cache per table.

What hardware and OS are you using? I only get a bit over 8kB per
table, and that amount doesn't change much between 9.3 and 9.6.

Can you share a representative table definition, including constraints?

Cheers,

Jeff


#11 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Jeff Janes (#10)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 09:27:40AM -0700, Jeff Janes wrote:

On Mon, Jun 13, 2016 at 2:39 AM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:

On Sun, Jun 12, 2016 at 01:50:48PM -0700, Jeff Janes wrote:

Line #2 shows output of ps nh uww -p <backend_pid> before start of work.
There are, in total, 74002 tables, and then I iterate over list of them,
and for each, I do the select I mentioned.

Every 1000 tables, I get stats - ps output, and (in parent) sum of
"Anonymous:" lines from /proc/<backend_pid>/smaps.

As you can see - we're getting ~ 32kB of cache per table.

What hardware and OS are you using? I only get a bit over 8kB per
table, and that amount doesn't change much between 9.3 and 9.6.

These are 64-bit AWS virtual machines (EC2, not RDS) running Ubuntu Trusty.
Pg is from the Ubuntu repo.

Can you share a representative table definition, including constraints?

There are ~ 180 different tables, each looking different, and they are
copied (schema, not data) across ~ 400 schemas.

Some tables have 1 index; some have up to 17.

The 17-index table looks like this (sorry, I had to redact it heavily):

Column | Type | Modifiers
--------------------------------------+-----------------------------+---------------------------------------------------------------------------
id | bigint | not null default nextval('...................................'::regclass)
.... | character varying(255) |
.......... | bigint | not null
...................... | character varying(255) |
.............. | character varying(255) | not null
.... | character varying(255) |
........ | timestamp without time zone |
........... | timestamp without time zone |
................... | bigint |
......... | boolean |
........................ | boolean |
.......... | timestamp without time zone |
.......... | timestamp without time zone |
............................ | boolean |
............. | text |
............................... | boolean | default false
.......................... | character varying(255) |
....... | bigint |
.............................. | boolean | default true
........... | character varying(255) |
............ | character varying(255) | default '....'::character varying
.................. | bigint |
............... | bigint | not null
.................. | bigint | not null
............. | character varying(255) |
............ | bigint |
............... | boolean |
............. | bigint |
................. | text |
................... | boolean |
................. | text |
............... | boolean |
....... | character varying(255) |
....... | boolean |
.................................... | boolean |
.................. | bigint |
...... | character varying(255) |
........ | text |
..................... | bigint |
................ | text |
.................. | text |
.................... | character varying(255) |
..................... | integer |
.............. | character varying(255) |
......... | character varying(255) |
.............. | character varying(255) |
........... | bigint |

and on this table there are 17 indexes and 7 fkeys, and it is referenced by 15 other tables.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


#12 Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#6)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

On Mon, Jun 13, 2016 at 6:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

hubert depesz lubaczewski <depesz@depesz.com> writes:

While I do appreciate caching of metadata, it is causing serious
problems, which we will alleviate with server_lifetime, but I would much
prefer a setting like:
internal_cache_limit = 256MB

Be careful what you ask for, you might get it.

There used to be exactly such a limit in the catcache logic, which we
ripped out because it caused far more performance problems than it fixed.
See
/messages/by-id/5141.1150327541@sss.pgh.pa.us

While we have no direct experience with limiting the plancache size,
I'd expect a pretty similar issue there: a limit will either do nothing
except impose substantial bookkeeping overhead (if it's more than the
number of plans in your working set) or it will result in a performance
disaster from cache thrashing (if it's less).

We don't need to keep an LRU list or do a clock sweep or anything. We
could go really simple and just toss the whole thing into /dev/null
when it gets too large, and start over.
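A sketch of that idea (hypothetical names; nothing like this exists in PostgreSQL): the only bookkeeping is a single size counter, and exceeding the limit simply empties the cache.

```python
class FlushOnLimitCache:
    """Toy flush-on-limit cache: no LRU bookkeeping at all.
    When the total cached size exceeds the limit, toss everything
    and start over."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.size = 0
        self.entries = {}
        self.flushes = 0

    def put(self, key, value, nbytes):
        self.size += nbytes            # the only accounting: one counter
        if self.size > self.limit:
            self.entries.clear()       # toss the whole cache
            self.size = nbytes
            self.flushes += 1
        self.entries[key] = value

cache = FlushOnLimitCache(limit_bytes=100)
for i in range(35):
    cache.put(i, object(), 10)   # each entry "costs" 10 bytes
print(cache.flushes, len(cache.entries))
```

The cost per insertion is one addition and one comparison; the price is paid in bursts, when the periodic flush forces everything to be rebuilt.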

The accounting overhead should be about as close to zero as you can get.

There would be no performance hit for people who don't set a limit, or
set a high one which is never exceeded.

For people who do exceed the limit, the performance hit would
certainly be no worse than if they have to gratuitously close and
re-open the connection. And it would be far better than swapping to
death, or incurring the wrath of OOM.

You can only avoid falling
off the performance cliff if your workload has *very* strong locality of
reference, and that tends not to be the case.

If you have weak locality of reference, then there is a pretty good
chance you aren't getting much help from the cache in the first place.
Periodically tossing it won't cost you much.

Cheers,

Jeff


#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#12)
Re: PostgreSQL (9.3 and 9.6) eats all memory when using many tables

Jeff Janes <jeff.janes@gmail.com> writes:

On Mon, Jun 13, 2016 at 6:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While we have no direct experience with limiting the plancache size,
I'd expect a pretty similar issue there: a limit will either do nothing
except impose substantial bookkeeping overhead (if it's more than the
number of plans in your working set) or it will result in a performance
disaster from cache thrashing (if it's less).

We don't need to keep a LRU list or do a clock sweep or anything. We
could go really simple and just toss the whole thing into /dev/null
when it gets too large, and start over.

Color me skeptical as heck. To the extent that you do have locality
of reference, this would piss it away.

Also, you can't just flush the plan cache altogether, not for PREPARE'd
statements and not for internally-prepared ones either, because there
are references being held for both of those. You could drop the plan
tree, certainly, but that only goes so far in terms of reducing the
amount of space needed. Dropping more than that risks subtle semantic
changes, and would break API expectations of external PLs too.

regards, tom lane
