BUG #16363: Memory increases for each table accessed until connection is closed
The following bug has been logged on the website:
Bug reference: 16363
Logged by: Eduardo Barreto Alexandre
Email address: sezdocs@gmail.com
PostgreSQL version: 12.2
Operating system: Ubuntu Linux 19.04
Description:
I have a specific use case in which I need a lot of similar tables in my
database; in one case, for example, I have more than 700 tables.
Whenever I insert data into one of these tables within the same connection,
the memory used by that connection's process increases and is never freed
(unless I close the connection). Because of that I'm getting Out Of Memory
errors in my backend, since the connection eventually consumes all the
memory available in my system.
Looking around, I found this question on Stack Overflow, which I think
describes exactly the issue I'm having:
https://stackoverflow.com/questions/5587830/postgresql-backend-process-high-memory-usage-issue
I'm not sure whether this is a memory leak or simply a caching mechanism of
Postgres; if the latter, can I control it somehow to force it to free
memory?
Just for completeness, here is a table of the memory usage of a single
Postgres connection process running a script that simply inserts new
data randomly into these 700 tables, one at a time.
# Elapsed time Real (MB) Virtual (MB)
0.000 26.453 328.066
20.017 421.367 766.164
40.037 889.867 1302.520
60.057 1305.645 1772.336
80.077 1681.555 2243.105
100.093 2001.133 2679.195
120.109 2333.535 3131.918
140.129 2622.051 3518.801
160.141 2887.207 3886.355
180.157 3209.109 4323.109
200.177 3414.656 4606.566
220.197 3414.656 4606.566
240.218 3414.656 4606.566
After the last line, I got an OOM error.
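For reference, a sampling loop like the one that produced the table above can be sketched in Python. This is a minimal sketch, assuming you already know the backend's PID; it reads the resident set size from /proc, so it is Linux-only:

```python
import time


def rss_mb(pid):
    """Resident set size of a process in MB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     26453 kB"
                return int(line.split()[1]) / 1024.0
    return 0.0


def sample(pid, interval=20.0, count=5):
    """Print elapsed time and RSS for `pid` every `interval` seconds."""
    start = time.monotonic()
    for _ in range(count):
        print(f"{time.monotonic() - start:10.3f}  {rss_mb(pid):10.3f} MB")
        time.sleep(interval)
```

Note that, as discussed below, RSS as reported this way includes shared-memory pages the process has touched, so it overstates per-backend private memory.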
PG Bug reporting form <noreply@postgresql.org> writes:
I have a specific use case when I need a lot of similar tables in my
database, in one case, for example, I have more than 700 tables.
Whenever I insert data in one of these tables within the same connection,
the memory used by that connection process will increase and never be freed
(unless I close the connection), because of that I'm getting Out Of Memory
errors in my backend since the connection eventually consumes the entire
memory available in my system.
Yes, Postgres caches some information about every table your session
has accessed, and no you don't have a lot of control over that, and
no it isn't a bug.
Having said that, the amount of memory consumed this way is in the
vicinity of a few tens of KB per table, according to some simple
experiments I just did. I couldn't get a process touching 700 simple
tables to eat more than ~70MB of space. So either you are talking about
tables with enormously complex schemas, or there's something else going
on. (Please note that the SO post you are referencing describes an
attempt to use circa 100000 tables/views in one session, not 700.)
The memory consumption curve you showed looks suspiciously like you have
circa 4GB shared buffers allocated and the reported number is mostly about
how many of those buffers the process has touched so far. This is a
common misunderstanding; "top" and similar tools tend to be really bad
about accounting for shared memory usage in any useful way. In any case,
the process you've shown us stopped growing its space consumption
some time ago, so I wonder where the OOM complaint actually came from.
The short answer is that you're probably barking up the wrong tree,
but there's not enough information here to provide any useful guesses
about which is the right tree. Please see
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
for some hints about submitting a trouble report that's complete
enough to garner useful responses.
regards, tom lane
Resending this reply, since it seems it didn't go through the
mailing list.
Hello Tom, first of all, thanks for the response!
Sorry for not having included a lot of the needed information; I will try to
rectify that in this message.
So, first, here is what I'm trying to do. In my backend, I store financial data
(candlesticks and trades) in Postgres + TimescaleDB. The way I do it is by
separating each exchange into one database, and each market and timeframe into
one table. For example, given that I have `ExchangeA` with 1 market
EUR/USD and 4 timeframes, I would get these 5 tables:
candles_eur_usd_one_minute
candles_eur_usd_five_minutes
candles_eur_usd_third_minutes
candles_eur_usd_one_hour
trades_eur_usd
Normally in my backend I have, per database (exchange), ~750 markets and
~16 timeframes, so 750 * 16 = 12000 tables.
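The naming scheme above can be generated mechanically; here is an illustrative sketch in Python (the function name and the timeframe labels are hypothetical, taken from the example tables listed above):

```python
def table_names(market, timeframes):
    """Build the per-market table names: one candles table per timeframe,
    plus a single trades table."""
    base = market.lower().replace("/", "_")
    names = [f"candles_{base}_{tf}" for tf in timeframes]
    names.append(f"trades_{base}")
    return names


# For EUR/USD with 4 timeframes this yields the 5 tables listed above;
# with ~750 markets and ~16 timeframes per exchange, the candles tables
# alone come to 750 * 16 = 12000 tables per database.
```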
I noticed that Postgres would continuously increase its memory usage per
connection until I got an OOM error. In my backend's terms, that means
inserting candles and trades at ~2000 rows per minute. On a system with
16GB, it would take about 1 to 3 hours and then I would get the OOM. On my
other system, which has 64GB, it would take about 2 days, but eventually
Postgres would use everything and crash too.
As for versions, I was using Postgres 11 with the latest TimescaleDB. Since I
was not sure whether this was a TimescaleDB issue, I first created a minimal
example that triggered the issue faster (the one I mentioned in the first
e-mail, which uses 700 tables) and then tested without TimescaleDB on
Postgres 11 and 12. I got the same result in both, taking TimescaleDB out of
the equation.
For the operating system, I'm using Ubuntu, but I also tested on my Gentoo
machine with the same results, and I also tested running Postgres in
Docker.
For postgresql.conf, I first used one tuned by the TimescaleDB tuner
software, but then changed to a default one to see if it would make a
difference.
My backend is written in Elixir, so I'm using the Ecto library to connect to
Postgres. Ecto creates a pool of connections that stay alive as long
as my backend is alive. To make it easier to test the issue, I
limited the number of connections to 1, so I can keep an eye on that
connection's process memory usage, etc.
Yes, Postgres caches some information about every table your session
has accessed, and no you don't have a lot of control over that, and
no it isn't a bug.
Having said that, the amount of memory consumed this way is in the
vicinity of a few tens of KB per table, according to some simple
experiments I just did. I couldn't get a process touching 700 simple
tables to eat more than ~70MB of space. So either you are talking about
tables with enormously complex schemas, or there's something else going
on. (Please note that the SO post you are referencing describes an
attempt to use circa 100000 tables/views in one session, not 700.)
Here are the details of one of the tables I'm using for my test (all of
them are identical; only the name differs):
test_dev=# \d+ trades_eur_usd
                                 Table "public.trades_eur_usd"
  Column   |            Type             | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------------+-----------+----------+---------+---------+--------------+-------------
 id        | integer                     |           | not null |         | plain   |              |
 timestamp | timestamp without time zone |           | not null |         | plain   |              |
 amount    | numeric                     |           | not null |         | main    |              |
 price     | numeric                     |           | not null |         | main    |              |
Indexes:
    "trades_eur_usd_id_desc_index" btree (id DESC)
    "trades_eur_usd_timestamp_desc_index" btree ("timestamp" DESC)
Access method: heap
One thing that I forgot to explain better is what I'm doing to touch those
tables. Basically, I created a little Elixir backend with Ecto which keeps
1 connection alive in the pool and starts a bulk insertion of rows into
one of the 700 tables. So, basically, the steps are:
1) Choose one of the tables randomly;
2) Insert 13000 rows into that table in one query;
3) Go back to step 1 and repeat.
This triggers the issue instantly and makes the memory usage of
that connection grow very fast, as shown in the last e-mail.
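The three-step loop above can be sketched as follows. This is a hypothetical Python helper that only builds the SQL text; in the actual setup the statement is built and executed through Ecto, and the literal values here are placeholders:

```python
import random


def build_bulk_insert(tables, rows_per_insert=13000):
    """Step 1: pick a random table; step 2: build one multi-row INSERT for it.
    The caller then executes the statement and repeats (step 3)."""
    table = random.choice(tables)
    # Placeholder values matching the schema (id, timestamp, amount, price).
    values = ", ".join(
        f"({i}, now(), 1.0, 1.0)" for i in range(rows_per_insert)
    )
    sql = (f'INSERT INTO {table} (id, "timestamp", amount, price) '
           f"VALUES {values}")
    return table, sql
```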
One very important thing to note is that if I change the above algorithm to
do the same thing, but always choose the same table in step 1 (so I do
the bulk insertion always into 1 table instead of one of the 700), I do not
trigger the issue; memory usage does not go up at all. It is only triggered
when I do the insertions into different tables each time.
The memory consumption curve you showed looks suspiciously like you have
circa 4GB shared buffers allocated and the reported number is mostly about
how many of those buffers the process has touched so far. This is a
common misunderstanding; "top" and similar tools tend to be really bad
about accounting for shared memory usage in any useful way. In any case,
the process you've shown us stopped growing its space consumption
some time ago, so I wonder where the OOM complaint actually came from.
The short answer is that you're probably barking up the wrong tree,
but there's not enough information here to provide any useful guesses
about which is the right tree.
I guess you are right; maybe the issue is with some shared-buffers cache or
something. I tried lowering every option I could find in
postgresql.conf to its minimum value, but I still got the same OOM result.
Thanks a lot for the help, and feel free to ask for more information!
Can you try to run it without a connection pool? Just a simple connection to the backend.
Unfortunately, same result :(
On Thu, Apr 16, 2020 at 11:43 AM <luis.roberto@siscobra.com.br> wrote:
--
Eduardo Barreto Alexandre
I think you need to fix your mail system, because it's clearly broken in
some strange way. If you look at [1] you'll see the messages did get
to the mailing list, so either you're not receiving messages from the
list or they're getting stashed in a spam folder or something.
[1]: /messages/by-id/16363-a66916ab04716e58@postgresql.org
On Thu, Apr 16, 2020 at 06:23:33PM -0300, Eduardo Barreto Alexandre wrote:
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Actually, I take it back; I did the 1-connection test wrong. Making the
connection directly didn't trigger the problem.
Thanks Luis, your suggestion let me narrow down the issue! It was
actually a cache enabled by default in Ecto (Elixir's database library). I
just created an issue on Ecto's GitHub,
https://github.com/elixir-ecto/ecto/issues/3284, and got a response
confirming it, along with the fix to disable it.
Thank you all very much for the help!