BUG #14845: postgres_fdw and mysql_fdw can cause crash of instance
The following bug has been logged on the website:
Bug reference: 14845
Logged by: Josef Machytka
Email address: josef.machytka@gmail.com
PostgreSQL version: 9.6.5
Operating system: Debian 8
Description:
Hi,
I work from time to time with postgres_fdw and mysql_fdw and they both seem
to have the same problem - they use too much memory if I query huge remote
tables.
Since I have seen this behavior before on our monitoring, I decided to test
it on a testing instance with a really huge amount of remote data which would
not fit into memory.
Behavior was as expected - postgresql started to use more and more memory
until the whole available memory was used.
After that, in the first case postgresql crashed (which was more or less OK
because it restarted), but in the second case the whole Debian instance crashed
and did not want to start again even after repeated attempts. I did not
fiddle with it and just recreated it, but I do not even want to imagine this
happening on some other heavily used instance.
This behavior limits the usage of those FDWs very significantly. I cannot allow
other people to use foreign tables on the really huge tables we have, and I have
to check sizes before I run some query which is expected to be "heavy"... Of
course I would love to hear from someone that I should not use "heavy
queries"... :-)
So is there a way to, for example, force the FDW to spill data into some temp
file and not use so much memory? This behavior seems not to depend on the
work_mem setting...
Thanks
Josef
(Config: postgresql 9.6.5, latest versions of both fdw, Debian 8)
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 6 October 2017 at 13:24, <josef.machytka@gmail.com> wrote:
> After that, in the first case postgresql crashed (which was more or less OK
> because it restarted), but in the second case the whole Debian instance crashed
What causes the crash, is it the out-of-memory (OOM) killer?
How have you configured overcommit on your Debian instance?
https://www.postgresql.org/docs/current/static/kernel-resources.html#linux-memory-overcommit
If it is the OOM killer killing your process, you may want to disable
overcommit. Doing that should cause the client to get a decent error
message instead of the whole cluster being restarted.
regards,
Feike
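[Editor's note: a minimal sketch of the overcommit change Feike suggests,
following the linked kernel-resources docs; the ratio value is illustrative
and should be tuned to your workload.]

```shell
# Debian's default, mode 0, is heuristic overcommit: allocations succeed
# optimistically and the OOM killer reaps a process later. Mode 2 makes
# the kernel refuse the allocation instead, so PostgreSQL gets a clean
# "out of memory" error rather than a SIGKILL.
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80   # percent of RAM committable beyond swap

# Persist across reboots:
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio=80" >> /etc/sysctl.conf
```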
On Fri, Oct 6, 2017 at 8:24 AM, <josef.machytka@gmail.com> wrote:

> I work from time to time with postgres_fdw and mysql_fdw and they both seem
> to have the same problem - they use too much memory if I query huge remote
> tables.
>
> Since I have seen this behavior before on our monitoring, I decided to test
> it on a testing instance with a really huge amount of remote data which
> would not fit into memory. Behavior was as expected - postgresql started to
> use more and more memory until the whole available memory was used. After
> that, in the first case postgresql crashed (which was more or less OK
> because it restarted), but in the second case the whole Debian instance
> crashed and did not want to start again even after repeated attempts. I did
> not fiddle with it and just recreated it, but I do not even want to imagine
> this happening on some other heavily used instance.
OOMKiller?? What about your overcommit settings?
> This behavior limits the usage of those FDWs very significantly. I cannot
> allow other people to use foreign tables on the really huge tables we have,
> and I have to check sizes before I run some query which is expected to be
> "heavy"... Of course I would love to hear from someone that I should not
> use "heavy queries"... :-)
>
> So is there a way to, for example, force the FDW to spill data into some
> temp file and not use so much memory?
It would be nice if you could provide a reproducible test case, so we can
try to figure out what's happening...
> This behavior seems not to depend on the work_mem setting...
No, work_mem is used for sorting purposes.
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
Fabrízio de Royes Mello <fabriziomello@gmail.com> writes:
>> I work from time to time with postgres_fdw and mysql_fdw and they both seem
>> to have the same problem - they use too much memory if I query huge remote
>> tables.

> It would be nice if you could provide a reproducible test case, so we can
> try to figure out what's happening...
Indeed. I can't say about mysql_fdw, but postgres_fdw is only supposed to
fetch 100 rows at a time (cf fetch_size option), so it shouldn't run out
of memory just because the amount of data to be fetched is large.
One idea is that, because the planner lacks information about the remote
table, it picks some stupid plan that ends up blowing out memory. In that
case, perhaps turning on use_remote_estimate would help. But we're just
guessing here due to lack of useful detail.
regards, tom lane
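[Editor's note: as a hedged illustration of the two postgres_fdw options Tom
mentions, both can be set per server or overridden per foreign table; the
names "pgserver" and "big_remote" below are placeholders, not from the thread.]

```sql
-- fetch_size caps how many rows each remote FETCH pulls per round trip;
-- use_remote_estimate asks the remote planner for row counts and costs
-- via EXPLAIN instead of guessing locally.
ALTER SERVER pgserver
    OPTIONS (ADD use_remote_estimate 'true', ADD fetch_size '100');

-- The same options can be overridden for a single foreign table:
ALTER FOREIGN TABLE big_remote
    OPTIONS (ADD use_remote_estimate 'true');
```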
Hi,
yes, sorry, somehow I forgot the description of the task...
In the case which went wrong, I used postgres_fdw to compare data on the local
and remote databases using "select all from remote except select all from
local".
I selected the same table on remote and local, which has ~200M rows and a
total size of ~20GB. I needed to see all differences because we get some
erratic differences... The estimation from previous limited queries was that
differences are only in approx. 1 to 3% of rows. So I decided to try to select
them all to look for some patterns...
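[Editor's note: a minimal sketch of that comparison, assuming a postgres_fdw
foreign table remote_t and a local table local_t with identical columns; the
names are hypothetical.]

```sql
-- EXCEPT returns rows present in remote_t but missing (or different) in
-- local_t. A hashed EXCEPT implementation must materialize one input as
-- an in-memory hash table - on a ~20GB remote table, this is what
-- exhausted memory here.
SELECT * FROM remote_t
EXCEPT
SELECT * FROM local_t;
```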
The testing instance (on Google Compute Engine) had 4 CPUs and 26 GB of RAM;
as for the OOM killer, I used the default setting on Debian 8 without any
changes, so
/proc/sys/vm/overcommit_memory = 0
Monitoring was done by telegraf locally + InfluxDB + Grafana on another
instance. Nothing else was running on that instance, and postgresql on the
instance contained only this huge table.
After ~25 minutes of running, all memory was used, and as I mentioned, in the
first case postgresql crashed, and in the second test (in which I lowered
work_mem from 24MB to 8MB and increased shared_buffers to 8GB to see if it
would help) the whole instance crashed and did not want to start any more.
The 500GB SSD disk was almost empty, so no problems with disk space.
Since I did not have time to fiddle with it, I just dropped the crashed
instance and used ansible to create a new one.
Thanks
On 6 October 2017 at 16:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fabrízio de Royes Mello <fabriziomello@gmail.com> writes:
>>> I work from time to time with postgres_fdw and mysql_fdw and they both
>>> seem to have the same problem - they use too much memory if I query huge
>>> remote tables.
>
>> It would be nice if you could provide a reproducible test case, so we can
>> try to figure out what's happening...
>
> Indeed. I can't say about mysql_fdw, but postgres_fdw is only supposed to
> fetch 100 rows at a time (cf fetch_size option), so it shouldn't run out
> of memory just because the amount of data to be fetched is large.
>
> One idea is that, because the planner lacks information about the remote
> table, it picks some stupid plan that ends up blowing out memory. In that
> case, perhaps turning on use_remote_estimate would help. But we're just
> guessing here due to lack of useful detail.
>
> regards, tom lane
Josef Machytka <josef.machytka@gmail.com> writes:
> In this case which went wrong I used postgres_fdw to compare data on local
> and remote database using "select all from remote except select all from
> local".
I think my theory B probably applies then: without use_remote_estimate,
the planner would not guess that the remote table is huge, and that could
well allow it to pick a hashed EXCEPT implementation --- which will then
try to collect the entire remote table into an in-memory hash table.
I don't know whether mysql_fdw has a comparable switch, but in any case
if it wasn't turned on then the same failure mode could apply to that FDW.
I wonder if we should rethink the default setting of use_remote_estimate
being "off". Particularly as we keep on allowing postgres_fdw to consider
more different plan types, it seems like the potential for stupid plan
choices is getting worse all the time.
regards, tom lane
On Fri, Oct 6, 2017 at 10:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josef Machytka <josef.machytka@gmail.com> writes:
>> In this case which went wrong I used postgres_fdw to compare data on local
>> and remote database using "select all from remote except select all from
>> local".
>
> I think my theory B probably applies then: without use_remote_estimate,
> the planner would not guess that the remote table is huge, and that could
> well allow it to pick a hashed EXCEPT implementation --- which will then
> try to collect the entire remote table into an in-memory hash table.
Unfortunately, use_remote_estimate won't help for this particular case.
While it gets the estimate for the number of rows from the remote estimate,
HashSetOp doesn't care about estimated number of rows, only estimated
number of distinct rows. And use_remote_estimate doesn't provide that, so
it falls back on assuming (I think) 200 distinct rows.
What could help is running 'analyze' locally against the foreign table.
That can cause other problems, though.
Cheers,
Jeff
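[Editor's note: Jeff's suggestion, sketched; remote_t is again a placeholder
foreign table name.]

```sql
-- ANALYZE on a foreign table samples rows over the FDW connection and
-- stores local statistics (including distinct-value estimates), letting
-- the planner cost a hashed EXCEPT realistically. On a ~20GB remote
-- table the sampling scan can take a long time, and the statistics are
-- not refreshed by the remote server's autovacuum - they go stale as
-- the remote data changes.
ANALYZE remote_t;
```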
"What could help is running 'analyze' locally against the foreign table.
That can cause other problems, though."
Yes, this really helped - ANALYZE on foreign table took ~45 minutes but
after that I was able to get results and memory was OK too. Only several
hundreds of MB used...