BUG #14384: pg_dump uses excessive amounts of memory for LOBs

#1 Bolek Ziobrowski
boleslaw.ziobrowski@yahoo.pl

The following bug has been logged on the website:

Bug reference: 14384
Logged by: Bolek Ziobrowski
Email address: boleslaw.ziobrowski@yahoo.pl
PostgreSQL version: 9.5.4
Operating system: Ubuntu 14.04.5 LTS
Description:

pg_dump seems to allocate memory proportional to the number of rows in
pg_largeobject (not necessarily correlated with the size of these objects),
e.g. about 4 GB for a few million LOBs and about 26 GB for 30 million LOBs.

Steps to reproduce:

In psql:

create table large_object_test( a int, lobj oid );

insert into large_object_test select a.i, lo_from_bytea(0,
E'\\xffffff0000000000') from generate_series(1,5000000) as a(i);

From the command line:

pg_dump postgres > /tmp/dump.sql

top -o "%MEM"

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bolek Ziobrowski (#1)
Re: BUG #14384: pg_dump uses excessive amounts of memory for LOBs

boleslaw.ziobrowski@yahoo.pl writes:

pg_dump seems to allocate memory proportional to the number of rows in
pg_largeobject (not necessarily correlated with size of these objects) ,

Yes, it does. It also allocates memory proportional to the number of,
eg, tables, or any other DB object for that matter.

This is a consequence of the fact that blobs grew owners and privileges
in 9.0. pg_dump uses its usual per-object infrastructure to keep track
of that. The argument was that this'd be okay because if your large
objects are, well, large, then there couldn't be so many of them that
the space consumption would be fatal. I had doubts about that at the
time, but I think we're more or less locked into it now. It would
take a lot of restructuring to change it, and we'd lose functionality
too, because we couldn't have a separate TOC entry per blob. That
means no ability to select out individual blobs during pg_restore.

TL;DR: blobs are not exactly lightweight objects. If you want something
with less overhead, maybe you should just store the data in a plain
bytea column.

regards, tom lane


#3 Bolek Ziobrowski
boleslaw.ziobrowski@yahoo.pl
In reply to: Tom Lane (#2)
Re: BUG #14384: pg_dump uses excessive amounts of memory for LOBs

Tom Lane wrote:

TL;DR: blobs are not exactly lightweight objects. If you want something
with less overhead, maybe you should just store the data in a plain
bytea column.

Yes, I tested this case with a bytea/TOAST column in place of the LOB
column, and pg_dump used just a few MB.
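
For reference, a minimal sketch of the bytea variant of the test case from
#1 (the table and column names here are made up for illustration):

create table bytea_test ( a int, data bytea );

insert into bytea_test select a.i, E'\\xffffff0000000000'::bytea
from generate_series(1,5000000) as a(i);

Each value is stored inline in the row (TOASTed when large), so pg_dump
emits it as ordinary table data rather than one TOC entry per object.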

Normally (when dumping a database with hundreds of tables) it uses an
insignificant amount of memory, so I was surprised when it allocated
about 26 GB while processing a 100 GB database with 30*10^6 LOBs (it
caused swapping and almost triggered the OOM killer).

The 'obvious' (for a database of this size) and easy solution was to use
pg_basebackup - and it sped up the process almost 40-fold.
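
For anyone else hitting this, a typical pg_basebackup invocation (the
target directory and options here are illustrative, not the exact command
I ran) would be something like:

pg_basebackup -D /tmp/basebackup -Ft -z -P

(-D target directory, -Ft tar format, -z gzip compression, -P progress
reporting). Being a physical copy of the cluster, it skips pg_dump's
per-object bookkeeping entirely.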

I found a similar case here
(/messages/by-id/524C3163.1050502@iqbuzz.ru) so
I think it would be good to warn users about this behavior in the pg_dump
documentation (to save a few hours for the third person who triggers this
issue in ... October 2019).

Thank you,
Bolek Ziobrowski
