huge RAM use in multi-command ALTER of table heirarchy

Started by Justin Pryzbyover 8 years ago7 messages
#1Justin Pryzby
pryzby@telsasoft.com

I've seen this before while doing SET STATISTICS on a larger number of columns
using xargs, but just came up while doing ADD of a large number of columns.
Seems to be roughly linear in number of children but superlinear WRT columns.
I think having to do with catalog update / cache invalidation with many
ALTERs*children*columns?

32 cols and 2 children => 12MB
256 cols and 11 children => 74MB
256 cols and 111 children => 582MB
512 cols and 11 children => 229MB

(in our "huge" case, there were ~1600 columns and maybe even more children)

I was testing with this command
PGHOST=/tmp PGPORT=9999 sh -ec 'for maxcols in 512 ; do ~/src/postgresql.install/bin/postgres -D ~/src/postgres.dat -c port=9999 & sleep 4; cols=$(for d in `seq 1 $maxcols`; do echo "ADD c$d int,"; done |xargs); PGOPTIONS="-c client_min_messages=warning" psql postgres -qc "DROP TABLE t CASCADE" || [ $? -eq 1 ]; psql postgres -qc "CREATE TABLE t()"; for c in `seq 1 11`; do psql postgres -qc "CREATE TABLE c$c() INHERITS(t)"; done; for d in `seq 1 $maxcols`; do echo "ALTER TABLE t ADD c$d int;"; done |PGOPTIONS="-c client_min_messages=DEBUG3 -c log_statement_stats=on" psql postgres -c "ALTER TABLE t ${cols%,}" 2>/tmp/pg.err2; ~/src/postgresql.install/bin/pg_ctl -swD ~/src/postgres.dat stop; done'

..and log_statment_stats with a variation on the getrusage patch here
/messages/by-id/20170615145824.GC15684@telsasoft.com

Justin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#1)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

Justin Pryzby <pryzby@telsasoft.com> writes:

I've seen this before while doing SET STATISTICS on a larger number of columns
using xargs, but just came up while doing ADD of a large number of columns.
Seems to be roughly linear in number of children but superlinear WRT columns.
I think having to do with catalog update / cache invalidation with many
ALTERs*children*columns?

I poked into this a bit. The operation is necessarily roughly O(N^2) in
the number of columns, because we rebuild the affected table's relcache
entry after each elementary ADD COLUMN operation, and one of the principal
components of that cost is reading all the pg_attribute entries. However,
that should only be a time cost not a space cost. Eventually I traced the
O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to
have been introduced in Simon's commit e5550d5fe, and which strikes me as
a kluge of the first magnitude. Unless I am missing something, that
function's design concept can fairly be characterized as "let's leak
memory like there's no tomorrow, on the off chance that somebody somewhere
is ignoring basic coding rules".

I tried ripping that out, and the peak space consumption of your example
(with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB.
Moreover, the system still passes make check-world, so it's not clear
to me what excuse this code has to live.

It's probably a bit late in the v10 cycle to be taking any risks in
this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
as soon as the v11 cycle opens, unless someone can show an example
of non-broken coding that requires it. (And if so, there ought to
be a regression test incorporating that.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

On 19 July 2017 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

I've seen this before while doing SET STATISTICS on a larger number of

columns

using xargs, but just came up while doing ADD of a large number of

columns.

Seems to be roughly linear in number of children but superlinear WRT

columns.

I think having to do with catalog update / cache invalidation with many
ALTERs*children*columns?

I poked into this a bit. The operation is necessarily roughly O(N^2) in
the number of columns, because we rebuild the affected table's relcache
entry after each elementary ADD COLUMN operation, and one of the principal
components of that cost is reading all the pg_attribute entries. However,
that should only be a time cost not a space cost. Eventually I traced the
O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to
have been introduced in Simon's commit e5550d5fe, and which strikes me as
a kluge of the first magnitude. Unless I am missing something, that
function's design concept can fairly be characterized as "let's leak
memory like there's no tomorrow, on the off chance that somebody somewhere
is ignoring basic coding rules".

I tried ripping that out, and the peak space consumption of your example
(with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB.
Moreover, the system still passes make check-world, so it's not clear
to me what excuse this code has to live.

It's probably a bit late in the v10 cycle to be taking any risks in
this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
as soon as the v11 cycle opens, unless someone can show an example
of non-broken coding that requires it. (And if so, there ought to
be a regression test incorporating that.)

Just FYI, I believe Simon's currently on holiday, so may not notice this
discussion as promptly as usual.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Greg Stark
stark@mit.edu
In reply to: Tom Lane (#2)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

On 19 July 2017 at 00:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's probably a bit late in the v10 cycle to be taking any risks in
this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
as soon as the v11 cycle opens, unless someone can show an example
of non-broken coding that requires it. (And if so, there ought to
be a regression test incorporating that.)

Would it be useful to keep in one of the memory checking assertion builds?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#4)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

Greg Stark <stark@mit.edu> writes:

On 19 July 2017 at 00:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's probably a bit late in the v10 cycle to be taking any risks in
this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
as soon as the v11 cycle opens, unless someone can show an example
of non-broken coding that requires it. (And if so, there ought to
be a regression test incorporating that.)

Would it be useful to keep in one of the memory checking assertion builds?

Why? Code that expects to continue accessing a relcache entry's tupdesc
after closing the relcache entry is broken, independently of whether it's
in a debug build or not.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#5)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

On 2017/07/20 22:19, Tom Lane wrote:

Greg Stark <stark@mit.edu> writes:

On 19 July 2017 at 00:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's probably a bit late in the v10 cycle to be taking any risks in
this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
as soon as the v11 cycle opens, unless someone can show an example
of non-broken coding that requires it. (And if so, there ought to
be a regression test incorporating that.)

Would it be useful to keep in one of the memory checking assertion builds?

Why? Code that expects to continue accessing a relcache entry's tupdesc
after closing the relcache entry is broken, independently of whether it's
in a debug build or not.

Am I wrong in thinking that TupleDesc refcounting (along with resowner
tracking) allows one to use a TupleDesc without worrying about the
lifetime of its parent relcache entry?

I'm asking this independently of the concerns being discussed in this
thread about having the RememberToFreeTupleDescAtEOX() mechanism on top of
TupleDesc refcounting.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Greg Stark
stark@mit.edu
In reply to: Tom Lane (#5)
Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

On 20 July 2017 at 14:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Stark <stark@mit.edu> writes:

Would it be useful to keep in one of the memory checking assertion builds?

Why? Code that expects to continue accessing a relcache entry's tupdesc
after closing the relcache entry is broken, independently of whether it's
in a debug build or not.

Mea Culpa. I hadn't actually read the code and assumed it would be
sensible to change from something that freed these tupdescs into
something that raised an assertion if they were still unfreed at end
of transaction.

Reading the code I see that it's only there to *avoid* freeing the
tupledesc just in case there's something still using it. If we just
free it then the normal memory checking builds would catch cases where
they're used after being freed.

But what I still don't understand is whether the fact that it still
passes the regression tests means anything. Unless there happened to
be a sinval message at the wrong time the regression tests wouldn't
uncover any problem cases. If it passed the tests on a
CLOBBER_CACHE_ALWAYS build then perhaps that would prove it's safe? Or
perhaps if the columns haven't actually been altered it would still
fail to fail?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers