5 minutes to pg_dump nothing

Started by Nikita The Spider The Spider over 18 years ago · 7 messages · general
#1Nikita The Spider The Spider
nikitathespider@gmail.com

Hi all,
I'm seeing a problem where pg_dump takes at least 5 minutes to execute
no matter what I ask it to dump -- even a non-existent or empty table.
One possible red flag is that pg_type contains 56508 rows. This
doesn't seem excessive to me, but perhaps it should. I've looked
through the source code and a system trace and I can see where the
execution time disappears, but I can't pinpoint the source of the
problem.

Some background: the database contains about 1200 schemas with 30+
tables and 20+ custom types in each. The whole database is only about
12G. An empty table created in another database on the same machine
pg_dumps in < .1sec, as expected. PG version is 8.1.4, OS is FreeBSD.

When I run pg_dump in verbose mode, it stalls after printing "reading
user-defined types". In the systrace, I can see the pg_type query at
the top of getTypes() in pg_dump.c followed by 628 reads of 4k blocks
(= 2512k) which only takes a second or so in total. I guess this is
the loop through the tuples in getTypes().

There are then four calls to getDomainConstraints() followed by a
129-second delay during which nothing appears in the system trace.
After that, there are 124 pairs of these:
72263 pg_dump 135.956209 CALL break(0x9b37000)
72263 pg_dump 135.956222 RET break 0

They consume another 118 seconds, and then pg_dump moves on to
"reading procedural languages".

I've VACUUM FULL ANALYZEd everything I can think of. I'd really
appreciate any further hints or help.

PS - Thanks for Postgres; it works like a dream for me except for
infrequent bumps. =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikita The Spider The Spider (#1)
Re: 5 minutes to pg_dump nothing

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

I'm seeing a problem where pg_dump takes at least 5 minutes to execute
no matter what I ask it to dump -- even a non-existent or empty table.
One possible red flag is that pg_type contains 56508 rows. This
doesn't seem excessive to me, but perhaps it should.

That does seem like a lot --- what sort of types are they? Scalar,
composite, what? It's fairly likely that no one has tried to optimize
pg_dump for such a case.

It'd be helpful if you could recompile pg_dump with profiling enabled
(-pg compiler switch) and get a gprof profile to show where the time
is going.

regards, tom lane

#3Nikita The Spider The Spider
nikitathespider@gmail.com
In reply to: Tom Lane (#2)
Re: 5 minutes to pg_dump nothing

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

I'm seeing a problem where pg_dump takes at least 5 minutes to execute
no matter what I ask it to dump -- even a non-existent or empty table.
One possible red flag is that pg_type contains 56508 rows. This
doesn't seem excessive to me, but perhaps it should.

That does seem like a lot --- what sort of types are they? Scalar,
composite, what? It's fairly likely that no one has tried to optimize
pg_dump for such a case.

Aha, thanks. Didn't realize I was pushing the bounds of what was
reasonable. Here's the type counts:
 typrelkind | the_count
------------+-----------
            |       114
 sequence   |     11496
 composite  |     12290
 ordinary   |     13844
 TOAST      |      9215
 view       |      9699
(6 rows)

It'd be helpful if you could recompile pg_dump with profiling enabled
(-pg compiler switch) and get a gprof profile to show where the time
is going.

Will do. I'm going to try to recreate the problem in my development
environment where I have a bit more freedom to tinker.

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

#4Nikita The Spider The Spider
nikitathespider@gmail.com
In reply to: Nikita The Spider The Spider (#3)
Re: 5 minutes to pg_dump nothing

On 9/21/07, Nikita The Spider The Spider <nikitathespider@gmail.com> wrote:

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

I'm seeing a problem where pg_dump takes at least 5 minutes to execute
no matter what I ask it to dump -- even a non-existent or empty table.
One possible red flag is that pg_type contains 56508 rows. This
doesn't seem excessive to me, but perhaps it should.

That does seem like a lot --- what sort of types are they? Scalar,
composite, what? It's fairly likely that no one has tried to optimize
pg_dump for such a case.

Aha, thanks. Didn't realize I was pushing the bounds of what was
reasonable. Here's the type counts:
 typrelkind | the_count
------------+-----------
            |       114
 sequence   |     11496
 composite  |     12290
 ordinary   |     13844
 TOAST      |      9215
 view       |      9699
(6 rows)

I think I found the problem. getTypes() calls findFuncByOid() twice
for each item it sees. A comment at the top of findFuncByOid() says,
"should hash this, but just do linear search for now". So that function
does an O(n) linear scan, where n = the # of functions found by the
query in getFuncs(), and since it's called twice for each item in
getTypes(), the overall cost is O(m * n) where m = # of types and
n = # of functions. As I said before, I have ~56000 items in pg_type
and the query at the top of getFuncs() returns ~98000 rows, so for me
that's roughly 5.4 billion comparisons.

Some experimentation backs up my assertion that this is the
performance bottleneck. I created another database on the same machine
with 10k each of tables, views and custom types. The tables contained
a SERIAL PRIMARY KEY and a text column so that PG created indexes and
TOAST tables to go along with them. The result was ~50000 items in
pg_type. When I run pg_dump against this database, it completes in 4
seconds so simply having a lot of items in pg_type is not enough to
recreate my problem.

I added 100k functions to that database and pg_dump took several
minutes to run (again, trying to dump a non-existent table), which is
what I'm seeing on my production database.

My database design aimed to make each of my 1600 or so schemas its own
little self-sufficient world, hence the proliferation of types and
functions. I didn't realize that at some level they're all effectively
in the same namespace. Now I know.

Sorry I can't offer a patch. It'd be fun to write C code again but my
skills are pretty rusty and I'm afraid I'd butcher it. :-P

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

Cheers

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikita The Spider The Spider (#4)
Re: 5 minutes to pg_dump nothing

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

I think I found the problem. getTypes() calls findFuncByOid() twice
for each item it sees. A comment at the top of findFuncByOid() says,
"should hash this, but just do linear search for now". So that function
does an O(n) linear scan, where n = the # of functions found by the
query in getFuncs(), and since it's called twice for each item in
getTypes(), the overall cost is O(m * n) where m = # of types and
n = # of functions. As I said before, I have ~56000 items in pg_type
and the query at the top of getFuncs() returns ~98000 rows, so for me
that's roughly 5.4 billion comparisons.

Hah, so you need *both* a lot of types and a lot of functions to have
a problem.

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

We might look into fixing it for 8.3, but I doubt we'd risk back-patching
such a change to older branches.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: 5 minutes to pg_dump nothing

I wrote:

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

We might look into fixing it for 8.3, but I doubt we'd risk back-patching
such a change to older branches.

Actually, it doesn't look that hard --- want to try the attached patch?
I couldn't measure any speed difference at all on the standard PG
regression-test database, but there are not huge numbers of objects
in that.

regards, tom lane

#7Nikita The Spider The Spider
nikitathespider@gmail.com
In reply to: Tom Lane (#6)
Re: 5 minutes to pg_dump nothing

On 9/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

We might look into fixing it for 8.3, but I doubt we'd risk back-patching
such a change to older branches.

Actually, it doesn't look that hard --- want to try the attached patch?
I couldn't measure any speed difference at all on the standard PG
regression-test database, but there are not huge numbers of objects
in that.

Tom,
Your patch decreases the runtime of pg_dump by an order of magnitude
for me which means I don't have to roll my own table dumper. Thanks
very much for the swift turnaround and for a great database.

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more