\d on database with a lot of tables is slow

Started by Jim C. Nasby over 20 years ago · 15 messages
#1Jim C. Nasby
jnasby@pervasive.com

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

-> Seq Scan on pg_class c (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?
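
For reference, the query behind a bare \d is roughly the following -- a
sketch reconstructed from the filter above, not the exact psql source:

SELECT n.nspname AS "Schema", c.relname AS "Name", c.relkind
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'v', 'S', '')
  AND pg_catalog.pg_table_is_visible(c.oid)  -- the per-row call that hurts
ORDER BY 1, 2;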

Also, not sure if this matters, but they're occasionally getting errors
like 'Did not find relation named "table that exists"' (where 'table
that exists' is the name of some table that is in the catalog) from \d.
Does anyone know what causes that?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#2Rod Taylor
pg@rbt.ca
In reply to: Jim C. Nasby (#1)
Re: \d on database with a lot of tables is slow

On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

-> Seq Scan on pg_class c (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?

It is probably the visibility checks. Is a \d fast if you include the
full name (schema.table)?

I brought this up a while ago and Tom has since rearranged some of the
psql queries to move the visibility check to come after the other where
clause segments.

It would be nice if the cost of the function could be added somehow --
even if it was just a low, medium, or high setting. This would allow the
planner to shuffle the WHERE clause execution ordering around in a
reasonable manner.
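
To see what the ordering buys, compare something like the following two
forms (a sketch; in practice the planner mostly evaluates quals in the
order written, but that isn't guaranteed, which is exactly the problem):

EXPLAIN ANALYZE SELECT relname FROM pg_catalog.pg_class
WHERE pg_table_is_visible(oid)          -- expensive check runs on every row
  AND relkind IN ('r', 'v', 'S', '');

EXPLAIN ANALYZE SELECT relname FROM pg_catalog.pg_class
WHERE relkind IN ('r', 'v', 'S', '')    -- cheap filters discard rows first
  AND pg_table_is_visible(oid);
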
--

#3Jim C. Nasby
jnasby@pervasive.com
In reply to: Rod Taylor (#2)
Re: \d on database with a lot of tables is slow

On Sat, Sep 24, 2005 at 08:20:47PM -0400, Rod Taylor wrote:

On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

-> Seq Scan on pg_class c (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?

It is probably the visibility checks. Is a \d fast if you include the
full name (schema.table)?

I brought this up a while ago and Tom has since rearranged some of the
psql queries to move the visibility check to come after the other where
clause segments.

It would be nice if the cost of the function could be added somehow --
even if it was just a low, medium, or high setting. This would allow the
planner to shuffle the WHERE clause execution ordering around in a
reasonable manner.

\d tablename is fast, yes.

Maybe instead of re-arranging the query it would make more sense to roll
the visibility check into the query itself (probably using a new system
view).
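
Something along these lines, say (a rough sketch -- the view name is made
up, and matching nspname against the search path ignores schema shadowing,
so it only approximates the real visibility rule):

CREATE VIEW pg_visible_class AS
SELECT c.*
FROM pg_catalog.pg_class c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = ANY (pg_catalog.current_schemas(true));
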
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#4Hannu Krosing
hannu@skype.net
In reply to: Jim C. Nasby (#1)
Re: \d on database with a lot of tables is slow

On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

-> Seq Scan on pg_class c (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I expect the issue on the 8-way Opteron to be more one of high load than
of slow scans. It seems that sometimes a database with lots of activity
slows down considerably. I suspect some locking issues, but I'm not sure
that's the case.

Also, if a lot of temp tables are used, then pg_class and pg_attribute
(at least) get bloated quickly and need vacuuming.
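
If that's what's happening, vacuuming the catalogs directly should help,
e.g.:

VACUUM ANALYZE pg_catalog.pg_class;
VACUUM ANALYZE pg_catalog.pg_attribute;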

I've messed around with adding indexes to a copy of pg_class to no
avail. Any ideas on how to improve the performance?

Also, not sure if this matters, but they're occasionally getting errors
like 'Did not find relation named "table that exists"' (where 'table
that exists' is the name of some table that is in the catalog) from \d.
Does anyone know what causes that?

Mostly this happens on temp tables from other connections, which have
managed to disappear by the time their detailed info is requested, and
which would actually not show up in \d output due to visibility checks.

--
Hannu Krosing <hannu@skype.net>

#5Jim C. Nasby
jnasby@pervasive.com
In reply to: Hannu Krosing (#4)
Re: \d on database with a lot of tables is slow

On Sun, Sep 25, 2005 at 10:10:22AM +0300, Hannu Krosing wrote:

On Sat, 2005-09-24 at 18:59 -0500, Jim C. Nasby wrote:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

-> Seq Scan on pg_class c (cost=0.00..2343.09 rows=6124 width=73) (actual time=0.325..22100.840 rows=16856 loops=1)
Filter: (((relkind = 'r'::"char") OR (relkind = 'v'::"char") OR (relkind = 'S'::"char") OR (relkind = ''::"char")) AND pg_table_is_visible(oid))

That's off my laptop, but they're seeing similar issues on an 8-way
Opteron as well...

I expect the issue on the 8-way Opteron to be more one of high load than
of slow scans. It seems that sometimes a database with lots of activity
slows down considerably. I suspect some locking issues, but I'm not sure
that's the case.

No temp tables in use. The issue didn't appear to be load-dependent,
either.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#6Jim C. Nasby
jnasby@pervasive.com
In reply to: Hannu Krosing (#4)
Re: \d on database with a lot of tables is slow

On Sun, Sep 25, 2005 at 10:10:22AM +0300, Hannu Krosing wrote:

Also, not sure if this matters, but they're occasionally getting errors
like 'Did not find relation named "table that exists"' (where 'table
that exists' is the name of some table that is in the catalog) from \d.
Does anyone know what causes that?

Mostly this happens on temp tables from other connections, which have
managed to disappear by the time their detailed info is requested, and
which would actually not show up in \d output due to visibility checks.

They are seeing this when doing \d on regular tables that aren't being
dropped. Maybe there's some locking that should be happening but isn't?
Or maybe \d should be doing things in one statement instead of a bunch?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#5)
Re: \d on database with a lot of tables is slow

"Jim C. Nasby" <jnasby@pervasive.com> writes:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

It's presumably mostly in the pg_table_is_visible() calls. Not sure if
we can do much to speed those up, but: how many schemas in your search
path? What's the distribution of pg_class entries among the schemas?

regards, tom lane

#8Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#7)
Re: \d on database with a lot of tables is slow

On Mon, Sep 26, 2005 at 03:00:44PM -0400, Tom Lane wrote:

"Jim C. Nasby" <jnasby@pervasive.com> writes:

I have a client with a database that contains 4000 relations according
to vacuum verbose, and \d in psql is painfully slow. In particular...

It's presumably mostly in the pg_table_is_visible() calls. Not sure if
we can do much to speed those up, but: how many schemas in your search
path? What's the distribution of pg_class entries among the schemas?

db=# show search_path;
                      search_path
--------------------------------------------------------
 decibel, pg_sysviews, information_schema, rrs, public

db=# select schema_name, count(*) from pg_user_tables group by schema_name;
  schema_name   | count
----------------+-------
 public         |   764
 ledger         |     6
 state          |     2
 _track_replica |    10
 repair         |     3
 summarized     |   586
 orders         |   512
 snapshot       |  1012
 acl            |    10

db=#
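
(pg_user_tables there comes from the pg_sysviews schema in the search
path, not the stock catalogs; the equivalent breakdown against pg_class
directly would be roughly:

SELECT n.nspname AS schema_name, count(*)
FROM pg_catalog.pg_class c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
GROUP BY n.nspname;
)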

Also, do you have any idea on the 'Did not find relation named
"table-thats-there"' error? table-thats-there isn't a temp table, and I don't
believe they're even using temp tables, so I don't think that's the issue.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#8)
Re: \d on database with a lot of tables is slow

"Jim C. Nasby" <jnasby@pervasive.com> writes:

On Mon, Sep 26, 2005 at 03:00:44PM -0400, Tom Lane wrote:

It's presumably mostly in the pg_table_is_visible() calls. Not sure if
we can do much to speed those up, but: how many schemas in your search
path? What's the distribution of pg_class entries among the schemas?

db=# show search_path;
                      search_path
--------------------------------------------------------
 decibel, pg_sysviews, information_schema, rrs, public

db=# select schema_name, count(*) from pg_user_tables group by schema_name;
  schema_name   | count
----------------+-------
 public         |   764
 ledger         |     6
 state          |     2
 _track_replica |    10
 repair         |     3
 summarized     |   586
 orders         |   512
 snapshot       |  1012
 acl            |    10

Hmm, so lots and lots of tables that aren't visible at all ... that's
definitely the slowest case for pg_table_is_visible. I'll think about
whether we can improve it.
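
An easy way to see how much of the runtime is the visibility function is
to run the same scan with and without it, e.g.:

EXPLAIN ANALYZE SELECT count(*) FROM pg_catalog.pg_class
WHERE relkind IN ('r', 'v', 'S', '');

EXPLAIN ANALYZE SELECT count(*) FROM pg_catalog.pg_class
WHERE relkind IN ('r', 'v', 'S', '')
  AND pg_table_is_visible(oid);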

Also, do you have any idea on the 'Did not find relation named
"table-thats-there"' error? table-thats-there isn't a temp table, and I don't
believe they're even using temp tables, so I don't think that's the issue.

Uh, what's the exact error message again? (If it's a backend message,
the verbose form would be useful.)

regards, tom lane

#10Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#9)
Re: \d on database with a lot of tables is slow

On Tue, Sep 27, 2005 at 06:53:13PM -0400, Tom Lane wrote:

Also, do you have any idea on the 'Did not find relation named
"table-thats-there"' error? table-thats-there isn't a temp table, and I don't
believe they're even using temp tables, so I don't think that's the issue.

Uh, what's the exact error message again? (If it's a backend message,
the verbose form would be useful.)

Sorry, remembered it wrong. It's 'Did not find any relation named',
which appears to be in bin/psql/describe.c. It does occur when trying to
do a \d on a specific table.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#10)
Re: \d on database with a lot of tables is slow

"Jim C. Nasby" <jnasby@pervasive.com> writes:

Sorry, remembered it wrong. It's 'Did not find any relation named',
which appears to be in bin/psql/describe.c. It does occur when trying to
do a \d on a specific table.

Hmm, no immediate ideas. You haven't provided a lot of context about
this --- when it happens, is it repeatable? Are they giving an exact
table name or a pattern to \d? Is a schema name included in what they
give to \d? What PG version are they running exactly?

regards, tom lane

#12Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#11)
Re: \d on database with a lot of tables is slow

On Wed, Sep 28, 2005 at 11:35:31AM -0400, Tom Lane wrote:

"Jim C. Nasby" <jnasby@pervasive.com> writes:

Sorry, remembered it wrong. It's 'Did not find any relation named',
which appears to be in bin/psql/describe.c. It does occur when trying to
do a \d on a specific table.

Hmm, no immediate ideas. You haven't provided a lot of context about
this --- when it happens, is it repeatable? Are they giving an exact
table name or a pattern to \d? Is a schema name included in what they
give to \d? What PG version are they running exactly?

Sorry, had the error message wrong:

ERROR: cache lookup failed for relation 1906465919

It is on an exact table name. When we retry the describe on a failure,
sometimes it works and sometimes it fails again. When it fails again
the relation number is different. Also, \d schema.table always returns
quickly and never errors. \d table is slow and produces the error fairly
often.

They're using 8.0.x; I'm pretty certain it's 8.0.3.
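
In psql terms, the pattern looks like this (table name made up):

-- schema-qualified: skips the visibility scan, fast and reliable
\d snapshot.some_table
-- unqualified: scans pg_class calling pg_table_is_visible(), and
-- sometimes dies with "cache lookup failed for relation ..."
\d some_table
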
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#13Jon Jensen
jon@endpoint.com
In reply to: Jim C. Nasby (#12)
Re: \d on database with a lot of tables is slow

On Fri, 30 Sep 2005, Jim C. Nasby wrote:

ERROR: cache lookup failed for relation 1906465919

It is on an exact table name. When we retry the describe on a failure,
sometimes it works and sometimes it fails again. When it fails again
the relation number is different. Also, \d schema.table always returns
quickly and never errors. \d table is slow and produces the error fairly
often.

They're using 8.0.x; I'm pretty certain it's 8.0.3.

We have the exact same problem on a 7.4.5 database. Some basic info on the
database: psql's \d returns 424 rows, and the on-disk size of the database
is about 11 GB. A standalone \dt throws the same occasional error.

I've suspected that it may be caused by ongoing periodic (at least once an
hour) rebuilding of certain generated tables with TRUNCATE and then INSERT
INTO the_table SELECT ... inside a transaction. But I don't have any proof
of that; it's just the most obvious different thing going on compared to
other databases we have.
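
That is, something along these lines (names made up):

BEGIN;
TRUNCATE daily_summary;
INSERT INTO daily_summary
    SELECT sale_date, sum(amount)
    FROM sales
    GROUP BY sale_date;
COMMIT;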

It does seem like I've encountered the error less often since increasing
the max_fsm_pages setting, which made VACUUM more effective and reduced
pg_class bloat. But OTOH I trained myself not to do \d there very often
either, since it was so slow and failed so often, so that may be
coincidence. :)
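
For what it's worth, the way I've been sanity-checking the setting is a
database-wide VACUUM VERBOSE, whose trailing INFO lines report how many
free-space-map pages are needed versus what's allocated:

VACUUM VERBOSE;
SHOW max_fsm_pages;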

Jon

--
Jon Jensen
End Point Corporation
http://www.endpoint.com/
Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ...

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: \d on database with a lot of tables is slow

I wrote:

It's presumably mostly in the pg_table_is_visible() calls.

I did some profiling on a test case with 10000 tables, and noticed that
a big part of the problem is that the catalog caches become entirely
useless: almost every catcache lookup ends up going to the underlying
tables. This is because MAXCCTUPLES in catcache.c is fixed at 5000,
and that's not an adequate working set for this many tables. If you
are willing to throw memory at the problem, you could try increasing
MAXCCTUPLES (to say 50K or 100K) and see if that helps.
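
For anyone wanting to reproduce the test case, a catalog of that scale
is quick to generate from psql; a rough sketch (file path and table
names arbitrary):

\t
\o /tmp/make_tables.sql
SELECT 'CREATE TABLE cache_test_' || i || ' (id int);'
FROM generate_series(1, 10000) AS g(i);
\o
\t
\i /tmp/make_tables.sql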

regards, tom lane

#15Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#14)
Re: \d on database with a lot of tables is slow

On Sat, Oct 01, 2005 at 02:00:12PM -0400, Tom Lane wrote:

I wrote:

It's presumably mostly in the pg_table_is_visible() calls.

I did some profiling on a test case with 10000 tables, and noticed that
a big part of the problem is that the catalog caches become entirely
useless: almost every catcache lookup ends up going to the underlying
tables. This is because MAXCCTUPLES in catcache.c is fixed at 5000,
and that's not an adequate working set for this many tables. If you
are willing to throw memory at the problem, you could try increasing
MAXCCTUPLES (to say 50K or 100K) and see if that helps.

Out of curiosity... does catcache cache all pg_* tables? Also, at what
point would it be good to up NCCBUCKETS?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461