Enumeration of tables is very slow in largish database

Started by Kirill Müllerover 14 years ago24 messagesgeneral

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

The problems occurred only after adding that many schemas to the
database. Before, with only 10+ schemas, the performance was acceptable.

Is this a known limitation of Postgres, or perhaps a misconfiguration of
our installation? What would you suggest to improve performance here? We
currently don't have administration rights for the database or login
rights for the server machine (Linux), but I think we'll need to take
care about that.

Best regards

Kirill

Guillaume Lelarge

guillaume@lelarge.info

over 14 years ago

In reply to: Kirill Müller (#1)

Re: Enumeration of tables is very slow in largish database

On Wed, 2012-01-11 at 11:07 +0100, Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

The problems occurred only after adding that many schemas to the
database. Before, with only 10+ schemas, the performance was acceptable.

Is this a known limitation of Postgres, or perhaps a misconfiguration of
our installation?

I don't think it has anything to do with PostgreSQL. It has to do with
the client. I don't know for QGis but, in the case of pgAdmin, I'm
pretty sure the issue is pgAdmin. When you click on the + sign of a
schema, pgAdmin has to get all the informations on the schema: tables,
columns, constraints, triggers, and all the other objets found in the
schema. It could take some time. Moreover, pgAdmin has to put all this
in the treeview, and my guess would be that it's probably the most time
consuming operation here. We could probably get better performance, but
I didn't have the time to look at that yet.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Andres Freund

andres@anarazel.de

over 14 years ago

In reply to: Kirill Müller (#1)

Re: Enumeration of tables is very slow in largish database

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

Andres

Guillaume Lelarge

guillaume@lelarge.info

over 14 years ago

In reply to: Andres Freund (#3)

Re: Enumeration of tables is very slow in largish database

On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

\dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
and stuff like that. Kinda quick. pgAdmin will get also all the other
informations, like columns, triggers, constraints, functions, types,
etc.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Andres Freund

andres@anarazel.de

over 14 years ago

In reply to: Guillaume Lelarge (#4)

Re: Enumeration of tables is very slow in largish database

On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote:

On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables
in each schema, generated automatically. When adding a new PostGIS
layer in QGis, the application obviously enumerates all tables, and
this takes minutes. Even browsing the database in pgAdmin3 is horribly
slow -- it takes several seconds to e.g. open a schema (click on a
schema's "+" in the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

\dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
and stuff like that. Kinda quick. pgAdmin will get also all the other
informations, like columns, triggers, constraints, functions, types,
etc.

Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres
itself. Its hard to really do all what pgadmin does at once inside psql
though.

Andres

Guillaume Lelarge

guillaume@lelarge.info

over 14 years ago

In reply to: Andres Freund (#5)

Re: Enumeration of tables is very slow in largish database

On Wed, 2012-01-11 at 15:10 +0100, Andres Freund wrote:

On Wednesday, January 11, 2012 02:53:06 PM Guillaume Lelarge wrote:

On Wed, 2012-01-11 at 14:44 +0100, Andres Freund wrote:

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables
in each schema, generated automatically. When adding a new PostGIS
layer in QGis, the application obviously enumerates all tables, and
this takes minutes. Even browsing the database in pgAdmin3 is horribly
slow -- it takes several seconds to e.g. open a schema (click on a
schema's "+" in the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

\dt won't be as slow as pgAdmin. \dt only gets the table name, owner,
and stuff like that. Kinda quick. pgAdmin will get also all the other
informations, like columns, triggers, constraints, functions, types,
etc.

Yes, sure. My guess is that the gui/pgadmin is the bottleneck and not postgres
itself. Its hard to really do all what pgadmin does at once inside psql
though.

Yeah, sure enough.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
PostgreSQL Sessions #3: http://www.postgresql-sessions.org

Adrian Klaver

adrian.klaver@aklaver.com

over 14 years ago

In reply to: Kirill Müller (#1)

Re: Enumeration of tables is very slow in largish database

On Wednesday, January 11, 2012 2:07:23 am Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

Would seem that they both have the same issue, namely pulling over the table
names and the meta data is resource intensive.

Not a QGis user but I did find this in the manual:

http://download.osgeo.org/qgis/doc/manual/qgis-1.7.0_user_guide_en.pdf

See in-line comment:

4.2.2. Loading a PostGIS Layer

Once you have one or more connections deﬁned, you can load layers from the
PostgreSQL database.
Of course this requires having data in PostgreSQL. See Section 4.2.4 for a
discussion on importing data into
the database.
To load a layer from PostGIS, perform the following steps:
– If the Add PostGIS Table(s) dialog is not already open, click on the Add
PostGIS Layer toolbar button.
– Choose the connection from the drop-down list and click Connect .
– Select or unselect Also list tables with no geometry
– Optionally use some Search Options to deﬁne which features to load from the
layer or use the
Build query icon to start the Query builder dialog.

^^^^^^^^^^^^^^^
Wonder if it would be possible to restrict the dataset(tables) by using the
above?

– Find the layer(s) you wish to add in the list of available layers.
– Select it by clicking on it. You can select multiple layers by holding down
the shift key while clicking.
See Section 4.6 for information on using the PostgreSQL Query Builder to further
deﬁne the layer.
– Click on the Add button to add the layer to the map.

The problems occurred only after adding that many schemas to the
database. Before, with only 10+ schemas, the performance was acceptable.

Is this a known limitation of Postgres, or perhaps a misconfiguration of
our installation? What would you suggest to improve performance here? We
currently don't have administration rights for the database or login
rights for the server machine (Linux), but I think we'll need to take
care about that.

Best regards

Kirill

--
Adrian Klaver
adrian.klaver@gmail.com

Scott Marlowe

scott.marlowe@gmail.com

over 14 years ago

In reply to: Kirill Müller (#1)

Re: Enumeration of tables is very slow in largish database

On Wed, Jan 11, 2012 at 3:07 AM, Kirill Müller
<kirill.mueller@ivt.baug.ethz.ch> wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it takes
several seconds to e.g. open a schema (click on a schema's "+" in the tree
view).

The problems occurred only after adding that many schemas to the database.
Before, with only 10+ schemas, the performance was acceptable.

Is this a known limitation of Postgres, or perhaps a misconfiguration of our
installation? What would you suggest to improve performance here? We
currently don't have administration rights for the database or login rights
for the server machine (Linux), but I think we'll need to take care about
that.

This is a problem I've run into before, but I can't find the previous
post on it. When you run a \d command, if you run top on your server
do you see a single CPU spinning hard on that one command? If so then
it's a pg server side problem, which is what I had on one server with
~40k objects in it.

Off the top of my head I remember something like this helping:

alter function pg_table_is_visible cost 10;

But I'm not sure that's it. Maybe Tom Lane can pipe up on this.

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Adrian Klaver (#7)

Re: Enumeration of tables is very slow in largish database

Adrian,

On 01/11/2012 04:32 PM, Adrian Klaver wrote:

Would seem that they both have the same issue, namely pulling over the table
names and the meta data is resource intensive.

The problem is that the slow part is the "connect", just at the ***** in
the in-line comment :-)

4.2.2. Loading a PostGIS Layer

Once you have one or more connections deﬁned, you can load layers from the
PostgreSQL database.
Of course this requires having data in PostgreSQL. See Section 4.2.4 for a
discussion on importing data into
the database.
To load a layer from PostGIS, perform the following steps:
– If the Add PostGIS Table(s) dialog is not already open, click on the Add
PostGIS Layer toolbar button.
– Choose the connection from the drop-down list and click Connect .

*******

– Select or unselect Also list tables with no geometry
– Optionally use some Search Options to deﬁne which features to load from the
layer or use the
Build query icon to start the Query builder dialog.

^^^^^^^^^^^^^^^
Wonder if it would be possible to restrict the dataset(tables) by using the
above?

Thanks for your feedback anyway, this helps clarifying the issue.

Cheers

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone: +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail: kirill.mueller@ivt.baug.ethz.ch

#10

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Andres Freund (#3)

Re: Enumeration of tables is very slow in largish database

On 01/11/2012 02:44 PM, Andres Freund wrote:

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

\dt is quick, I haven't tried the other commands, though. I have built
qgis from source and will try to find out this way which query stalls. I
wonder if Postgres has a profiling tool like MS SQL Server that would
allow tracing the queries and their runtime while they are executed. Or
perhaps there are logs? Could you give me some pointers, please?

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone: +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail: kirill.mueller@ivt.baug.ethz.ch

#11

Pavel Stehule

pavel.stehule@gmail.com

over 14 years ago

In reply to: Kirill Müller (#10)

Re: Enumeration of tables is very slow in largish database

2012/1/11 Kirill Müller <kirill.mueller@ivt.baug.ethz.ch>:

On 01/11/2012 02:44 PM, Andres Freund wrote:

On Wednesday, January 11, 2012 11:07:23 AM Kirill Müller wrote:

Hi all,

we have a Postgres/PostGIS database with 100+ schemas and 200+ tables in
each schema, generated automatically. When adding a new PostGIS layer in
QGis, the application obviously enumerates all tables, and this takes
minutes. Even browsing the database in pgAdmin3 is horribly slow -- it
takes several seconds to e.g. open a schema (click on a schema's "+" in
the tree view).

Are you actually sure its the database and not just pgadmin thats getting
really slow?

If you connect via psql and use \dt (see \? for a list of commands) and
consorts, is it that slow as well?

\dt is quick, I haven't tried the other commands, though. I have built qgis
from source and will try to find out this way which query stalls. I wonder
if Postgres has a profiling tool like MS SQL Server that would allow tracing
the queries and their runtime while they are executed. Or perhaps there are
logs? Could you give me some pointers, please?

log_min_duration_statement = 0 in postgresql.conf and after reload, pg
logs all query to log

Regards

Pavel

Show quoted text

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone: +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail: kirill.mueller@ivt.baug.ethz.ch

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12

Adrian Klaver

adrian.klaver@aklaver.com

over 14 years ago

In reply to: Kirill Müller (#9)

Re: Enumeration of tables is very slow in largish database

On 01/11/2012 11:45 AM, Kirill Müller wrote:

Adrian,

On 01/11/2012 04:32 PM, Adrian Klaver wrote:

Would seem that they both have the same issue, namely pulling over the
table
names and the meta data is resource intensive.

The problem is that the slow part is the "connect", just at the ***** in
the in-line comment :-)

More digging:
Looks like QGIS use the geometry_columns table first to determine the
geometry aware tables, failing that it walks the tables looking for
geometry columns. Is the geometry_columns table up to date with the
actual tables?

Cheers

Kirill

--
Adrian Klaver
adrian.klaver@gmail.com

#13

Reid Thompson

Reid.Thompson@ateb.com

over 14 years ago

In reply to: Kirill Müller (#10)

Re: Enumeration of tables is very slow in largish database

On Wed, 2012-01-11 at 20:50 +0100, Kirill Müller wrote:

that would
allow tracing the queries and their runtime while they are executed \

http://www.postgresql.org/docs/8.4/static/auto-explain.html

#14

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Adrian Klaver (#12)

Re: Enumeration of tables is very slow in largish database

On 01/11/2012 09:36 PM, Adrian Klaver wrote:

On 01/11/2012 11:45 AM, Kirill Müller wrote:

On 01/11/2012 04:32 PM, Adrian Klaver wrote:

Would seem that they both have the same issue, namely pulling over the
table
names and the meta data is resource intensive.

The problem is that the slow part is the "connect"...

More digging:
Looks like QGIS use the geometry_columns table first to determine the
geometry aware tables, failing that it walks the tables looking for
geometry columns. Is the geometry_columns table up to date with the
actual tables?

Thank you for the tip. In the current version of QGIS, I can select
"only look in the geometry_columns table" as an option. For 40000+
entries in this table, it took just some seconds to enumerate all
tables. This workaround solves the most urgent problems (I'm currently
rebuilding the geometry_columns table).

Kirill

--
_________________________________________________
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone: +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail: kirill.mueller@ivt.baug.ethz.ch

#15

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Scott Marlowe (#8)

Re: Enumeration of tables is very slow in largish database

On 01/11/2012 07:00 PM, Scott Marlowe wrote:

This is a problem I've run into before, but I can't find the previous
post on it. When you run a \d command, if you run top on your server
do you see a single CPU spinning hard on that one command? If so then
it's a pg server side problem, which is what I had on one server with
~40k objects in it.

Off the top of my head I remember something like this helping:

alter function pg_table_is_visible cost 10;

Thanks for the feedback. I found the relevant parts in the qgis source
code and have been able to trace the problem. It's just a sub-optimal
query issued by qgis:

SELECT
pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
FROM pg_attribute,pg_class,pg_namespace
WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
pg_class.oid
AND ( EXISTS (SELECT * FROM pg_type WHERE
pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN
(SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
a.typbasetype=b.oid AND b.typname IN
('geometry','geography','topogeometry'))))
AND has_schema_privilege( pg_namespace.nspname, 'usage' )
AND has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
pg_class.relname || '"', 'select' )
AND NOT EXISTS (SELECT * FROM geometry_columns WHERE
pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name)
AND pg_class.relkind IN ('v','r');

When leaving out the last two "AND NOT EXISTS..." parts, the query
finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
I understand the execution tree correctly, the time is burnt in repeated
sequential scans of the geometry_columns table (line 38).? Rewriting the
"AND NOT EXISTS" part using WITH solves the performance issues here, but
works only from Postgres 8.4. Any idea how to speed up this query for
older versions? (Creating a temporary table or an index should be avoided.)

Kirill

#16

David G. Johnston

david.g.johnston@gmail.com

over 14 years ago

In reply to: Kirill Müller (#15)

Re: Enumeration of tables is very slow in largish database

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Kirill Müller
Sent: Wednesday, January 11, 2012 6:28 PM
To: pgsql-general@postgresql.org
Cc: Scott Marlowe
Subject: Re: [GENERAL] Enumeration of tables is very slow in largish
database

On 01/11/2012 07:00 PM, Scott Marlowe wrote:

This is a problem I've run into before, but I can't find the previous
post on it. When you run a \d command, if you run top on your server
do you see a single CPU spinning hard on that one command? If so then
it's a pg server side problem, which is what I had on one server with
~40k objects in it.

Off the top of my head I remember something like this helping:

alter function pg_table_is_visible cost 10;

Thanks for the feedback. I found the relevant parts in the qgis source code
and have been able to trace the problem. It's just a sub-optimal query
issued by qgis:

SELECT
pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
FROM pg_attribute,pg_class,pg_namespace
WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
pg_class.oid AND ( EXISTS (SELECT * FROM pg_type WHERE
pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN (SELECT
oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
a.typbasetype=b.oid AND b.typname IN
('geometry','geography','topogeometry'))))
AND has_schema_privilege( pg_namespace.nspname, 'usage' ) AND
has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
pg_class.relname || '"', 'select' ) AND NOT EXISTS (SELECT * FROM
geometry_columns WHERE pg_namespace.nspname=f_table_schema AND
pg_class.relname=f_table_name) AND pg_class.relkind IN ('v','r');

When leaving out the last two "AND NOT EXISTS..." parts, the query finishes
in no time. I have attached the output of EXPLAIN ANALYZE -- if I understand
the execution tree correctly, the time is burnt in repeated sequential scans
of the geometry_columns table (line 38).? Rewriting the "AND NOT EXISTS"
part using WITH solves the performance issues here, but works only from
Postgres 8.4. Any idea how to speed up this query for older versions?
(Creating a temporary table or an index should be avoided.)

Kirill

----------------------------------------------------------------------------
---------------------

I only see one (1) "AND NOT EXISTS" in the provided query.

Syntax may be a little off but:

... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname, relname)
FROM geometry_columns )

Should work since it is no longer a correlated sub-query; whether the size
of geometry_columns makes this better or worse performing is impossible to
tell without testing but it isn't that much different than using a WITH/CTE.

David J.

#17

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: David G. Johnston (#16)

Re: Enumeration of tables is very slow in largish database

I am impressed. Execution time dropped to less than one second. Thanks a
lot!

On 01/12/2012 12:43 AM, David Johnston wrote:

I only see one (1) "AND NOT EXISTS" in the provided query.

Sorry, there used to be two "AND NOT EXISTS", but I edited the query
without updating the text.

Syntax may be a little off but:

... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname, relname)
FROM geometry_columns )

Just for the record:

...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name
FROM geometry_columns)

Should work since it is no longer a correlated sub-query; whether the size
of geometry_columns makes this better or worse performing is impossible to
tell without testing but it isn't that much different than using a WITH/CTE.

The execution plan looks much nicer (attached). I'd guess that the
altered query might lose a bit if geometry_columns has only very few
entries.? Apparently it gains a lot if the table is populated.

Thanks again!

Kirill

#18

David G. Johnston

david.g.johnston@gmail.com

over 14 years ago

In reply to: Kirill Müller (#17)

Re: Enumeration of tables is very slow in largish database

-----Original Message-----
From: Kirill Müller [mailto:kirill.mueller@ivt.baug.ethz.ch]
Sent: Wednesday, January 11, 2012 7:11 PM
To: David Johnston
Cc: pgsql-general@postgresql.org; 'Scott Marlowe'
Subject: Re: [GENERAL] Enumeration of tables is very slow in largish
database

I am impressed. Execution time dropped to less than one second. Thanks a
lot!

On 01/12/2012 12:43 AM, David Johnston wrote:

I only see one (1) "AND NOT EXISTS" in the provided query.

Sorry, there used to be two "AND NOT EXISTS", but I edited the query without
updating the text.

Syntax may be a little off but:

... AND (f_table_schema, f_table_name) NOT IN ( SELECT (nspname,
relname) FROM geometry_columns )

Just for the record:

...AND (nspname, relname) NOT IN (SELECT f_table_schema, f_table_name FROM
geometry_columns)

Should work since it is no longer a correlated sub-query; whether the
size of geometry_columns makes this better or worse performing is
impossible to tell without testing but it isn't that much different than

using a WITH/CTE.
The execution plan looks much nicer (attached). I'd guess that the altered
query might lose a bit if geometry_columns has only very few entries.?
Apparently it gains a lot if the table is populated.

Thanks again!

Kirill

--------------------------------------------------------------------

Performance for IN should increase as the results from geometry_columns
decrease since the IN target becomes smaller - thus fewer entries to compare
against. EXISTS works better than IN if the IN target is large AS LONG AS
the query that exists is using can use an Index. Since your query was
performing a sequential scan pretty much any size IN target will be better
performing. For small IN targets and index-using EXISTS it likely matters
very little which one you use.

David J.

#19

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Kirill Müller (#15)

Re: Enumeration of tables is very slow in largish database

=?ISO-8859-1?Q?Kirill_M=FCller?= <kirill.mueller@ivt.baug.ethz.ch> writes:

Thanks for the feedback. I found the relevant parts in the qgis source
code and have been able to trace the problem. It's just a sub-optimal
query issued by qgis:

SELECT
pg_class.relname,pg_namespace.nspname,pg_attribute.attname,pg_class.relkind
FROM pg_attribute,pg_class,pg_namespace
WHERE pg_namespace.oid=pg_class.relnamespace AND pg_attribute.attrelid =
pg_class.oid
AND ( EXISTS (SELECT * FROM pg_type WHERE
pg_type.oid=pg_attribute.atttypid AND pg_type.typname IN
('geometry','geography','topogeometry')) OR pg_attribute.atttypid IN
(SELECT oid FROM pg_type a WHERE EXISTS (SELECT * FROM pg_type b WHERE
a.typbasetype=b.oid AND b.typname IN
('geometry','geography','topogeometry'))))
AND has_schema_privilege( pg_namespace.nspname, 'usage' )
AND has_table_privilege( '"' || pg_namespace.nspname || '"."' ||
pg_class.relname || '"', 'select' )
AND NOT EXISTS (SELECT * FROM geometry_columns WHERE
pg_namespace.nspname=f_table_schema AND pg_class.relname=f_table_name)
AND pg_class.relkind IN ('v','r');

When leaving out the last two "AND NOT EXISTS..." parts, the query
finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
I understand the execution tree correctly, the time is burnt in repeated
sequential scans of the geometry_columns table (line 38).

Yeah. It wouldn't choose that plan if it weren't for the horrid rowcount
misestimate here:

-> Hash Anti Join (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836 loops=1)

This is probably an indication of eqjoinsel_semi doing the wrong thing;
we've whacked that estimator around a few times now, so it's hard to
know whether this represents an already-fixed bug or not. What PG
version are you using exactly?

Rewriting the
"AND NOT EXISTS" part using WITH solves the performance issues here, but
works only from Postgres 8.4. Any idea how to speed up this query for
older versions? (Creating a temporary table or an index should be avoided.)

Maybe use EXCEPT instead of a WHERE condition to get rid of the
already-present entries?

regards, tom lane

#20

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Tom Lane (#19)

Re: Enumeration of tables is very slow in largish database

On 01/12/2012 01:34 AM, Tom Lane wrote:

=?ISO-8859-1?Q?Kirill_M=FCller?=<kirill.mueller@ivt.baug.ethz.ch> writes:

When leaving out the last two "AND NOT EXISTS..." parts, the query
finishes in no time. I have attached the output of EXPLAIN ANALYZE -- if
I understand the execution tree correctly, the time is burnt in repeated
sequential scans of the geometry_columns table (line 38).

Yeah. It wouldn't choose that plan if it weren't for the horrid rowcount
misestimate here:

-> Hash Anti Join (cost=30586.95..37075.41 rows=1 width=133) (actual time=945.911..1760.307 rows=17836 loops=1)

I have VACUUM ANALYZE-d the table just before testing. Seems that this
didn't help here.

This is probably an indication of eqjoinsel_semi doing the wrong thing;
we've whacked that estimator around a few times now, so it's hard to
know whether this represents an already-fixed bug or not. What PG
version are you using exactly?

muelleki@xxx:~$ psql
psql (8.4.8)

Maybe use EXCEPT instead of a WHERE condition to get rid of the
already-present entries?

Thank you for the hint, I was not aware of the rather elegant EXCEPT.
Using WHERE (..., ...) NOT IN (SELECT ..., ... FROM ...) as suggested by
David Johnston shows excellent performance (and better fits the code
that is generating the SQL), but I'll keep the EXCEPT option in mind.

Regards

Kirill

#21

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Kirill Müller (#20)

#22

Kirill Müller

kirill.mueller@ivt.baug.ethz.ch

over 14 years ago

In reply to: Tom Lane (#21)