slow queries over information_schema.tables

Started by Pavel Stehule about 7 years ago, 21 messages
#1 Pavel Stehule <pavel.stehule@gmail.com>

Hi

one customer reported slow queries over information_schema.tables. An index
on relname is never used, probably because of the cast to
information_schema.sql_identifier.

Slow query

select * from information_schema.tables where table_name = 'pg_class';
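
For illustration, the reported behavior shows up in EXPLAIN roughly like this (a sketch only; the view expands to a join over several catalogs, so the real plan is larger):

EXPLAIN SELECT * FROM information_schema.tables WHERE table_name = 'pg_class';
-- expect, somewhere inside the plan, a sequential scan along the lines of
--   Seq Scan on pg_class c
--     Filter: (((relname)::information_schema.sql_identifier)::text = 'pg_class'::text)
-- rather than an Index Scan using pg_class_relname_nsp_index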

It is usually hard to fix this at the application level in a corporate
environment.

Regards

Pavel

#2 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Pavel Stehule (#1)
3 attachment(s)
Re: slow queries over information_schema.tables

Pavel Stehule <pavel.stehule@gmail.com> writes:

Slow query
select * from information_schema.tables where table_name = 'pg_class';

Yeah. This has been complained of many times before.

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.
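
For reference, the column in question is produced by a cast in the view definition, roughly like this (a simplified excerpt, not the view's full text):

-- the table_name column of information_schema.tables, in essence:
SELECT CAST(c.relname AS information_schema.sql_identifier) AS table_name
FROM pg_class c;
-- so "WHERE table_name = 'pg_class'" expands to a comparison against
-- CoerceToDomain((relname)::varchar) rather than against bare relname,
-- and neither layer lets the condition match pg_class_relname_nsp_index.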

The latter half of the problem can be exhibited in simplified form as

regression=# explain select * from pg_class where relname = 'pg_class'::text;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..175.41 rows=7 width=791)
   Filter: ((relname)::text = 'pg_class'::text)
(2 rows)

which is much stupider than what you get with a name comparison:

regression=# explain select * from pg_class where relname = 'pg_class';
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=791)
   Index Cond: (relname = 'pg_class'::name)
(2 rows)

(We've seen complaints about this form of problem, too.) Since there's
no name-vs-text equality operator, we end up applying a cast to text so
that the texteq operator can be used, and now we're out of luck for
matching that to the index. To add insult to injury, we also fail to
match the cast expression to the available statistics, so that we don't
get quite the right selectivity estimate.

The most straightforward way to fix that would be to add some cross-type
comparison operators to the name_ops operator family. While we only
directly need 'name = text' to make this case work, the opr_sanity
regression test will complain if we don't supply a fairly complete set of
operators, and I think not without reason. So I experimented with doing
that, as in the very-much-WIP 0001 patch below. A name index can only
cope with strcmp-style comparison semantics, not strcoll-style, so while
it's okay to call the equality operator = I felt we'd better call the
inequality operators ~<~ and so on. While the patch as presented fixes
the case shown above, it's not all good: the problem with having a new
'text = name' operator is that it will also capture cases where you would
like to have a text index working with a comparison value of type "name".
If 'text = name' isn't part of the text_ops opclass then that doesn't
work. I think that the reason for the join.sql regression test failure
shown in patch 0001 is likewise that since 'text = name' isn't part of the
text_ops opclass, the join elimination stuff is unable to prove that it
can eliminate a join to a unique text column. This could probably be
fixed by creating yet another dozen cross-type operators that implement
text vs name comparisons using strcoll semantics (hence, using the normal
'<' operator names), and adding that set to the text_ops opfamily.
I didn't do that here (yet), because it seems pretty tedious.
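
For concreteness, adding such cross-type members would look roughly like this at the SQL level (illustrative only; the patch does the equivalent through initial catalog entries, and the support-function name here follows the patch's conventions rather than anything already shipped):

ALTER OPERATOR FAMILY name_ops USING btree ADD
  OPERATOR 1 ~<~ (name, text),
  OPERATOR 3 =   (name, text),
  OPERATOR 5 ~>~ (name, text),
  -- plus ~<=~ and ~>=~ for strategies 2 and 4, the mirror-image
  -- (text, name) operators, and a cross-type comparison support function:
  FUNCTION 1 btnametextcmp(name, text);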

Also worth noting is that the name_ops and text_pattern_ops opfamilies
are implementing identical semantics. I wonder whether we could merge
them.
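
The overlap is visible directly in the catalogs; a query along these lines lists both btree families' members side by side (a read-only sketch):

SELECT opf.opfname, ao.amopstrategy, ao.amopopr::regoperator
FROM pg_amop ao
JOIN pg_opfamily opf ON opf.oid = ao.amopfamily
JOIN pg_am am ON am.oid = opf.opfmethod AND am.amname = 'btree'
WHERE opf.opfname IN ('name_ops', 'text_pattern_ops')
ORDER BY opf.opfname, ao.amopstrategy;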

While most of the other regression test output changes shown in the 0001
patch are unsurprising, one that is surprising is that a merge join on
two text columns has started sorting USING ~<~ rather than the normal
text ordering. The reason for this seems to be that the 'text = text'
operator is now a member of name_ops as well as text_ops, and
select_outer_pathkeys_for_merge arbitrarily uses the lowest-number
opfamily OID if it has a choice. We could avoid that by renumbering
name_ops to have an OID higher than text_ops, though that's certainly
klugy. Or we might just decide that we like that plan better anyway,
since strcmp-based comparison is probably faster than strcoll-based
comparison. (It'd be kind of nice if the choice were based on something
more principled than OID order, though I don't especially want to do
anything about that right now.)

Now, 0001 by itself doesn't do much for Pavel's complaint, because
the view is still forcing a cast to sql_identifier to be inserted
atop pg_class.relname, even though we no longer need any cast for
simple name-vs-text cases.

0002 is a really preliminary POC for eliminating CoerceToDomain nodes
at plan time if the domain has no constraints to check. While this is
enough to check the effects on plan selection, it's certainly not
committable as-is, because the resulting plan is broken if someone then
adds a constraint to the domain. (There is a regression test failure,
not shown in 0002, for a case that tests exactly that scenario.)

What we would have to do to make 0002 committable is to add sinval
signaling to invalidate any cached plan that's dependent on an
assumption of no constraints on a domain. This is something that
we probably want anyway, since it would be necessary if we ever want
to compile domain constraint expressions as part of the plan tree
rather than leaving them to run-time.
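
To make the invalidation requirement concrete, this is the kind of sequence a cached plan has to survive (a hypothetical session, not taken from the patch):

CREATE DOMAIN ident AS varchar;      -- no constraints yet
PREPARE q AS SELECT 'hello'::ident;  -- with 0002, planned without a CoerceToDomain
ALTER DOMAIN ident ADD CONSTRAINT shortish CHECK (length(VALUE) <= 3);
EXECUTE q;  -- must now fail the CHECK; without sinval-driven replanning,
            -- the stale plan would skip the new constraint entirely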

While the sinval additions per se would be fairly straightforward,
I wonder whether anyone is likely to complain about the race conditions
that will ensue from not taking any locks associated with the domain
type; i.e. it's possible that a query would execute with slightly
out-of-date assumptions about what constraints a domain has. I suspect
that we have race conditions like that already, but they might be worse
if we inspect the domain during planning rather than at executor
startup. Is anyone worried enough about that to want to add locking
overhead to domain usage? (I'm not.)

0001+0002 still don't get the job done: now we strip off the
CoerceToDomain successfully, but we still have "relname::varchar"
being compared to the comparison value, so we still can't match
that to the index. 0003 is a somewhat klugy solution to that part,
allowing indxpath.c to construct a name equality test from a texteq
restriction condition. (This is semantically OK because equality
is equality in both name and text domains. We could not convert
inequalities, at least not as freely; maybe it could be done in
C locale, but I've not done anything about that here.)
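
To see why only equality converts safely, consider (an illustrative query; the first column assumes an en_US-style collation is available, and its exact spelling varies by platform):

-- Equality agrees between the two types, since text equality is effectively
-- byte equality, the same test name uses. Ordering does not agree:
SELECT 'a' < 'B' COLLATE "en_US" AS text_order,   -- true: linguistic order, a before b
       'a' < 'B' COLLATE "C"     AS strcmp_order; -- false: byte order, 'B' (0x42) < 'a' (0x61)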

With all three patches in place, we get

regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.30 rows=7 width=781)
   Index Cond: (relname = 'foo'::text)
   Filter: ((relname)::text = 'foo'::text)
(3 rows)

so we've fixed the lack of indexscan but we're still a bit off about the
statistics. I don't have any immediate proposal about how to fix the
latter, but anyway this is enough to make Pavel's case a lot better.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001. Another way we could approach
this is to dispense with 0001 altogether, and use a variant of
0003 that converts such a clause to "namecol nameeq textvalue::name".
At first glance it seems like that wouldn't work quite right:
the cast to name will silently truncate overlength text values,
which could allow such a value to be found equal to a name that
it shouldn't be equal to. However, it really would be OK because
the context is that we're assuming the converted clause is lossy
(i.e. might have false matches), and so we'll recheck the original
clause as a filter condition, which will get rid of such false
matches. The recheck behavior is just slightly-annoying overhead
in 0003 as presented, but it'd be essential in the other version.
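
The lossiness being relied on is easy to demonstrate (illustrative, assuming the default NAMEDATALEN of 64):

SELECT repeat('x', 100)::name = repeat('x', 70)::name AS equal_after_cast,  -- true
       repeat('x', 100)       = repeat('x', 70)       AS equal_as_text;     -- false
-- both casts silently truncate to 63 bytes, producing exactly the kind of
-- false match that the rechecked original clause would filter back out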

I'm not entirely sure whether to go with 0001+0003 or this alternative
approach. The alternative is surely far less invasive; 0001 might have
more side effects I haven't thought of. However, my gut feeling is
that 0001 would be a good thing on balance because it'd give the system
considerably more information about name-vs-text comparisons than it has
now. In principle that should lead to better plans for other cases
besides the narrow one we're specifically thinking of.

Comments, complaints, other ideas?

regards, tom lane

Attachments:

0001-extend-name-opfamily-0.1.patch (text/x-diff)
0002-hack-coercetodomain-0.1.patch (text/x-diff)
0003-hack-name-text-comparison-0.1.patch (text/x-diff)
#3 Andres Freund <andres@anarazel.de>
In reply to: Tom Lane (#2)
Re: slow queries over information_schema.tables

Hi,

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough? That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.

The most straightforward way to fix that would be to add some cross-type
comparison operators to the name_ops operator family.

That seems reasonable.

While we only
directly need 'name = text' to make this case work, the opr_sanity
regression test will complain if we don't supply a fairly complete set of
operators, and I think not without reason. So I experimented with doing
that, as in the very-much-WIP 0001 patch below. A name index can only
cope with strcmp-style comparison semantics, not strcoll-style, so while
it's okay to call the equality operator = I felt we'd better call the
inequality operators ~<~ and so on. While the patch as presented fixes
the case shown above, it's not all good: the problem with having a new
'text = name' operator is that it will also capture cases where you would
like to have a text index working with a comparison value of type "name".
If 'text = name' isn't part of the text_ops opclass then that doesn't
work. I think that the reason for the join.sql regression test failure
shown in patch 0001 is likewise that since 'text = name' isn't part of the
text_ops opclass, the join elimination stuff is unable to prove that it
can eliminate a join to a unique text column. This could probably be
fixed by creating yet another dozen cross-type operators that implement
text vs name comparisons using strcoll semantics (hence, using the normal
'<' operator names), and adding that set to the text_ops opfamily.
I didn't do that here (yet), because it seems pretty tedious.

Is there a way we could make that less laborious by allowing a bit more
casting?

0002 is a really preliminary POC for eliminating CoerceToDomain nodes
at plan time if the domain has no constraints to check. While this is
enough to check the effects on plan selection, it's certainly not
committable as-is, because the resulting plan is broken if someone then
adds a constraint to the domain. (There is a regression test failure,
not shown in 0002, for a case that tests exactly that scenario.)

Hah.

What we would have to do to make 0002 committable is to add sinval
signaling to invalidate any cached plan that's dependent on an
assumption of no constraints on a domain. This is something that
we probably want anyway, since it would be necessary if we ever want
to compile domain constraint expressions as part of the plan tree
rather than leaving them to run-time.

Yea, that seems good. I don't like the fact that expression
initialization does the lookups for constraints right now.

While the sinval additions per se would be fairly straightforward,
I wonder whether anyone is likely to complain about the race conditions
that will ensue from not taking any locks associated with the domain
type; i.e. it's possible that a query would execute with slightly
out-of-date assumptions about what constraints a domain has. I suspect
that we have race conditions like that already, but they might be worse
if we inspect the domain during planning rather than at executor
startup. Is anyone worried enough about that to want to add locking
overhead to domain usage? (I'm not.)

I'm not either.

0001+0002 still don't get the job done: now we strip off the
CoerceToDomain successfully, but we still have "relname::varchar"
being compared to the comparison value, so we still can't match
that to the index. 0003 is a somewhat klugy solution to that part,
allowing indxpath.c to construct a name equality test from a texteq
restriction condition. (This is semantically OK because equality
is equality in both name and text domains. We could not convert
inequalities, at least not as freely; maybe it could be done in
C locale, but I've not done anything about that here.)

With all three patches in place, we get

regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.30 rows=7 width=781)
   Index Cond: (relname = 'foo'::text)
   Filter: ((relname)::text = 'foo'::text)
(3 rows)

so we've fixed the lack of indexscan but we're still a bit off about the
statistics. I don't have any immediate proposal about how to fix the
latter, but anyway this is enough to make Pavel's case a lot better.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001.

Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
usable outside of one odd builtin type. I was wondering for a bit
whether we could have logic to move the cast to the other side of an
operator, but I don't see how we could make that generally safe.
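
For a concrete instance of why cast-flipping is unsafe in general (an illustrative example unrelated to name/text):

-- Casting the column side and casting the constant side are not equivalent
-- whenever the cast is not one-to-one:
SELECT (1.5::numeric)::text = '1.50'          AS column_side_cast,   -- false: '1.5' <> '1.50'
       1.5::numeric         = '1.50'::numeric AS constant_side_cast; -- true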

Greetings,

Andres Freund

#4 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Andres Freund (#3)
Re: slow queries over information_schema.tables

Andres Freund <andres@anarazel.de> writes:

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough? That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.

Not following what you have in mind here? My 0002 throws away the
CoerceToDomain if there are *no* constraints, but I can't see any
situation in which we'd likely be able to ignore a constraint,
simple or not.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001.

Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
usable outside of one odd builtin type. I was wondering for a bit
whether we could have logic to move the cast to the other side of an
operator, but I don't see how we could make that generally safe.

Yeah. It seems like it could be a special case of a more general
expression transform facility, but we have no such facility now.

On the other hand, all of match_special_index_operator is an ugly
single-purpose kluge already, so I'm not feeling that awful about
throwing another special case into it. Someday it would be nice
to replace that code with something more general and extensible,
but today is not that day as far as I'm concerned.

regards, tom lane

#5 Andres Freund <andres@anarazel.de>
In reply to: Tom Lane (#4)
Re: slow queries over information_schema.tables

Hi,

On 2018-12-05 13:22:23 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough? That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.

Not following what you have in mind here? My 0002 throws away the
CoerceToDomain if there are *no* constraints, but I can't see any
situation in which we'd likely be able to ignore a constraint,
simple or not.

Yea, simple probably means nonexistent for now. We could e.g. optimize
some NOT NULL checks away, but it's probably not worth it.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001.

Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
usable outside of one odd builtin type. I was wondering for a bit
whether we could have logic to move the cast to the other side of an
operator, but I don't see how we could make that generally safe.

Yeah. It seems like it could be a special case of a more general
expression transform facility, but we have no such facility now.

On the other hand, all of match_special_index_operator is an ugly
single-purpose kluge already, so I'm not feeling that awful about
throwing another special case into it. Someday it would be nice
to replace that code with something more general and extensible,
but today is not that day as far as I'm concerned.

Fair enough.

Greetings,

Andres Freund

#6 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Andres Freund (#5)
Re: slow queries over information_schema.tables

Andres Freund <andres@anarazel.de> writes:

On 2018-12-05 13:22:23 -0500, Tom Lane wrote:

Not following what you have in mind here? My 0002 throws away the
CoerceToDomain if there are *no* constraints, but I can't see any
situation in which we'd likely be able to ignore a constraint,
simple or not.

Yea, simple probably means nonexistent for now. We could e.g. optimize
some NOT NULL checks away, but it's probably not worth it.

Ah, yes, that is a case where we might have enough knowledge to prove
the test redundant --- but considering that we explicitly discourage
domain NOT NULL as bad style and not fully supported, I can't get
excited about it. I suppose in some cases we might be able to use
predtest.c to prove domain CHECK constraints redundant, but I bet that
it's not worth the trouble.

The bigger picture here is that people seem to like to use domains
as simple type aliases, which will never have any constraints, but
we're very dumb about that today. So the patch as presented seems
like it has lots of potential applicability, as long as we fix the
invalidation aspect.
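
The usage pattern in question looks like this (a generic illustration, not from the thread):

CREATE DOMAIN customer_id AS bigint;  -- a constraint-free, self-documenting alias
CREATE TABLE orders (customer customer_id, total numeric);
-- today every reference to "customer" carries a CoerceToDomain node; with
-- 0002 plus invalidation, the planner could treat it as plain bigint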

regards, tom lane

#7 Andres Freund <andres@anarazel.de>
In reply to: Tom Lane (#6)
Re: slow queries over information_schema.tables

On 2018-12-05 13:41:32 -0500, Tom Lane wrote:

The bigger picture here is that people seem to like to use domains
as simple type aliases, which will never have any constraints, but
we're very dumb about that today. So the patch as presented seems
like it has lots of potential applicability, as long as we fix the
invalidation aspect.

Entirely agreed.

#8 Robert Haas <robertmhaas@gmail.com>
In reply to: Tom Lane (#6)
Re: slow queries over information_schema.tables

On Wed, Dec 5, 2018 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ah, yes, that is a case where we might have enough knowledge to prove
the test redundant --- but considering that we explicitly discourage
domain NOT NULL as bad style and not fully supported, I can't get
excited about it. I suppose in some cases we might be able to use
predtest.c to prove domain CHECK constraints redundant, but I bet that
it's not worth the trouble.

The bigger picture here is that people seem to like to use domains
as simple type aliases, which will never have any constraints, but
we're very dumb about that today. So the patch as presented seems
like it has lots of potential applicability, as long as we fix the
invalidation aspect.

I'm not thrilled about depending on sinval without locking,
particularly given that my proposal to make sure we
AcceptInvalidationMessages() at least once per query was shouted down.
That means that in corner cases, you could execute many queries
without flushing the old cache entries. However, I do agree that
locking every type, function, operator, etc. involved in the query is
impractical.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Robert Haas (#8)
Re: slow queries over information_schema.tables

Robert Haas <robertmhaas@gmail.com> writes:

I'm not thrilled about depending on sinval without locking,
particularly given that my proposal to make sure we
AcceptInvalidationMessages() at least once per query was shouted down.

It's fairly hard to imagine practical cases where we'd not call
AcceptInvalidationMessages at least once per query, so I'm not
very sure what you're on about.

regards, tom lane

#10 Robert Haas <robertmhaas@gmail.com>
In reply to: Tom Lane (#9)
Re: slow queries over information_schema.tables

On Thu, Dec 6, 2018 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm not thrilled about depending on sinval without locking,
particularly given that my proposal to make sure we
AcceptInvalidationMessages() at least once per query was shouted down.

It's fairly hard to imagine practical cases where we'd not call
AcceptInvalidationMessages at least once per query, so I'm not
very sure what you're on about.

Unless I'm confused, it happens any time you run a query that only
touches tables using lockmodes previously acquired by the current
transaction. Like:

BEGIN;
some query;
the same query again;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Robert Haas (#10)
Re: slow queries over information_schema.tables

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 6, 2018 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's fairly hard to imagine practical cases where we'd not call
AcceptInvalidationMessages at least once per query, so I'm not
very sure what you're on about.

Unless I'm confused, it happens any time you run a query that only
touches tables using lockmodes previously acquired by the current
transaction. Like:

BEGIN;
some query;
the same query again;

In my testing, that still hits AIM() during parserOpenTable().

[ further experimentation... ] It looks like if you prepare
a query and then just execute it repeatedly in one transaction,
you'd not reach AIM (as long as you were getting generic plans).
Possibly that's a gap worth closing.
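
The gap can be sketched like this (hypothetical session; "some_table" is a placeholder, and hitting the gap depends on the plancache choosing a generic plan):

BEGIN;
PREPARE q AS SELECT count(*) FROM some_table;
EXECUTE q;  -- parsing/planning opens the relation and calls AcceptInvalidationMessages()
EXECUTE q;  -- a reused generic plan only re-locks by OID, and with the lock
EXECUTE q;  -- already held locally, AcceptInvalidationMessages() is skipped
COMMIT;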

regards, tom lane

#12 Robert Haas <robertmhaas@gmail.com>
In reply to: Tom Lane (#11)
Re: slow queries over information_schema.tables

On Thu, Dec 6, 2018 at 12:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In my testing, that still hits AIM() during parserOpenTable().

Oh, I see. relation_openrv_extended() calls it.

[ further experimentation... ] It looks like if you prepare
a query and then just execute it repeatedly in one transaction,
you'd not reach AIM (as long as you were getting generic plans).
Possibly that's a gap worth closing.

If we called it at the start of every query, couldn't we dispense with
the call in relation_openrv_extended()? On net, that would actually
mean fewer calls to AcceptInvalidationMessages(), assuming you
sometimes run queries that touch multiple relations. And the behavior
would be more predictable, too, because you'd (hopefully) have no
cases where a command failed to see the results of DDL that committed
before the command was issued.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Robert Haas (#12)
Re: slow queries over information_schema.tables

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 6, 2018 at 12:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ further experimentation... ] It looks like if you prepare
a query and then just execute it repeatedly in one transaction,
you'd not reach AIM (as long as you were getting generic plans).
Possibly that's a gap worth closing.

If we called it at the start of every query, couldn't we dispense with
the call in relation_openrv_extended()?

No. You need to do AIM *after* obtaining the lock, else you still
have the race condition that you can execute a query on a table
without being aware of recent DDL on it.

What we could possibly do to close the gap cited above is to have
plancache.c's CheckCachedPlan force an AIM call if it notices that
the plan it wants to re-use has a non-empty PlanInvalItems list.
If it does not, then there is nothing that AIM would cause invalidation
for anyway. This still leaves us with a query-duration-sized race
condition window for DDL on functions and domains, but not any larger
than that.

regards, tom lane

#14 Robert Haas <robertmhaas@gmail.com>
In reply to: Tom Lane (#13)
Re: slow queries over information_schema.tables

On Thu, Dec 6, 2018 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we called it at the start of every query, couldn't we dispense with
the call in relation_openrv_extended()?

No. You need to do AIM *after* obtaining the lock, else you still
have the race condition that you can execute a query on a table
without being aware of recent DDL on it.

Huh? The call in relation_openrv_extended() happens before acquiring the lock.

What we could possibly do to close the gap cited above is to have
plancache.c's CheckCachedPlan force an AIM call if it notices that
the plan it wants to re-use has a non-empty PlanInvalItems list.
If it does not, then there is nothing that AIM would cause invalidation
for anyway. This still leaves us with a query-duration-sized race
condition window for DDL on functions and domains, but not any larger
than that.

That would be a nice place to get. Not perfect, but better than now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Robert Haas (#14)
Re: slow queries over information_schema.tables

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Dec 6, 2018 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. You need to do AIM *after* obtaining the lock, else you still
have the race condition that you can execute a query on a table
without being aware of recent DDL on it.

Huh? The call in relation_openrv_extended() happens before acquiring the lock.

Oh, I was looking at the wrong AIM call :-(. You're talking about this
one:

    /*
     * Check for shared-cache-inval messages before trying to open the
     * relation.  This is needed even if we already hold a lock on the
     * relation, because GRANT/REVOKE are executed without taking any lock on
     * the target relation, and we want to be sure we see current ACL
     * information.  We can skip this if asked for NoLock, on the assumption
     * that such a call is not the first one in the current command, and so we
     * should be reasonably up-to-date already.  (XXX this all could stand to
     * be redesigned, but for the moment we'll keep doing this like it's been
     * done historically.)
     */
    if (lockmode != NoLock)
        AcceptInvalidationMessages();

which is indeed about as bletcherous as it could possibly be.

The ideal thing would be for GRANT/REVOKE to act more like other DDL,
but I'm not sure how we get to that point: REVOKE SELECT, at the very
least, would have to take AccessExclusiveLock so that it'd block SELECT.
People doing things like GRANT/REVOKE ON ALL TABLES would doubtless
complain about the greatly increased risk of deadlock from taking a
pile of AELs. And that still would only fix the problem for table
privileges, not privileges on other object types that we have no
consistent locking scheme for.

I can see the attraction of replacing these AIM calls (and, probably,
the one in AtStart_Cache) with a single call that occurs somewhere
during statement startup. Not sure exactly where that should be;
but there is certainly not anything principled about doing it in
relation_open_xxx().

regards, tom lane

#16 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#2)
Re: slow queries over information_schema.tables

I wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

Slow query
select * from information_schema.tables where table_name = 'pg_class';

Yeah. This has been complained of many times before.

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

After my last few commits, the only issue that's left here is the
cast-to-varchar implied by casting to sql_identifier. Upthread
I showed a possible planner hack to get rid of that, and we could
still solve it that way so far as allowing indexscans on catalogs
is concerned. However, I wonder what people would think of a
more aggressive approach, viz:

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 0fbcfa8..3891e3b 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -216,7 +216,7 @@ CREATE DOMAIN character_data AS character varying COLLATE "C";
  * SQL_IDENTIFIER domain
  */
-CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
+CREATE DOMAIN sql_identifier AS name;

I've not checked to verify that sql_identifier is used for all and only
those view columns that expose "name" catalog columns. If the SQL
committee was sloppy about that, this idea might not work. But assuming
that the length restriction is valid for the columns that have this
type, would this be an OK idea? It does seem to fix the poor-plan-quality
problem at a stroke, with no weird planner hacks.

What I find in the SQL spec is

5.5 SQL_IDENTIFIER domain

Function

Define a domain that contains all valid <identifier body>s and
<delimited identifier body>s.

Definition

CREATE DOMAIN SQL_IDENTIFIER AS
CHARACTER VARYING (L)
CHARACTER SET SQL_IDENTIFIER;

GRANT USAGE ON DOMAIN SQL_IDENTIFIER
TO PUBLIC WITH GRANT OPTION;

Description

1) This domain specifies all variable-length character values that
conform to the rules for formation and representation of an SQL
<identifier body> or an SQL <delimited identifier body>.

NOTE 4 - There is no way in SQL to specify a <domain
constraint> that would be true for the body of any valid SQL
<regular identifier> or <delimited identifier> and false for all
other character string values.

2) L is the implementation-defined maximum length of <identifier
body> and <delimited identifier body>.

So we'd be violating the part of the spec that says that the domain's
base type is varchar, but considering all the other requirements here
that we're blithely ignoring, maybe that's not such a sin. With the
recent collation changes, type name is hard to functionally distinguish
from a domain over varchar anyway. Furthermore, since name's length limit
corresponds to the "implementation-defined maximum length" part of the
spec, you could argue that in some ways this definition is closer to the
spec than what we've got now.

Thoughts?

regards, tom lane

#17 Pavel Stehule <pavel.stehule@gmail.com>
In reply to: Tom Lane (#16)
Re: slow queries over information_schema.tables

On Thu, Dec 20, 2018 at 0:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After my last few commits, the only issue that's left here is the
cast-to-varchar implied by casting to sql_identifier. Upthread
I showed a possible planner hack to get rid of that, and we could
still solve it that way so far as allowing indexscans on catalogs
is concerned. However, I wonder what people would think of a
more aggressive approach, viz:

-CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
+CREATE DOMAIN sql_identifier AS name;

Thoughts?

A very common case will be comparison with a text value, something like

SELECT * FROM information_schema.tables WHERE table_name = lower('somename');

#18 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Pavel Stehule (#17)
Re: slow queries over information_schema.tables

Pavel Stehule <pavel.stehule@gmail.com> writes:

On Thu, Dec 20, 2018 at 0:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After my last few commits, the only issue that's left here is the
cast-to-varchar implied by casting to sql_identifier.  Upthread
I showed a possible planner hack to get rid of that, and we could
still solve it that way so far as allowing indexscans on catalogs
is concerned.  However, I wonder what people would think of a
more aggressive approach, viz:
-CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
+CREATE DOMAIN sql_identifier AS name;

A very common case will be comparison with a text value, something like
SELECT * FROM information_schema.tables WHERE table_name = lower('somename');

Yeah, that's not really an issue. After applying the above one-liner
to HEAD, I get plans like this:

regression=# explain SELECT * FROM information_schema.tables WHERE table_name = lower('somename');
                                          QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.76..18.60 rows=1 width=608)
   ->  Hash Join  (cost=8.34..10.07 rows=1 width=141)
         Hash Cond: (nc.oid = c.relnamespace)
         ->  Seq Scan on pg_namespace nc  (cost=0.00..1.62 rows=33 width=68)
               Filter: (NOT pg_is_other_temp_schema(oid))
         ->  Hash  (cost=8.33..8.33 rows=1 width=77)
               ->  Index Scan using pg_class_relname_nsp_index on pg_class c  (cost=0.28..8.33 rows=1 width=77)
                     Index Cond: ((relname)::name = 'somename'::text)
                     Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
   ->  Nested Loop  (cost=0.42..8.46 rows=1 width=132)
         ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.28..8.29 rows=1 width=72)
               Index Cond: (c.reloftype = oid)
         ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.14..0.16 rows=1 width=68)
               Index Cond: (oid = t.typnamespace)
(14 rows)

You could surely argue about whether this is too complicated, but it's not
the planner's fault that we've got so many conditions here ...

regards, tom lane

#19 Pavel Stehule <pavel.stehule@gmail.com>
In reply to: Tom Lane (#18)
Re: slow queries over information_schema.tables

On Thu, Dec 20, 2018 at 5:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You could surely argue about whether this is too complicated, but it's not
the planner's fault that we've got so many conditions here ...

this plan looks great

Pavel

#20 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#16)
Re: slow queries over information_schema.tables

I wrote:

... However, I wonder what people would think of a
more aggressive approach, viz:
-CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
+CREATE DOMAIN sql_identifier AS name;

I've not checked to verify that sql_identifier is used for all and only
those view columns that expose "name" catalog columns. If the SQL
committee was sloppy about that, this idea might not work.

I poked into this by instrumenting the parser to see what type conversions
it inserts into the information_schema views. It appears that the vast
majority of conversions to sql_identifier are indeed on "name" columns,
but we have some occurrences of cases like this:

CAST(a.attnum AS sql_identifier) AS dtd_identifier,

and some like this:

CAST(p.proname || '_' || CAST(p.oid AS text) AS sql_identifier) AS specific_name

It doesn't look to me like converting to name rather than varchar would
have any real performance consequence in either case: certainly we're not
going to be able to propagate WHERE conditions on these view columns back
to any existing catalog index, regardless of the cast. However, the
second case offers something else to worry about: what if the
concatenation yields a string longer than NAMEDATALEN? As the code
stands, the view will simply return a string that's too long to be a name,
which arguably is a violation of SQL spec. If we change sql_identifier
to be "name", the cast will silently truncate, which also arguably is a
violation of SQL spec, because I think specific_name is supposed to be
unique. (The point of concatenating the function OID is to make it so.)
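
The uniqueness hazard is easy to see with the silently-truncating cast (illustrative; 63 is NAMEDATALEN - 1 in a default build):

-- two specific_names differing only past byte 63 collapse to the same value:
SELECT (repeat('f', 70) || '_12345')::name = (repeat('f', 70) || '_67890')::name;
-- true: both truncate to 63 'f's, losing the distinguishing '_<oid>' suffix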

Perhaps we could fix this by truncating the p.proname part to ensure
that the concatenation result fits in NAMEDATALEN. I'm not sure about
a good way to get a correct value of NAMEDATALEN into the
information_schema script, though. Worse, I don't think we expose any
convenient way to truncate a string based on byte length rather than
character length (substr() does the latter). So I think that a reasonable
way to tackle this might be to provide a C function along the lines of

nameconcatoid(name, oid) returns name

which contracts to produce "$1 || '_' || $2" while truncating $1 only as
much as needed to make the result fit in NAMEDATALEN.
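
A rough SQL-level sketch of that contract (hypothetical; the real thing must be a C function because the truncation has to count bytes, whereas SQL left() counts characters):

CREATE FUNCTION nameconcatoid_sketch(p text, o oid) RETURNS name AS $$
  -- keep the '_<oid>' suffix intact and truncate only the name part,
  -- so the whole result fits in NAMEDATALEN - 1 = 63 bytes
  SELECT (left(p, 63 - length('_' || o::text)) || '_' || o::text)::name
$$ LANGUAGE sql IMMUTABLE;
-- e.g. nameconcatoid_sketch(repeat('f', 100), 12345) still ends in '_12345'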

regards, tom lane

#21 Greg Stark <stark@mit.edu>
In reply to: Tom Lane (#20)
Re: slow queries over information_schema.tables

Just brainstorming here. Another crazy idea would be to get rid of
"name" data type, at least from the user-visible planner point of
view. It would probably have to be stored as a fixed length data type
like today but with a one-byte length header. That would make it
possible for the planner to use as if it was just another varchar.