Cached plans and statement generalization

Started by Konstantin Knizhnikabout 9 years ago107 messageshackers

k.knizhnik@postgrespro.ru

about 9 years ago

Hi hackers,

There were a lot of discussions about query plan caching in hackers
mailing list, but I failed to find some clear answer for my question and
the current consensus on this question in Postgres community. As far as
I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve
now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded
value for partitioning key.
Only in this case optimizer will be able to construct efficient query
plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman
+ postgres_fdw shows 3 times degrade of performance in case of using
prepared statements.
But without prepared statements substantial amount of time is spent in
query compilation and planning. I was be able to speed up benchmark more
than two time by
sending prepared queries directly to the remote nodes.

So what I am thinking now is implicit query caching. If the same query
with different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters. I am not considering now shared query cache: is seems to be
much harder to implement. But local caching of generalized queries seems
to be not so difficult to implement and requires not so much changes in
Postgres code. And it can be useful not only for sharding, but for many
other cases where prepared statements can not be used.

I wonder if such option was already considered and if it was for some
reasons rejected: can you point me at this reasons?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

aekorotkov@gmail.com

about 9 years ago

In reply to: Konstantin Knizhnik (#1)

Re: Cached plans and statement generalization

Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:

There were a lot of discussions about query plan caching in hackers
mailing list, but I failed to find some clear answer for my question and
the current consensus on this question in Postgres community. As far as I
understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve now)
is partitioning.
Efficient execution of query to partitioned table requires hardcoded value
for partitioning key.
Only in this case optimizer will be able to construct efficient query plan
which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman +
postgres_fdw shows 3 times degrade of performance in case of using prepared
statements.
But without prepared statements substantial amount of time is spent in
query compilation and planning. I was be able to speed up benchmark more
than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about problem which
arises with pg_pathman while pg_pathman is an extension supported by
Postgres Pro.
Since we have declarative partitioning committed to 10, I think that
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative partitioning
because it still uses very inefficient constraint exclusion mechanism.
Thus, issues you are writing about would become visible on declarative
partitioning only when constraint exclusion would be replaced with
something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Alexander Korotkov (#2)

Re: Cached plans and statement generalization

On 24.04.2017 13:24, Alexander Korotkov wrote:

Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

There were a lot of discussions about query plan caching in
hackers mailing list, but I failed to find some clear answer for
my question and the current consensus on this question in Postgres
community. As far as I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to
solve now) is partitioning.
Efficient execution of query to partitioned table requires
hardcoded value for partitioning key.
Only in this case optimizer will be able to construct efficient
query plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on
pg_pathman + postgres_fdw shows 3 times degrade of performance in
case of using prepared statements.
But without prepared statements substantial amount of time is
spent in query compilation and planning. I was be able to speed up
benchmark more than two time by
sending prepared queries directly to the remote nodes.

I don't think it's correct to ask PostgreSQL hackers about problem
which arises with pg_pathman while pg_pathman is an extension
supported by Postgres Pro.
Since we have declarative partitioning committed to 10, I think that
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative
partitioning because it still uses very inefficient constraint
exclusion mechanism. Thus, issues you are writing about would become
visible on declarative partitioning only when constraint exclusion
would be replaced with something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

Sorry, I have mentioned pg_pathman just as example.
The same problems takes place with partitioning based on standard
Postgres inheritance mechanism (when I manually create derived tables
and specify constraints for them).
I didn't test yet declarative partitioning committed to 10, but I expect
the that it will also suffer from this problem (because is based on
inheritance).
But as I wrote, I think that the problem with plan caching is wider and
is not bounded just to partitioning.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
<http://www.postgrespro.com/>
The Russian Postgres Company

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Konstantin Knizhnik (#1)

Re: Cached plans and statement generalization

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Andres Freund (#4)

Re: Cached plans and statement generalization

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

So autoprepare is as efficient as explicit prepare and can increase
performance almost two times.

My current implementation is replacing with parameters only string
literals in the query, i.e. select * from T where x='123'; -> select *
from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to
locate '\'' character and then correctly handle pairs of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly
understand whether minus should be treated as part of literal or as
operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where
parameters can not be used, for example "order by 1".

Fully correct substitution can be done by first performing parsing the
query, then transform parse tree, replacing literal nodes with parameter
nodes and finally deparse tree into generalized query. postgres_fdw
already contains such deparse code. It can be moved to postgres core and
reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time
needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.

There is obvious question: how I managed to get this pgbench results if
currently only substitution of string literals is supported and queries
constructed by pgbench don't contain string literals? I just made small
patch in pgbench replaceVariable method wrapping value's representation
in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I attached my patch to this mail. It is just first version of the patch
(based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is
considered to be useful, I will continue work on this patch.

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Serge Rielau

serge@rielau.com

about 9 years ago

In reply to: Konstantin Knizhnik (#5)

Re: Cached plans and statement generalization

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.

You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to dos costing?

Cheers
Serge Rielau
Salesforce.com <http://salesforce.com/>

PS: FWIW, I like this feature.

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Serge Rielau (#6)

Re: Cached plans and statement generalization

On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:
Another problem is caused by using integer literals in context where
parameters can not be used, for example "order by 1ï¿½.

You will also need to deal with modifiers in types such as
VARCHAR(10). Not sure if there are specific functions which can only
deal with literals (?) as well.

Sorry, I do not completely understand how presence of type modifiers can
affect string literals used in query.
Can you provide me some example?

Doug Doole did this work in DB2 LUW and he may be able to point to
more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when
dealing with (poorly designed) web apps that just glue together strings.

I do not think that this optimization will be useful only for poorly
designed application.
I already pointed on two use cases where prepapred statements can not be
used:
1. pgbouncer without session-level pooling.
2. partitioning

Protecting it under a GUC would allow to only do the work if itï¿½s
deemed likely to help.
Another rule I find useful is to abort any efforts to substitute
literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the
remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set
of bind values/substituted literals to dos costing?

Here situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferrable to use specialized plan rather than generic
plan.
I am not sure if postgres now is able to do it.

Cheers
Serge Rielau
Salesforce.com <http://salesforce.com>

PS: FWIW, I like this feature.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

David Fetter

david@fetter.org

about 9 years ago

In reply to: Konstantin Knizhnik (#5)

Re: Cached plans and statement generalization

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Serge Rielau

serge@rielau.com

about 9 years ago

In reply to: Konstantin Knizhnik (#7)

Re: Cached plans and statement generalization

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik < k.knizhnik@postgrespro.ru [k.knizhnik@postgrespro.ru] > wrote: Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well. Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example? SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
Also some OLAP syntax like “rows preceding”
It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.
Cheers Serge

#10

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: David Fetter (#8)

Re: Cached plans and statement generalization

On 04/25/2017 07:54 PM, David Fetter wrote:

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Sorry, may be I missed something, but I can not understand how security can be violated by extracting string literals from query. I am just copying bytes from one buffer to another. I do not try to somehow interpret this parameters. What I am doing is
very similar with standard prepared statements.
And moreover query is parsed! Only query which was already parsed and executed (but with different values of parameters) can be autoprepared.

Best,
David.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Serge Rielau (#9)

Re: Cached plans and statement generalization

On 04/25/2017 08:09 PM, Serge Rielau wrote:

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.

You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example?

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Also some OLAP syntax like “rows preceding”

It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.

Cheers
Serge

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#12

Serge Rielau

serge@rielau.com

about 9 years ago

In reply to: Konstantin Knizhnik (#11)

Re: Cached plans and statement generalization

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge

#13

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Serge Rielau (#12)

Re: Cached plans and statement generalization

On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru <mailto:k.knizhnik@postgrespro.ru>> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute with parameters as much literals as possible, then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.

Cheers
Serge

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#14

Doug Doole

ddoole@salesforce.com

about 9 years ago

In reply to: Konstantin Knizhnik (#13)

Re: Cached plans and statement generalization

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you
to cast using function notation like VARCHAR(foo, 10). This meant I would
never replace the second parameter of any VARCHAR function. Now it's
possible that when the statement was fully compiled we'd find that
VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in
cast function. Our thinking that these cases were rare enough that we
wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax
avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite
simple (a few hundred line of code) with no significant maintenance issues.
(Other developers had no problem adding in new cases where constants had to
be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the
same code to also normalize statements (uppercase all keywords, collapse
whitespace to a single blank, etc.) but that feature was never added to the
product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:

Show quoted text

On 04/25/2017 11:40 PM, Serge Rielau wrote:

On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:

SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;

You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be
transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed
programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared
statements.
If we want to substitute with parameters as much literals as possible,
then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.

Cheers
Serge

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#15

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Doug Doole (#14)

Re: Cached plans and statement generalization

On 2017-04-25 21:11:08 +0000, Doug Doole wrote:

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

How did you manage plan invalidation and such?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Doug Doole

ddoole@salesforce.com

about 9 years ago

In reply to: Andres Freund (#15)

Re: Cached plans and statement generalization

Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

With literal replacement, the cache entry is on the modified statement
text. Given the modified statement text and the compilation environment,
you're guaranteed to get the right plan entry.

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund <andres@anarazel.de> wrote:

Show quoted text

On 2017-04-25 21:11:08 +0000, Doug Doole wrote:

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

How did you manage plan invalidation and such?

- Andres

#17

David Fetter

david@fetter.org

about 9 years ago

In reply to: Konstantin Knizhnik (#10)

Re: Cached plans and statement generalization

On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:

On 04/25/2017 07:54 PM, David Fetter wrote:

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:

On 24.04.2017 21:43, Andres Freund wrote:

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:

So what I am thinking now is implicit query caching. If the same query with
different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Sorry, may be I missed something, but I can not understand how
security can be violated by extracting string literals from query. I
am just copying bytes from one buffer to another. I do not try to
somehow interpret this parameters. What I am doing is very similar
with standard prepared statements. And moreover query is parsed!
Only query which was already parsed and executed (but with different
values of parameters) can be autoprepared.

I don't have an exploit yet. What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Doug Doole (#16)

Re: Cached plans and statement generalization

Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +0000, Doug Doole wrote:

Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

David G. Johnston

david.g.johnston@gmail.com

about 9 years ago

In reply to: David Fetter (#17)

Re: Cached plans and statement generalization

On Tue, Apr 25, 2017 at 3:24 PM, David Fetter <david@fetter.org> wrote:

I don't have an exploit yet. What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

If they want to see how it works they can read the source code. In terms
of runtime data it would limited to whatever the session itself created.
In most cases the presence of the cache would be invisible. I suppose it
might appear if one were to explain a query, reset the session, explain
another query and then re-explain the original. If the chosen plan in the
second pass differed because of the presence of the leading query it would
be noticeable but not revealing. Albeit I'm a far cry from a security
expert...

David J.

#20

Doug Doole

ddoole@salesforce.com

about 9 years ago

In reply to: Andres Freund (#18)

Re: Cached plans and statement generalization

(FWIW, on this list we don't do top-quotes)

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard

search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.

A naive option would be to invalidate anything that depends on table or
view *.FOOBAR. You could probably make it a bit smarter by also requiring
that schema A appear in the path.

- Doug

#21

Jim Finnerty

jfinnert@amazon.com

about 9 years ago

In reply to: Andres Freund (#18)

#22

Serge Rielau

serge@rielau.com

about 9 years ago

In reply to: Doug Doole (#20)

#23

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

about 9 years ago

In reply to: Konstantin Knizhnik (#5)

#24

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Andres Freund (#15)

#25

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Andres Freund (#18)

#26

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Tsunakawa, Takayuki (#23)

#27

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Konstantin Knizhnik (#26)

#28

Pavel Stehule

pavel.stehule@gmail.com

about 9 years ago

In reply to: Konstantin Knizhnik (#27)

#29

Doug Doole

ddoole@salesforce.com

about 9 years ago

In reply to: Doug Doole (#20)

#30

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Doug Doole (#29)

#31

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Pavel Stehule (#28)

#32

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Konstantin Knizhnik (#31)

#33

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Robert Haas (#32)

#34

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Konstantin Knizhnik (#33)

#35

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Robert Haas (#34)

#36

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Konstantin Knizhnik (#35)

#37

Douglas Doole

dougdoole@gmail.com

about 9 years ago

In reply to: Bruce Momjian (#36)

#38

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Douglas Doole (#37)

#39

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Bruce Momjian (#38)

#40

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Tom Lane (#39)

#41

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Bruce Momjian (#36)

#42

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Tom Lane (#39)

#43

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Konstantin Knizhnik (#42)

#44

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Andres Freund (#43)

#45

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Konstantin Knizhnik (#41)

#46

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Bruce Momjian (#45)

#47

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Konstantin Knizhnik (#46)

#48

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Bruce Momjian (#47)

#49

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Konstantin Knizhnik (#46)

#50

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Konstantin Knizhnik (#48)

#51

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Konstantin Knizhnik (#35)

#52

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Robert Haas (#51)

#53

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Konstantin Knizhnik (#52)

#54

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 9 years ago

In reply to: Konstantin Knizhnik (#35)

#55

Thomas Munro

thomas.munro@gmail.com

over 8 years ago

In reply to: Konstantin Knizhnik (#54)

#56

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Thomas Munro (#55)

#57

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Konstantin Knizhnik (#56)

#58

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Konstantin Knizhnik (#57)

#59

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

over 8 years ago

In reply to: Michael Paquier (#58)

#60

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Tsunakawa, Takayuki (#59)

#61

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Michael Paquier (#58)

#62

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Konstantin Knizhnik (#61)

#63

Thomas Munro

thomas.munro@gmail.com

over 8 years ago

In reply to: Stephen Frost (#62)

#64

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Thomas Munro (#63)

#65

David Steele

david@pgmasters.net

about 8 years ago

In reply to: Konstantin Knizhnik (#64)

#66

David Steele

david@pgmasters.net

about 8 years ago

In reply to: David Steele (#65)

#67

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

almost 8 years ago

In reply to: Konstantin Knizhnik (#64)

#68

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Yamaji, Ryo (#67)

#69

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Konstantin Knizhnik (#68)

#70

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Yamaji, Ryo (#67)

#71

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

almost 8 years ago

In reply to: Konstantin Knizhnik (#70)

#72

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Yamaji, Ryo (#71)

#73

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

almost 8 years ago

In reply to: Konstantin Knizhnik (#72)

#74

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Yamaji, Ryo (#73)

#75

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

almost 8 years ago

In reply to: Konstantin Knizhnik (#74)

#76

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 8 years ago

In reply to: Yamaji, Ryo (#75)

#77

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

over 7 years ago

In reply to: Konstantin Knizhnik (#74)

#78

Dmitry Dolgov

9erthalion6@gmail.com

over 7 years ago

In reply to: Yamaji, Ryo (#77)

#79

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

over 7 years ago

In reply to: Dmitry Dolgov (#78)

#80

Dmitry Dolgov

9erthalion6@gmail.com

over 7 years ago

In reply to: Yamaji, Ryo (#79)

#81

Nagaura, Ryohei

nagaura.ryohei@jp.fujitsu.com

over 7 years ago

In reply to: Dmitry Dolgov (#80)

#82

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 7 years ago

In reply to: Nagaura, Ryohei (#81)

#83

Nagaura, Ryohei

nagaura.ryohei@jp.fujitsu.com

over 7 years ago

In reply to: Konstantin Knizhnik (#82)

#84

Yamaji, Ryo

yamaji.ryo@jp.fujitsu.com

about 7 years ago

In reply to: Konstantin Knizhnik (#82)

#85

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 7 years ago

In reply to: Yamaji, Ryo (#84)

#86

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 7 years ago

In reply to: Konstantin Knizhnik (#85)

#87

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 7 years ago

In reply to: Konstantin Knizhnik (#86)

#88

Thomas Munro

thomas.munro@gmail.com

almost 7 years ago

In reply to: Konstantin Knizhnik (#87)

#89

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Thomas Munro (#88)

#90

Thomas Munro

thomas.munro@gmail.com

almost 7 years ago

In reply to: Konstantin Knizhnik (#89)

#91

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Thomas Munro (#90)

#92

Thomas Munro

thomas.munro@gmail.com

almost 7 years ago

In reply to: Konstantin Knizhnik (#91)

#93

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Thomas Munro (#92)

#94

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 7 years ago

In reply to: Konstantin Knizhnik (#93)

#95

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Heikki Linnakangas (#94)

#96

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Heikki Linnakangas (#94)

#97

Daniel Migowski

dmigowski@ikoffice.de

almost 7 years ago

In reply to: Konstantin Knizhnik (#96)

#98

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Daniel Migowski (#97)

#99

Daniel Migowski

dmigowski@ikoffice.de

almost 7 years ago

In reply to: Konstantin Knizhnik (#98)

#100

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

almost 7 years ago

In reply to: Konstantin Knizhnik (#96)

#101

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Konstantin Knizhnik (#100)

#102

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 6 years ago

In reply to: Alvaro Herrera (#101)

#103

Michael Paquier

michael@paquier.xyz

over 6 years ago

In reply to: Konstantin Knizhnik (#102)

#104

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 6 years ago

In reply to: Michael Paquier (#103)

#105

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Konstantin Knizhnik (#104)

#106

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Tom Lane (#105)

#107

Юрий Соколов

funny.falcon@gmail.com

about 5 years ago

In reply to: Tom Lane (#105)

Cached plans and statement generalization

Attachments: