plans for PostgreSQL 12

Started by Pavel Stehuleover 7 years ago13 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

I am sending my ToDo for PostgreSQL 12 - a list of features what I would to
develop or I would to participate on:

1. schema variables - CREATE VARIABLE command
2. using pgbench expressions for psql - review, cooperation
3. JSON related patches - review
4. optimization expression without necessity to create snapshots -
experiments

@4 There are lot of not database expressions in PLpgSQL - like var1 := var1
+ var2 or var1 := var1 + konst. Own calculation needs about 1% of time of
total expression evaluation time. Almost all time get preparing plan cache,
preparing snapshot, .. For this case, when no database object is used, we
don't need use this infrastructure. I would to measure performance impact,
and testing if these optimizations are interesting or not.

I finalizing a article about PostgreSQL 11. It will great release. The
progress in partitioning area is massive. It is hard to believe so Postgres
has JIT. I hope so PostgreSQL will be better.

Regards

Pavel

#2Simon Riggs
simon@2ndquadrant.com
In reply to: Pavel Stehule (#1)
Re: plans for PostgreSQL 12

On 4 June 2018 at 06:08, Pavel Stehule <pavel.stehule@gmail.com> wrote:

4. optimization expression without necessity to create snapshots -
experiments

@4 There are lot of not database expressions in PLpgSQL - like var1 := var1
+ var2 or var1 := var1 + konst. Own calculation needs about 1% of time of
total expression evaluation time. Almost all time get preparing plan cache,
preparing snapshot, .. For this case, when no database object is used, we
don't need use this infrastructure. I would to measure performance impact,
and testing if these optimizations are interesting or not.

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Simon Riggs (#2)
Re: plans for PostgreSQL 12

2018-06-04 8:35 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

On 4 June 2018 at 06:08, Pavel Stehule <pavel.stehule@gmail.com> wrote:

4. optimization expression without necessity to create snapshots -
experiments

@4 There are lot of not database expressions in PLpgSQL - like var1 :=

var1

+ var2 or var1 := var1 + konst. Own calculation needs about 1% of time of
total expression evaluation time. Almost all time get preparing plan

cache,

preparing snapshot, .. For this case, when no database object is used, we
don't need use this infrastructure. I would to measure performance

impact,

and testing if these optimizations are interesting or not.

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

It requires introduction of new "safe" functions (& operators). Immutable
functions are not enough safe.

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN (SELECT count(*) FROM pg_class);
END;
$function$

postgres=# SELECT fx();
┌─────┐
│ fx │
╞═════╡
│ 343 │
└─────┘
(1 row)

I have not a name for this new class - maybe "pure immutable". The name is
not important in this moment, and I am sure, so native speakers can find
good name if it is necessary.

For start I would to accept as safe functions only buildin immutable
functions (& operators). From practical view it can be good enough for some
first public iterations too.

I am looking to some big real project (migrated from Oracle)

The majority are

1. constant expressions: 0, NULL, '', true, false, 'yes', 'no'
2. simply operators: <, >, <> = again numbers and strings (sometimes
logical predicate and, or are used)
3. + and || operators: var := var + varx
4. IS NULL, IS NOT NULL

The constant expressions can have special optimization - it is really often
pattern

if expr then
a := const;
else
a := other_const;
end if;

Because expressions are not typical bottleneck in PL/pgSQL, then typical
speedup will not be 100%, but The PL/pgSQL can lost bad reputation about
calculation power, and it can be interesting for PostGIS people and some
speedup can be measurable in applications migrated from Oracle - these
people did not write code for PL/pgSQL and didn't count number of
expressions. And any speedup is not bad.

I have not idea, how the code will be ugly - it is a experiment, and I am
thinking so it can be very interesting

Regards

Pavel

Show quoted text

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Pavel Stehule (#3)
Re: plans for PostgreSQL 12

On 04/06/18 09:12, Pavel Stehule wrote:

2018-06-04 8:35 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

It requires introduction of new "safe" functions (& operators). Immutable
functions are not enough safe.

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN (SELECT count(*) FROM pg_class);
END;
$function$

postgres=# SELECT fx();
┌─────┐
│ fx │
╞═════╡
│ 343 │
└─────┘
(1 row)

That function is incorrectly marked as IMMUTABLE. In that situation,
it's enough that we throw a sane error like "ERROR: no snapshot available".

- Heikki

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: plans for PostgreSQL 12

2018-06-04 9:24 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi>:

On 04/06/18 09:12, Pavel Stehule wrote:

2018-06-04 8:35 GMT+02:00 Simon Riggs <simon@2ndquadrant.com>:

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

It requires introduction of new "safe" functions (& operators). Immutable
functions are not enough safe.

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN (SELECT count(*) FROM pg_class);
END;
$function$

postgres=# SELECT fx();
┌─────┐
│ fx │
╞═════╡
│ 343 │
└─────┘
(1 row)

That function is incorrectly marked as IMMUTABLE. In that situation, it's
enough that we throw a sane error like "ERROR: no snapshot available".

Yes, it is incorrect mark. Unfortunately - this is often workaround for
wrong estimations - so I afraid, in this case, your proposed fix breaks lot
of applications.

Regards

Pavel

Show quoted text

- Heikki

#6Vik Fearing
vik.fearing@2ndquadrant.com
In reply to: Pavel Stehule (#5)
Re: plans for PostgreSQL 12

On 04/06/18 09:37, Pavel Stehule wrote:

2018-06-04 9:24 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi
<mailto:hlinnaka@iki.fi>>:

On 04/06/18 09:12, Pavel Stehule wrote:

It requires introduction of new "safe" functions (& operators).
Immutable
functions are not enough safe.

CREATE OR REPLACE FUNCTION public.fx()
  RETURNS integer
  LANGUAGE plpgsql
  IMMUTABLE
AS $function$
BEGIN
   RETURN (SELECT count(*) FROM pg_class);
END;
$function$

postgres=# SELECT fx();
┌─────┐
│ fx  │
╞═════╡
│ 343 │
└─────┘
(1 row)

That function is incorrectly marked as IMMUTABLE. In that situation,
it's enough that we throw a sane error like "ERROR: no snapshot
available".

Yes, it is incorrect mark. Unfortunately - this is often workaround for
wrong estimations - so I afraid, in this case, your proposed fix breaks
lot of applications.

I would say such applications are already broken.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vik Fearing (#6)
Re: plans for PostgreSQL 12

2018-06-04 9:59 GMT+02:00 Vik Fearing <vik.fearing@2ndquadrant.com>:

On 04/06/18 09:37, Pavel Stehule wrote:

2018-06-04 9:24 GMT+02:00 Heikki Linnakangas <hlinnaka@iki.fi
<mailto:hlinnaka@iki.fi>>:

On 04/06/18 09:12, Pavel Stehule wrote:

It requires introduction of new "safe" functions (& operators).
Immutable
functions are not enough safe.

CREATE OR REPLACE FUNCTION public.fx()
RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
RETURN (SELECT count(*) FROM pg_class);
END;
$function$

postgres=# SELECT fx();
┌─────┐
│ fx │
╞═════╡
│ 343 │
└─────┘
(1 row)

That function is incorrectly marked as IMMUTABLE. In that situation,
it's enough that we throw a sane error like "ERROR: no snapshot
available".

Yes, it is incorrect mark. Unfortunately - this is often workaround for
wrong estimations - so I afraid, in this case, your proposed fix breaks
lot of applications.

I would say such applications are already broken.

I cannot to agree, not in this moment:

1. there is not any workaround, how to force subselect evaluation in
planning time - what can be correct for once only evaluated queries.

2. what is not prohibited, is enabled. I agree so this trick is ugly - but
I got it from Tom if I remember well maybe more than 10 years ago. Now is
too late change it - I think - probably we find more strange features that
we hold due compatibility. But this discussion is offtopic for this thread.
I am thinking so lot of expressions can be significantly accelerated and I
would to check it.

Regards

Pavel

--

Show quoted text

Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#8Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#2)
Re: plans for PostgreSQL 12

Hi,

On 2018-06-04 07:35:23 +0100, Simon Riggs wrote:

On 4 June 2018 at 06:08, Pavel Stehule <pavel.stehule@gmail.com> wrote:

4. optimization expression without necessity to create snapshots -
experiments

@4 There are lot of not database expressions in PLpgSQL - like var1 := var1
+ var2 or var1 := var1 + konst. Own calculation needs about 1% of time of
total expression evaluation time. Almost all time get preparing plan cache,
preparing snapshot, .. For this case, when no database object is used, we
don't need use this infrastructure. I would to measure performance impact,
and testing if these optimizations are interesting or not.

Can you show your testcase and the corresponding profile? It seems like
this should be solvable without adding a new "snapshotless, really
immutable" class.

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

It'd indeed not be enough. E.g. enum_lt et al are immutable but access
the catalog.

Greetings,

Andres Freund

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#7)
Re: plans for PostgreSQL 12

Pavel Stehule <pavel.stehule@gmail.com> writes:

2018-06-04 9:59 GMT+02:00 Vik Fearing <vik.fearing@2ndquadrant.com>:

On 04/06/18 09:37, Pavel Stehule wrote:

Yes, it is incorrect mark. Unfortunately - this is often workaround for
wrong estimations - so I afraid, in this case, your proposed fix breaks
lot of applications.

I would say such applications are already broken.

I cannot to agree, not in this moment:
1. there is not any workaround, how to force subselect evaluation in
planning time - what can be correct for once only evaluated queries.

There's the sadly-underdocumented trick of writing the expression inside
a sub-select so that it becomes an InitPlan. We could doubtless improve
our support for that --- for instance, teach estimate_expression_value()
how to get an estimated value when the expression is stable --- but I'm
unconvinced that we need new infrastructure for this. Inventing a new
function volatility class would be an enormous mess from users'
standpoint, especially if the reason was only to distinguish cheating
uses from non-cheating uses of the existing class. I am not inclined
to promise that we'll never break cheating uses.

regards, tom lane

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#9)
Re: plans for PostgreSQL 12

2018-06-04 21:53 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2018-06-04 9:59 GMT+02:00 Vik Fearing <vik.fearing@2ndquadrant.com>:

On 04/06/18 09:37, Pavel Stehule wrote:

Yes, it is incorrect mark. Unfortunately - this is often workaround for
wrong estimations - so I afraid, in this case, your proposed fix breaks
lot of applications.

I would say such applications are already broken.

I cannot to agree, not in this moment:
1. there is not any workaround, how to force subselect evaluation in
planning time - what can be correct for once only evaluated queries.

There's the sadly-underdocumented trick of writing the expression inside
a sub-select so that it becomes an InitPlan. We could doubtless improve
our support for that --- for instance, teach estimate_expression_value()
how to get an estimated value when the expression is stable --- but I'm
unconvinced that we need new infrastructure for this. Inventing a new
function volatility class would be an enormous mess from users'
standpoint, especially if the reason was only to distinguish cheating
uses from non-cheating uses of the existing class. I am not inclined
to promise that we'll never break cheating uses.

In this moment it is only one possible solution. The situation will be
different, when there will be cleaner, better substitution. Isn't necessary
to talk about it more now - or it is theme for other thread. When I worked
for GoodData we should to use this technique for optimization of queries
for snow flake schema. Native optimization of star schema or snow flake
schema queries can be the best of.

For my test, and maybe for first releases the good optimization of faster
expression evaluation can be based on immutable functions from pg_catalog
schema. It is not final design, and I don't try to speak about some timing.
I understand well, so working with just this subset of functions can be
frustrating for users, but it can be good for start - the problem with not
too strong immutable functions can be long (there can be some relation to
driver level, some similar optimization cannot be principally possible when
execution plan can be reused. But some drivers use prepared statements
implicitly for only one execution due protection against SQL injection).

Regards

Pavel

Show quoted text

regards, tom lane

#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#8)
Re: plans for PostgreSQL 12

2018-06-04 20:55 GMT+02:00 Andres Freund <andres@anarazel.de>:

Hi,

On 2018-06-04 07:35:23 +0100, Simon Riggs wrote:

On 4 June 2018 at 06:08, Pavel Stehule <pavel.stehule@gmail.com> wrote:

4. optimization expression without necessity to create snapshots -
experiments

@4 There are lot of not database expressions in PLpgSQL - like var1 :=

var1

+ var2 or var1 := var1 + konst. Own calculation needs about 1% of time

of

total expression evaluation time. Almost all time get preparing plan

cache,

preparing snapshot, .. For this case, when no database object is used,

we

don't need use this infrastructure. I would to measure performance

impact,

and testing if these optimizations are interesting or not.

Can you show your testcase and the corresponding profile? It seems like
this should be solvable without adding a new "snapshotless, really
immutable" class.

./configure --with-libxml --enable-tap-tests --enable-debug --with-perl
CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"

[pavel@nemesis postgresql]$ gcc --version
gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)

I executed simple script

do $$ declare i bigint = 1; s bigint = 0; begin while i <= 100000000 loop
s := s + i; i := i + 1; end loop; raise notice '%', s; end $$;

7,68% postmaster postgres [.]
GetSnapshotData ▒
7,53% postmaster plpgsql.so [.]
exec_eval_simple_expr ▒
6,49% postmaster postgres [.]
ExecInterpExpr ▒
4,13% postmaster postgres [.]
LWLockRelease ▒
4,12% postmaster postgres [.]
AllocSetAlloc ▒
3,67% postmaster postgres [.]
PopActiveSnapshot ▒
3,39% postmaster postgres [.]
GetCachedPlan ▒
3,28% postmaster postgres [.]
SPI_plan_get_cached_plan ▒
3,11% postmaster postgres [.]
LWLockAttemptLock ▒
2,90% postmaster postgres [.]
OverrideSearchPathMatchesCurrent ▒
2,64% postmaster plpgsql.so [.]
plpgsql_param_eval_var ▒
2,62% postmaster plpgsql.so [.]
exec_assign_value ◆
2,42% postmaster postgres [.]
RevalidateCachedQuery ▒
2,41% postmaster postgres [.]
AcquireExecutorLocks ▒
2,40% postmaster postgres [.]
PushActiveSnapshot ▒
2,30% postmaster postgres [.]
CopySnapshot ▒
2,27% postmaster postgres [.]
CheckCachedPlan ▒
2,18% postmaster postgres [.]
AcquirePlannerLocks ▒
2,01% postmaster postgres [.]
LWLockAcquire ▒
1,98% postmaster plpgsql.so [.]
exec_stmt ▒
1,94% postmaster postgres [.]
MemoryContextAlloc ▒
1,93% postmaster postgres [.]
AllocSetFree ▒
1,77% postmaster postgres [.]
ResourceArrayRemove ▒
1,64% postmaster plpgsql.so [.]
exec_eval_expr ▒
1,62% postmaster postgres [.]
ResourceArrayAdd ▒
1,55% postmaster postgres [.]
ScanQueryForLocks ▒
1,34% postmaster postgres [.]
ReleaseCachedPlan ▒
1,24% postmaster plpgsql.so [.]
exec_cast_value ▒
1,14% postmaster postgres [.]
int84pl ▒
1,14% postmaster postgres [.]
recomputeNamespacePath ▒
1,12% postmaster postgres [.]
GetTransactionSnapshot ▒
1,08% postmaster plpgsql.so [.]
exec_eval_cleanup ▒
0,99% postmaster postgres [.]
MemoryContextReset ▒
0,99% postmaster plpgsql.so [.]
exec_assign_expr ▒
0,83% postmaster plpgsql.so [.]
assign_simple_var ▒
0,82% postmaster postgres [.]
int8pl ▒
0,77% postmaster postgres [.]
ResourceOwnerForgetPlanCacheRef ▒
0,75% postmaster postgres [.]
pfree ▒
0,69% postmaster postgres [.]
ResourceOwnerRememberPlanCacheRef ▒
0,57% postmaster postgres [.]
ResourceOwnerEnlargePlanCacheRefs ▒
0,51% postmaster postgres [.]
ResourceArrayEnlarge ▒
0,44% postmaster postgres [.]
RecoveryInProgress ▒
0,44% postmaster plpgsql.so [.]
exec_stmts ▒
0,39% postmaster plpgsql.so [.]
exec_eval_boolean ▒
0,38% postmaster postgres [.]
TransactionIdPrecedes ▒
0,38% postmaster plpgsql.so [.]
exec_stmt_while ▒
0,29% postmaster postgres [.]
choose_custom_plan ▒
0,29% postmaster plpgsql.so [.]
setup_param_list ▒
0,27% postmaster plpgsql.so [.]
exec_stmt_assign ▒
0,27% postmaster postgres [.]
GetCurrentCommandId

report with call graph - some parts

- 61,06%
exec_assign_expr

- 55,96%
exec_eval_expr

- 54,70%
exec_eval_simple_expr ▒
+ 14,86%
SPI_plan_get_cached_plan ▒
+ 12,72%
GetTransactionSnapshot ▒
+ 8,26% ExecEvalExpr
(inlined) ▒
+ 7,04%
PushActiveSnapshot ▒
+ 4,08%
PopActiveSnapshot ▒
+ 2,13%
ReleaseCachedPlan ▒
0,97% MemoryContextSwitchTo
(inlined) ▒
- 3,81%
exec_assign_value

0,74%
assign_simple_var ▒
0,61%
exec_cast_value ▒
+ 1,14% exec_eval_cleanup

- 11,51% 0,00% postmaster plpgsql.so [.] ExecEvalExpr
(inlined) ▒
- ExecEvalExpr
(inlined) ▒
- 11,03%
ExecInterpExpr

3,10%
plpgsql_param_eval_var ▒
1,11%
int84pl ▒
0,79% int8pl

- 6,90%
GetCachedPlan ▒
- 3,36%
RevalidateCachedQuery ▒
0,96%
OverrideSearchPathMatchesCurrent ▒
0,88%
AcquirePlannerLocks ▒
- 1,31%
CheckCachedPlan ▒
0,71%
AcquireExecutorLocks ▒
0,60%
ResourceOwnerRememberPlanCacheRef ▒
- 6,22%
GetTransactionSnapshot ▒
- 5,58%
GetSnapshotData ▒
+ 1,64%
LWLockAcquire ▒
+ 1,30% LWLockRelease

This example is worst case, but it shows significant overhead of cached
plans and snapshots there.

What do you think about it?

Sounds good. I think this would need to be restricted by operator and
datatype, since in general you won't know if the datatype functions
need a snapshot or not. Immutable functions for the operators ought to
do it, but I think that might not be enough.

It'd indeed not be enough. E.g. enum_lt et al are immutable but access
the catalog.

good to known, thank you

Pavel

Show quoted text

Greetings,

Andres Freund

#12Andres Freund
andres@anarazel.de
In reply to: Pavel Stehule (#11)
Re: plans for PostgreSQL 12

Hi,

On 2018-06-05 06:32:31 +0200, Pavel Stehule wrote:

./configure --with-libxml --enable-tap-tests --enable-debug --with-perl
CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"

[pavel@nemesis postgresql]$ gcc --version
gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)

I executed simple script

do $$ declare i bigint = 1; s bigint = 0; begin while i <= 100000000 loop
s := s + i; i := i + 1; end loop; raise notice '%', s; end $$;

7,68% postmaster postgres [.]
GetSnapshotData ▒
7,53% postmaster plpgsql.so [.]
exec_eval_simple_expr ▒
6,49% postmaster postgres [.]

It seems to me the right fix here isn't a new class of functions, but
rather support for delaying the computation of the snapshot to the point
it's needed. That'll be far more generically applicable and doesn't
require user interaction.

ExecInterpExpr ▒
4,13% postmaster postgres [.]

So we're going to need to optimize this further as well, I've a pending
patch for that, luckily ;)

LWLockRelease ▒
4,12% postmaster postgres [.]

That's also GetSnapshotData()...

Greetings,

Andres Freund

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#12)
Re: plans for PostgreSQL 12

2018-06-05 15:00 GMT+02:00 Andres Freund <andres@anarazel.de>:

Hi,

On 2018-06-05 06:32:31 +0200, Pavel Stehule wrote:

./configure --with-libxml --enable-tap-tests --enable-debug --with-perl
CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"

[pavel@nemesis postgresql]$ gcc --version
gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1)

I executed simple script

do $$ declare i bigint = 1; s bigint = 0; begin while i <= 100000000

loop

s := s + i; i := i + 1; end loop; raise notice '%', s; end $$;

7,68% postmaster postgres [.]
GetSnapshotData ▒
7,53% postmaster plpgsql.so [.]
exec_eval_simple_expr ▒
6,49% postmaster postgres [.]

It seems to me the right fix here isn't a new class of functions, but
rather support for delaying the computation of the snapshot to the point
it's needed. That'll be far more generically applicable and doesn't
require user interaction.

good idea. Can be quick fix.

ExecInterpExpr ▒
4,13% postmaster postgres [.]

So we're going to need to optimize this further as well, I've a pending
patch for that, luckily ;)

nice :)

LWLockRelease ▒
4,12% postmaster postgres [.]

That's also GetSnapshotData()...

there are about 10% locking, unlocking plan cache still.

Regards

Pavel

Show quoted text

Greetings,

Andres Freund