ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Hello
this proposal is related to reported issue
/messages/by-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.org
We can directly modify any fields of int, float, double arrays (when result
size will be immutable). This trick is used now for acceleration of some
aggregates.
Ideas, comments?
Regards
Pavel
Hi,
On 2013-10-02 18:56:51 +0200, Pavel Stehule wrote:
Hello
this proposal is related to reported issue
/messages/by-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.orgWe can directly modify any fields of int, float, double arrays (when result
size will be immutable). This trick is used now for acceleration of some
aggregates.Ideas, comments?
A specific array might be located directly on a buffer - starting to
manipulate them inplace will lead to havoc in such scenarios. I don't
think we have the infrastructure for detecting such cases.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/2 Andres Freund <andres@2ndquadrant.com>
Hi,
On 2013-10-02 18:56:51 +0200, Pavel Stehule wrote:
Hello
this proposal is related to reported issue
/messages/by-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.org
We can directly modify any fields of int, float, double arrays (when
result
size will be immutable). This trick is used now for acceleration of some
aggregates.Ideas, comments?
A specific array might be located directly on a buffer - starting to
manipulate them inplace will lead to havoc in such scenarios. I don't
think we have the infrastructure for detecting such cases.
If you can do a update of some array in plpgsql now, then you have to work
with local copy only. It is a necessary precondition, and I am think it is
valid.
Pavel
Show quoted text
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Pavel Stehule <pavel.stehule@gmail.com> writes:
If you can do a update of some array in plpgsql now, then you have to work
with local copy only. It is a necessary precondition, and I am think it is
valid.
If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/3 Tom Lane <tgl@sss.pgh.pa.us>
Pavel Stehule <pavel.stehule@gmail.com> writes:
If you can do a update of some array in plpgsql now, then you have to
work
with local copy only. It is a necessary precondition, and I am think it
is
valid.
If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.
I looked to code, and I am thinking so this can be done inside array
related routines. We just have to signalize request for inplace update (if
we have a local copy).
I have not idea, how significant speedup can be (if any), but current
behave is not friendly (and for multidimensional arrays there are no
workaround), so it is interesting way - and long time I though about some
similar optimization.
Regards
Pavel
Show quoted text
regards, tom lane
Hello
a very ugly test shows a possibility about 100% speedup on reported
example (on small arrays, a patch is buggy and doesn't work for larger
arrays).
I updated a code to be read only
CREATE OR REPLACE FUNCTION public.fill_2d_array(rows integer, cols integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE
img double precision[][];
i integer; j integer;
cont integer; r double precision;
BEGIN
img := ARRAY( SELECT 0 FROM generate_series(1, rows * cols) ) ;
cont:= 0;
For i IN 1..rows LOOP
For j IN 1..cols LOOP r := img[i * cols + j];
r := (i * cols + j)::double precision;
cont := cont + 1; --raise notice '%', img;
END LOOP;
END LOOP;
return cont;
END;
$function$
It exec all expressions
-- original
postgres=# select fill_2d_array(200,200);
fill_2d_array
---------------
40000
(1 row)
Time: 12726.117 ms
-- read only version
postgres=# select fill_2d_array(200,200); fill_2d_array
---------------
40000
(1 row)
Time: 245.894 ms
so there is about 50x slowdown
2013/10/3 Pavel Stehule <pavel.stehule@gmail.com>
Show quoted text
2013/10/3 Tom Lane <tgl@sss.pgh.pa.us>
Pavel Stehule <pavel.stehule@gmail.com> writes:
If you can do a update of some array in plpgsql now, then you have to
work
with local copy only. It is a necessary precondition, and I am think it
is
valid.
If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.I looked to code, and I am thinking so this can be done inside array
related routines. We just have to signalize request for inplace update (if
we have a local copy).I have not idea, how significant speedup can be (if any), but current
behave is not friendly (and for multidimensional arrays there are no
workaround), so it is interesting way - and long time I though about some
similar optimization.Regards
Pavel
regards, tom lane
Attachments:
fast_array_update.patchapplication/octet-stream; name=fast_array_update.patchDownload
*** ./src/backend/utils/adt/arrayfuncs.c.orig 2013-10-03 20:54:36.725985694 +0200
--- ./src/backend/utils/adt/arrayfuncs.c 2013-10-03 20:53:50.700605494 +0200
***************
*** 1841,1846 ****
--- 1841,1935 ----
}
+ char *
+ array_ref_ref(ArrayType *array,
+ int nSubscripts,
+ int *indx,
+ int arraytyplen,
+ int elmlen,
+ bool elmbyval,
+ char elmalign,
+ bool *isNull)
+ {
+ int i,
+ ndim,
+ *dim,
+ *lb,
+ offset,
+ fixedDim[1],
+ fixedLb[1];
+ char *arraydataptr,
+ *retptr;
+ bits8 *arraynullsptr;
+
+ if (arraytyplen > 0)
+ {
+ /*
+ * fixed-length arrays -- these are assumed to be 1-d, 0-based
+ */
+ ndim = 1;
+ fixedDim[0] = arraytyplen / elmlen;
+ fixedLb[0] = 0;
+ dim = fixedDim;
+ lb = fixedLb;
+ arraydataptr = (char *) array;
+ arraynullsptr = NULL;
+ }
+ else
+ {
+ /* detoast input array if necessary */
+ array = DatumGetArrayTypeP(PointerGetDatum(array));
+
+ ndim = ARR_NDIM(array);
+ dim = ARR_DIMS(array);
+ lb = ARR_LBOUND(array);
+ arraydataptr = ARR_DATA_PTR(array);
+ arraynullsptr = ARR_NULLBITMAP(array);
+ }
+
+ /*
+ * Return NULL for invalid subscript
+ */
+ if (ndim != nSubscripts || ndim <= 0 || ndim > MAXDIM)
+ {
+ *isNull = true;
+ return (Datum) 0;
+ }
+ for (i = 0; i < ndim; i++)
+ {
+ if (indx[i] < lb[i] || indx[i] >= (dim[i] + lb[i]))
+ {
+ *isNull = true;
+ return (Datum) 0;
+ }
+ }
+
+ /*
+ * Calculate the element number
+ */
+ offset = ArrayGetOffset(nSubscripts, dim, lb, indx);
+
+ /*
+ * Check for NULL array element
+ */
+ if (array_get_isnull(arraynullsptr, offset))
+ {
+ *isNull = true;
+ return (Datum) 0;
+ }
+
+ /*
+ * OK, get the element
+ */
+ *isNull = false;
+ retptr = array_seek(arraydataptr, 0, arraynullsptr, offset,
+ elmlen, elmbyval, elmalign);
+
+ return retptr;
+ }
+
+
+
/*
* array_get_slice :
* This routine takes an array and a range of indices (upperIndex and
*** ./src/pl/plpgsql/src/pl_exec.c.orig 2013-04-01 20:20:36.000000000 +0200
--- ./src/pl/plpgsql/src/pl_exec.c 2013-10-03 21:49:09.932043358 +0200
***************
*** 3992,3997 ****
--- 3992,4000 ----
SPITupleTable *save_eval_tuptable;
MemoryContext oldcontext;
+ char *refptr;
+ bool fast_update = false;
+
/*
* We need to do subscript evaluation, which might require
* evaluating general expressions; and the caller might have
***************
*** 4139,4153 ****
/*
* Build the modified array value.
*/
! newarrayval = array_set(oldarrayval,
nsubscripts,
subscriptvals,
- coerced_value,
- *isNull,
arrayelem->arraytyplen,
arrayelem->elemtyplen,
arrayelem->elemtypbyval,
! arrayelem->elemtypalign);
MemoryContextSwitchTo(oldcontext);
--- 4142,4180 ----
/*
* Build the modified array value.
*/
!
! /* only for fixed length types and types referenced by val */
! if (arrayelem->elemtyplen > 0
! && target->dtype == PLPGSQL_DTYPE_VAR && ((PLpgSQL_var *)target)->freeval
! && !*isNull)
! {
! refptr = array_ref_ref(oldarrayval,
nsubscripts,
subscriptvals,
arrayelem->arraytyplen,
arrayelem->elemtyplen,
arrayelem->elemtypbyval,
! arrayelem->elemtypalign,
! isNull);
! if (!*isNull)
! {
! //elog(NOTICE, "try to fast update");
! fast_update = true;
! memcpy(refptr, &coerced_value, arrayelem->elemtyplen);
! break;
! }
! }
!
! if (!fast_update)
! newarrayval = array_set(oldarrayval,
! nsubscripts,
! subscriptvals,
! coerced_value,
! *isNull,
! arrayelem->arraytyplen,
! arrayelem->elemtyplen,
! arrayelem->elemtypbyval,
! arrayelem->elemtypalign);
MemoryContextSwitchTo(oldcontext);
Hello
I am sending little bit cleaned patch
there is significant speedup (on example from bug report) - more than 100x
on my Dell D830
postgres=# select fill_2d_array(300,300,1);
fill_2d_array
───────────────
90000
(1 row)
Time: 751.572 ms -- patched
postgres=# \q
bash-4.1$ psql postgres
psql (9.4devel)
Type "help" for help.
postgres=# select fill_2d_array(300,300,2);
fill_2d_array
───────────────
90000
(1 row)
Time: 87453.351 ms -- original
I checked a result and it is same.
Probably there are some issues, because domain tests fails, but these
numbers shows so a significantly faster array update is possible.
Interesting - I did a profiling and I didn't find anything interesting :(
Regards
Pavel
2013/10/3 Tom Lane <tgl@sss.pgh.pa.us>
Show quoted text
Pavel Stehule <pavel.stehule@gmail.com> writes:
If you can do a update of some array in plpgsql now, then you have to
work
with local copy only. It is a necessary precondition, and I am think it
is
valid.
If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.regards, tom lane
Attachments:
fast-array-update.patchapplication/octet-stream; name=fast-array-update.patchDownload
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 798c92a..e99f264 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -2168,7 +2168,8 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
-1 /* varlena array */ ,
sizeof(Oid) /* OID's typlen */ ,
true /* OID's typbyval */ ,
- 'i' /* OID's typalign */ );
+ 'i' /* OID's typalign */ ,
+ false /* no inplace update */ );
}
repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a);
repl_repl[Anum_pg_extension_extconfig - 1] = true;
@@ -2206,7 +2207,8 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
-1 /* varlena array */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
}
repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a);
repl_repl[Anum_pg_extension_extcondition - 1] = true;
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..6d0d590 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -456,7 +456,8 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
astate->refattrlength,
astate->refelemlength,
astate->refelembyval,
- astate->refelemalign);
+ astate->refelemalign,
+ false);
else
resultArray = array_set_slice(array_source, i,
upper.indx, lower.indx,
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 7ce1cce..369ea0a 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -147,7 +147,7 @@ array_push(PG_FUNCTION_ARGS)
typalign = my_extra->typalign;
result = array_set(v, 1, &indx, newelem, isNull,
- -1, typlen, typbyval, typalign);
+ -1, typlen, typbyval, typalign, false);
/*
* Readjust result's LB to match the input's. This does nothing in the
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 438c3d0..f0f7071 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -2023,6 +2023,7 @@ array_get_slice(ArrayType *array,
* elmlen: pg_type.typlen for the array's element type
* elmbyval: pg_type.typbyval for the array's element type
* elmalign: pg_type.typalign for the array's element type
+ * inplace_update: true, when is possible try to direct update
*
* Result:
* A new array is returned, just like the old except for the one
@@ -2045,7 +2046,8 @@ array_set(ArrayType *array,
int arraytyplen,
int elmlen,
bool elmbyval,
- char elmalign)
+ char elmalign,
+ bool inplace_update)
{
ArrayType *newarray;
int i,
@@ -2208,11 +2210,22 @@ array_set(ArrayType *array,
}
else
{
+ bool origin_is_null;
+
offset = ArrayGetOffset(nSubscripts, dim, lb, indx);
elt_ptr = array_seek(ARR_DATA_PTR(array), 0, oldnullbitmap, offset,
elmlen, elmbyval, elmalign);
+
+ origin_is_null = array_get_isnull(oldnullbitmap, offset);
+
+ if (inplace_update && !isNull && !origin_is_null)
+ {
+ ArrayCastAndSet(dataValue, elmlen, elmbyval, elmalign, elt_ptr);
+ return array;
+ }
+
lenbefore = (int) (elt_ptr - ARR_DATA_PTR(array));
- if (array_get_isnull(oldnullbitmap, offset))
+ if (origin_is_null)
olditemlen = 0;
else
{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..0d85953 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -7959,7 +7959,8 @@ GUCArrayAdd(ArrayType *array, const char *name, const char *value)
-1 /* varlena array */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
}
else
a = construct_array(&datum, 1,
@@ -8029,7 +8030,8 @@ GUCArrayDelete(ArrayType *array, const char *name)
-1 /* varlenarray */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
else
newarray = construct_array(&d, 1,
TEXTOID,
@@ -8097,7 +8099,8 @@ GUCArrayReset(ArrayType *array)
-1 /* varlenarray */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
else
newarray = construct_array(&d, 1,
TEXTOID,
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 95a9249..fcbf84e 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -219,7 +219,8 @@ extern Datum array_ref(ArrayType *array, int nSubscripts, int *indx,
bool *isNull);
extern ArrayType *array_set(ArrayType *array, int nSubscripts, int *indx,
Datum dataValue, bool isNull,
- int arraytyplen, int elmlen, bool elmbyval, char elmalign);
+ int arraytyplen, int elmlen, bool elmbyval, char elmalign,
+ bool inplace_update);
extern ArrayType *array_get_slice(ArrayType *array, int nSubscripts,
int *upperIndx, int *lowerIndx,
int arraytyplen, int elmlen, bool elmbyval, char elmalign);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 3b2919c..fe4b265 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4175,6 +4175,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
ArrayType *newarrayval;
SPITupleTable *save_eval_tuptable;
MemoryContext oldcontext;
+ bool inplace_update;
/*
* We need to do subscript evaluation, which might require
@@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
/*
+ * support fast update for array scalar variable is enabled only
+ * when target is a scalar variable and variable holds a local
+ * copy of some array.
+ */
+ inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR
+ && ((PLpgSQL_var *) target)->freeval);
+
+ /*
* Build the modified array value.
*/
newarrayval = array_set(oldarrayval,
@@ -4331,7 +4340,8 @@ exec_assign_value(PLpgSQL_execstate *estate,
arrayelem->arraytyplen,
arrayelem->elemtyplen,
arrayelem->elemtypbyval,
- arrayelem->elemtypalign);
+ arrayelem->elemtypalign,
+ inplace_update);
MemoryContextSwitchTo(oldcontext);
@@ -4340,11 +4350,15 @@ exec_assign_value(PLpgSQL_execstate *estate,
* at this point. Note that if the target is a domain,
* coercing the base array type back up to the domain will
* happen within exec_assign_value.
+ *
+ * The newarrayval and oldarrayval can be identitic as result
+ * of implace_update.
*/
*isNull = false;
- exec_assign_value(estate, target,
- PointerGetDatum(newarrayval),
- arrayelem->arraytypoid, isNull);
+ if (newarrayval != oldarrayval)
+ exec_assign_value(estate, target,
+ PointerGetDatum(newarrayval),
+ arrayelem->arraytypoid, isNull);
break;
}
Hello
I fixed patch - there was a missing cast to domain when it was used (and
all regress tests are ok now).
a some performance tests
set array_fast_update to off;
postgres=# select fill_2d_array(300,300);
fill_2d_array
───────────────
90000
(1 row)
Time: 33570.087 ms
postgres=# set array_fast_update to on;
SET
Time: 0.279 ms
postgres=# select fill_2d_array(300,300);
fill_2d_array
───────────────
90000
(1 row)
Time: 505.202 ms
CREATE OR REPLACE FUNCTION public.quicksort(l integer, r integer, a
integer[])
RETURNS integer[]
LANGUAGE plpgsql
IMMUTABLE STRICT
AS $function$
DECLARE akt int[] = a;
i integer := l; j integer := r; x integer = akt[(l+r) / 2];
w integer;
BEGIN
LOOP
WHILE akt[i] < x LOOP i := i + 1; END LOOP;
WHILE x < akt[j] loop j := j - 1; END LOOP;
IF i <= j THEN
w := akt[i];
akt[i] := akt[j]; akt[j] := w;
i := i + 1; j := j - 1;
END IF;
EXIT WHEN i > j;
END LOOP;
IF l < j THEN akt := quicksort(l,j,akt); END IF;
IF i < r then akt := quicksort(i,r,akt); END IF;
RETURN akt;
END;
$function$
postgres=# set array_fast_update to off;
SET
Time: 0.282 ms
postgres=# SELECT array_upper(quicksort(1,21000,array_agg(a)),1) FROM test;
array_upper
─────────────
21000
(1 row)
Time: 5086.781 ms
postgres=# set array_fast_update to on;
SET
Time: 0.702 ms
postgres=# SELECT array_upper(quicksort(1,21000,array_agg(a)),1) FROM test;
array_upper
─────────────
21000
(1 row)
Time: 1751.920 ms
So in first test - fast update is 66x faster, second test - 3x faster
I found so for small arrays (about 1000 fields) a difference is not
significant.
This code can be cleaned (a domains can be better optimized generally - a
IO cast can be expensive for large arrays and domain_check can be used
there instead), but it is good enough for discussion if we would this
optimization or not.
Regards
Pavel
2013/10/3 Pavel Stehule <pavel.stehule@gmail.com>
Show quoted text
Hello
a very ugly test shows a possibility about 100% speedup on reported
example (on small arrays, a patch is buggy and doesn't work for larger
arrays).I updated a code to be read only
CREATE OR REPLACE FUNCTION public.fill_2d_array(rows integer, cols integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE
img double precision[][];
i integer; j integer;
cont integer; r double precision;
BEGIN
img := ARRAY( SELECT 0 FROM generate_series(1, rows * cols) ) ;
cont:= 0;
For i IN 1..rows LOOP
For j IN 1..cols LOOP r := img[i * cols + j];
r := (i * cols + j)::double precision;
cont := cont + 1; --raise notice '%', img;
END LOOP;
END LOOP;
return cont;
END;
$function$It exec all expressions
-- original
postgres=# select fill_2d_array(200,200);
fill_2d_array
---------------
40000
(1 row)Time: 12726.117 ms
-- read only version
postgres=# select fill_2d_array(200,200); fill_2d_array
---------------
40000
(1 row)Time: 245.894 ms
so there is about 50x slowdown
2013/10/3 Pavel Stehule <pavel.stehule@gmail.com>
2013/10/3 Tom Lane <tgl@sss.pgh.pa.us>
Pavel Stehule <pavel.stehule@gmail.com> writes:
If you can do a update of some array in plpgsql now, then you have to
work
with local copy only. It is a necessary precondition, and I am think
it is
valid.
If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.I looked to code, and I am thinking so this can be done inside array
related routines. We just have to signalize request for inplace update (if
we have a local copy).I have not idea, how significant speedup can be (if any), but current
behave is not friendly (and for multidimensional arrays there are no
workaround), so it is interesting way - and long time I though about some
similar optimization.Regards
Pavel
regards, tom lane
Attachments:
fast-array-update.patchapplication/octet-stream; name=fast-array-update.patchDownload
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 798c92a..e99f264 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -2168,7 +2168,8 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
-1 /* varlena array */ ,
sizeof(Oid) /* OID's typlen */ ,
true /* OID's typbyval */ ,
- 'i' /* OID's typalign */ );
+ 'i' /* OID's typalign */ ,
+ false /* no inplace update */ );
}
repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a);
repl_repl[Anum_pg_extension_extconfig - 1] = true;
@@ -2206,7 +2207,8 @@ pg_extension_config_dump(PG_FUNCTION_ARGS)
-1 /* varlena array */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
}
repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a);
repl_repl[Anum_pg_extension_extcondition - 1] = true;
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 90c2753..6d0d590 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -456,7 +456,8 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
astate->refattrlength,
astate->refelemlength,
astate->refelembyval,
- astate->refelemalign);
+ astate->refelemalign,
+ false);
else
resultArray = array_set_slice(array_source, i,
upper.indx, lower.indx,
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 7ce1cce..369ea0a 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -147,7 +147,7 @@ array_push(PG_FUNCTION_ARGS)
typalign = my_extra->typalign;
result = array_set(v, 1, &indx, newelem, isNull,
- -1, typlen, typbyval, typalign);
+ -1, typlen, typbyval, typalign, false);
/*
* Readjust result's LB to match the input's. This does nothing in the
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 438c3d0..ad7e414 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -28,9 +28,10 @@
/*
- * GUC parameter
+ * GUC parameters
*/
bool Array_nulls = true;
+bool enable_fast_array_update = false;
/*
* Local definitions
@@ -2023,6 +2024,7 @@ array_get_slice(ArrayType *array,
* elmlen: pg_type.typlen for the array's element type
* elmbyval: pg_type.typbyval for the array's element type
* elmalign: pg_type.typalign for the array's element type
+ * inplace_update: true, when is possible try to direct update
*
* Result:
* A new array is returned, just like the old except for the one
@@ -2045,7 +2047,8 @@ array_set(ArrayType *array,
int arraytyplen,
int elmlen,
bool elmbyval,
- char elmalign)
+ char elmalign,
+ bool inplace_update)
{
ArrayType *newarray;
int i,
@@ -2208,11 +2211,22 @@ array_set(ArrayType *array,
}
else
{
+ bool origin_is_null;
+
offset = ArrayGetOffset(nSubscripts, dim, lb, indx);
elt_ptr = array_seek(ARR_DATA_PTR(array), 0, oldnullbitmap, offset,
elmlen, elmbyval, elmalign);
+
+ origin_is_null = array_get_isnull(oldnullbitmap, offset);
+
+ if (inplace_update && !isNull && !origin_is_null && enable_fast_array_update)
+ {
+ ArrayCastAndSet(dataValue, elmlen, elmbyval, elmalign, elt_ptr);
+ return array;
+ }
+
lenbefore = (int) (elt_ptr - ARR_DATA_PTR(array));
- if (array_get_isnull(oldnullbitmap, offset))
+ if (origin_is_null)
olditemlen = 0;
else
{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..f4fc263 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1246,6 +1246,16 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ /* remove from final version! */
+ {"array_fast_update", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
+ gettext_noop("Enable inplace array update."),
+ NULL
+ },
+ &enable_fast_array_update,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"default_with_oids", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
gettext_noop("Create new tables with OIDs by default."),
NULL
@@ -7959,7 +7969,8 @@ GUCArrayAdd(ArrayType *array, const char *name, const char *value)
-1 /* varlena array */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
}
else
a = construct_array(&datum, 1,
@@ -8029,7 +8040,8 @@ GUCArrayDelete(ArrayType *array, const char *name)
-1 /* varlenarray */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
else
newarray = construct_array(&d, 1,
TEXTOID,
@@ -8097,7 +8109,8 @@ GUCArrayReset(ArrayType *array)
-1 /* varlenarray */ ,
-1 /* TEXT's typlen */ ,
false /* TEXT's typbyval */ ,
- 'i' /* TEXT's typalign */ );
+ 'i' /* TEXT's typalign */ ,
+ false /* no inplace update */ );
else
newarray = construct_array(&d, 1,
TEXTOID,
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 95a9249..e31689b 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -219,7 +219,8 @@ extern Datum array_ref(ArrayType *array, int nSubscripts, int *indx,
bool *isNull);
extern ArrayType *array_set(ArrayType *array, int nSubscripts, int *indx,
Datum dataValue, bool isNull,
- int arraytyplen, int elmlen, bool elmbyval, char elmalign);
+ int arraytyplen, int elmlen, bool elmbyval, char elmalign,
+ bool inplace_update);
extern ArrayType *array_get_slice(ArrayType *array, int nSubscripts,
int *upperIndx, int *lowerIndx,
int arraytyplen, int elmlen, bool elmbyval, char elmalign);
@@ -296,4 +297,7 @@ extern Datum array_agg_finalfn(PG_FUNCTION_ARGS);
*/
extern Datum array_typanalyze(PG_FUNCTION_ARGS);
+/* only for developing purpouses - drop from final version! */
+extern bool enable_fast_array_update;
+
#endif /* ARRAY_H */
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 3b2919c..c938da9 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4175,6 +4175,7 @@ exec_assign_value(PLpgSQL_execstate *estate,
ArrayType *newarrayval;
SPITupleTable *save_eval_tuptable;
MemoryContext oldcontext;
+ bool inplace_update;
/*
* We need to do subscript evaluation, which might require
@@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
/*
+ * support fast update for array scalar variable is enabled only
+ * when target is a scalar variable and variable holds a local
+ * copy of some array.
+ */
+ inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR
+ && ((PLpgSQL_var *) target)->freeval);
+
+ /*
* Build the modified array value.
*/
newarrayval = array_set(oldarrayval,
@@ -4331,7 +4340,8 @@ exec_assign_value(PLpgSQL_execstate *estate,
arrayelem->arraytyplen,
arrayelem->elemtyplen,
arrayelem->elemtypbyval,
- arrayelem->elemtypalign);
+ arrayelem->elemtypalign,
+ inplace_update);
MemoryContextSwitchTo(oldcontext);
@@ -4340,11 +4350,34 @@ exec_assign_value(PLpgSQL_execstate *estate,
* at this point. Note that if the target is a domain,
* coercing the base array type back up to the domain will
* happen within exec_assign_value.
+ *
+ * The newarrayval and oldarrayval can be identitic as result
+ * of implace_update.
*/
*isNull = false;
- exec_assign_value(estate, target,
- PointerGetDatum(newarrayval),
- arrayelem->arraytypoid, isNull);
+
+ if (newarrayval != oldarrayval)
+ exec_assign_value(estate, target,
+ PointerGetDatum(newarrayval),
+ arrayelem->arraytypoid, isNull);
+ else
+ {
+ PLpgSQL_var *var = (PLpgSQL_var *) target;
+ Datum newvalue;
+
+ /* we should to recheck domain */
+ Assert(inplace_update);
+ Assert(((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR);
+
+ newvalue = exec_cast_value(estate,
+ var->value,
+ arrayelem->arraytypoid,
+ var->datatype->typoid,
+ &(var->datatype->typinput),
+ var->datatype->typioparam,
+ var->datatype->atttypmod,
+ false);
+ }
break;
}
On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
/*
* We need to do subscript evaluation, which might require
@@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);/* + * support fast update for array scalar variable is enabled only + * when target is a scalar variable and variable holds a local + * copy of some array. + */ + inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR + && ((PLpgSQL_var *) target)->freeval); + + /* * Build the modified array value. */
Will this recognize if the local Datum is just a reference to an
external toast Datum (i.e. varattrib_1b_e)?
I don't know much about plpgsql's implementation, so please excuse if
the question is stupid.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/10/7 Andres Freund <andres@2ndquadrant.com>
On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
/*
* We need to do subscript evaluation,which might require
@@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
oldarrayval = (ArrayType *)DatumGetPointer(oldarraydatum);
/*
+ * support fast update for array scalarvariable is enabled only
+ * when target is a scalar variable and
variable holds a local
+ * copy of some array. + */ + inplace_update = (((PLpgSQL_datum *)target)->dtype == PLPGSQL_DTYPE_VAR
+ &&
((PLpgSQL_var *) target)->freeval);
+ + /* * Build the modified array value. */Will this recognize if the local Datum is just a reference to an
external toast Datum (i.e. varattrib_1b_e)?
this works on plpgsql local copies only (when cleaning is controlled by
plpgsql) - so it should be untoasted every time.
btw when I tested this patch I found a second plpgsql array issue -
repeated untoasting :( and we should not forget it.
Show quoted text
I don't know much about plpgsql's implementation, so please excuse if
the question is stupid.Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
I started reviewing this patch. Gone through the mail thread discussion to
understand the need of the patch. With patch there is significant
performance
improvement in case of update for the array scalar variable.
- Patch gets apply cleanly on master branch
- make, make install and make check works fine
Did code walk through, code change looks good to me. I was doing some
testing
in the area of scalar variable array and domain type. For the big array size
noticed significant performance improvement. So far haven't found any issues
with the code.
Patch looks good to me.
Just FYI.. Do need to remove array_fast_update GUC in the final version of
commit.
Regards,
Rushabh
On Mon, Oct 7, 2013 at 7:52 PM, Pavel Stehule <pavel.stehule@gmail.com>wrote:
2013/10/7 Andres Freund <andres@2ndquadrant.com>
On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
/*
* We need to do subscript evaluation,which might require
@@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
oldarrayval = (ArrayType *)DatumGetPointer(oldarraydatum);
/*
+ * support fast update for array scalarvariable is enabled only
+ * when target is a scalar variable and
variable holds a local
+ * copy of some array. + */ + inplace_update = (((PLpgSQL_datum *)target)->dtype == PLPGSQL_DTYPE_VAR
+ &&
((PLpgSQL_var *) target)->freeval);
+ + /* * Build the modified array value. */Will this recognize if the local Datum is just a reference to an
external toast Datum (i.e. varattrib_1b_e)?this works on plpgsql local copies only (when cleaning is controlled by
plpgsql) - so it should be untoasted every time.btw when I tested this patch I found a second plpgsql array issue -
repeated untoasting :( and we should not forget it.I don't know much about plpgsql's implementation, so please excuse if
the question is stupid.Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Rushabh Lathia
(sorry for possible duplicates, used wrong account on first try)
On 07.10.2013 17:00, Pavel Stehule wrote:
Hello
I fixed patch - there was a missing cast to domain when it was used (and
all regress tests are ok now).
This doesn't work correctly for varlen datatypes. I modified your
quicksort example to work with text[], and got this:
postgres=# SELECT
array_upper(quicksort(1,20000,array_agg((g::text))),1) FROM
generate_series(1,20000) g;
ERROR: invalid input syntax for integer: ""
CONTEXT: PL/pgSQL function quicksort(integer,integer,text[]) line 10 at
assignment
PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment
The handling of array domains is also wrong. You replace the new value
to the array and only check the domain constraint after that. If the
constraint is violated, it's too late to roll that back. For example:
postgres=# create domain posarray as int[] check (value[1] > 0);
CREATE DOMAIN
postgres=# do $$
declare
iarr posarray;
begin
begin
iarr[1] := 1;
iarr[1] := -1;
exception when others then raise notice 'got error: %', sqlerrm; end;
raise notice 'value: %', iarr[1];
end;
$$;
NOTICE: got error: value for domain posarray violates check constraint
"posarray_check"
NOTICE: value: -1
DO
Without the patch, it prints 1, but with the patch, -1.
In general, I'm not convinced this patch is worth the trouble. The
speedup isn't all that great; manipulating large arrays in PL/pgSQL is
still so slow that if you care about performance you'll want to rewrite
your function in some other language or use temporary tables. And you
only get a gain with arrays of fixed-length element type with no NULLs.
So I think we should drop this patch in its current form. If we want to
make array manipulation in PL/pgSQL, I think we'll have to do something
similar to how we handle "row" variables, or something else entirely.
But that's a much bigger patch, and I'm not sure it's worth the trouble
either.
Looking at perf profile and the functions involved, though, I see some
low-hanging fruit: in array_set, the new array is allocated with
palloc0'd, but we only need to zero out a few alignment bytes where the
new value is copied. Replacing it with palloc() will save some cycles,
per the attached patch. Nowhere near as much as your patch, but this is
pretty uncontroversial.
- Heikki
Attachments:
avoid-unnecessary-zeroing-in-array_set-1.patchtext/x-diff; name=avoid-unnecessary-zeroing-in-array_set-1.patchDownload
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 438c3d0..ced41f0 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -2235,7 +2235,7 @@ array_set(ArrayType *array,
/*
* OK, create the new array and fill in header/dimensions
*/
- newarray = (ArrayType *) palloc0(newsize);
+ newarray = (ArrayType *) palloc(newsize);
SET_VARSIZE(newarray, newsize);
newarray->ndim = ndim;
newarray->dataoffset = newhasnulls ? overheadlen : 0;
@@ -2250,8 +2250,12 @@ array_set(ArrayType *array,
(char *) array + oldoverheadlen,
lenbefore);
if (!isNull)
+ {
+ /* zero out any padding in the slot reserved for the new item */
+ memset((char *) newarray + overheadlen + lenbefore, 0, newitemlen);
ArrayCastAndSet(dataValue, elmlen, elmbyval, elmalign,
(char *) newarray + overheadlen + lenbefore);
+ }
memcpy((char *) newarray + overheadlen + lenbefore + newitemlen,
(char *) array + oldoverheadlen + lenbefore + olditemlen,
lenafter);
Heikki Linnakangas <heikki@localhost.vmware.com> writes:
In general, I'm not convinced this patch is worth the trouble. The
speedup isn't all that great; manipulating large arrays in PL/pgSQL is
still so slow that if you care about performance you'll want to rewrite
your function in some other language or use temporary tables. And you
only get a gain with arrays of fixed-length element type with no NULLs.
So I think we should drop this patch in its current form. If we want to
make array manipulation in PL/pgSQL, I think we'll have to do something
similar to how we handle "row" variables, or something else entirely.
I think that this area would be a fruitful place to make use of the
noncontiguous datatype storage ideas that we were discussing with the
PostGIS guys recently. I agree that tackling it in the context of plpgsql
alone is not a good way to go at it.
I'm not saying this in a vacuum of information, either. Some of the guys
at Salesforce have been poking at noncontiguous storage for arrays and
have gotten nice speedups --- but their patch is for plpgsql only and
only addresses arrays, which makes it enough of a kluge that I've not
wanted to bring it to the community. I think we should work towards
a general solution instead.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Reply to msg id not found: 5293BEB7.1060101@localhost
2013/11/25 Heikki Linnakangas <heikki@localhost.vmware.com>
On 07.10.2013 17:00, Pavel Stehule wrote:
Hello
I fixed patch - there was a missing cast to domain when it was used (and
all regress tests are ok now).This doesn't work correctly for varlen datatypes. I modified your
quicksort example to work with text[], and got this:
postgres=# SELECT array_upper(quicksort(1,20000,array_agg((g::text))),1)
FROM generate_series(1,20000) g;
ERROR: invalid input syntax for integer: " "
CONTEXT: PL/pgSQL function quicksort(integer,integer,text[]) line 10 at
assignment
PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignmentThe handling of array domains is also wrong. You replace the new value to
the array and only check the domain constraint after that. If the
constraint is violated, it's too late to roll that back. For example:
both are bug, and should be fixed
postgres=# create domain posarray as int[] check (value[1] > 0);
CREATE DOMAIN
postgres=# do $$
declare
iarr posarray;
begin
begin
iarr[1] := 1;
iarr[1] := -1;
exception when others then raise notice 'got error: %', sqlerrm; end;
raise notice 'value: %', iarr[1];
end;
$$;
NOTICE: got error: value for domain posarray violates check constraint
"posarray_check"
NOTICE: value: -1
DOWithout the patch, it prints 1, but with the patch, -1.
In general, I'm not convinced this patch is worth the trouble. The speedup
isn't all that great; manipulating large arrays in PL/pgSQL is still so
slow that if you care about performance you'll want to rewrite your
function in some other language or use temporary tables. And you only get a
gain with arrays of fixed-length element type with no NULLs.
I understand to your objection, but I disagree so particular solution is
not enough. Update of fixed array should be fast - for this task there are
no any workaround in plpgsql, so using a array as fast cache for
calculation has some (sometimes very high) hidden costs. A very advanced
users can use a different PL or can prepare C extensions, but for common
users, this cost is hidden and unsolvable.
So I think we should drop this patch in its current form. If we want to
make array manipulation in PL/pgSQL, I think we'll have to do something
similar to how we handle "row" variables, or something else entirely. But
that's a much bigger patch, and I'm not sure it's worth the trouble either.
Sorry, I disagree again - using same mechanism for arrays of fixed types
and arrays of varlena types should be ineffective. Slow part of this area
is manipulation with large memory blocks - there we fighting with physical
CPU and access to memory limits.
Now, I know about few significant PL/pgSQL issues related to manipulation
with variables
* repeated detoast
* missing preallocation
* low effectiveness manipulation with large arrays
* low effectiveness with casting (using IO casting instead binary)
A final solution of these issues needs redesign of work with variables in
PL/pgSQL - some in Perl style (maybe). It can be nice, but I don't know
somebody who will work on this early.
I would to concentrate to integration of plpgsql_check to core - and maybe
later I can work on this topic (maybe after 2 years).
So any particular solution (should not be mine) can serve well two three
years. This code is 100% hidden on top, and we can change it safely, so
particular solution can be enough.
Regards
Pavel
Show quoted text
Looking at perf profile and the functions involved, though, I see some
low-hanging fruit: in array_set, the new array is allocated with palloc0'd,
but we only need to zero out a few alignment bytes where the new value is
copied. Replacing it with palloc() will save some cycles, per the attached
patch. Nowhere near as much as your patch, but this is pretty
uncontroversial.- Heikki
Import Notes
Reply to msg id not found: 5293BEB7.1060101@localhost
2013/11/25 Tom Lane <tgl@sss.pgh.pa.us>
Heikki Linnakangas <heikki@localhost.vmware.com> writes:
In general, I'm not convinced this patch is worth the trouble. The
speedup isn't all that great; manipulating large arrays in PL/pgSQL is
still so slow that if you care about performance you'll want to rewrite
your function in some other language or use temporary tables. And you
only get a gain with arrays of fixed-length element type with no NULLs.So I think we should drop this patch in its current form. If we want to
make array manipulation in PL/pgSQL, I think we'll have to do something
similar to how we handle "row" variables, or something else entirely.I think that this area would be a fruitful place to make use of the
noncontiguous datatype storage ideas that we were discussing with the
PostGIS guys recently. I agree that tackling it in the context of plpgsql
alone is not a good way to go at it.I'm not saying this in a vacuum of information, either. Some of the guys
at Salesforce have been poking at noncontiguous storage for arrays and
have gotten nice speedups --- but their patch is for plpgsql only and
only addresses arrays, which makes it enough of a kluge that I've not
wanted to bring it to the community. I think we should work towards
a general solution instead.
I am for general solution (because these issues are good performance
traps), but a early particular solution can be valuable for lot of users
too - mainly if general solution can carry in two, three years horizon
Regards
Pavel
Show quoted text
regards, tom lane