GSoC 2017: Foreign Key Arrays
Dear PostgreSQL hacker community,
I am working on Foreign Key Arrays as part of the Google Summer of Code
2017.
I will be logging my progress on this thread as I progress, twice a week
(Mondays and Fridays), so anyone who is willing to comment, please do.
*The Problem*
Foreign Key Arrays were introduced by Marco Nenciarini[1]/messages/by-id/1343842863.5162.4.camel@greygoo.devise-it.lan, however, the
proposed patch had some performance issues. Let's assume we have two
tables, table B has a foreign key array that references table A, any change
in table A (INSERT, UPDATE, DELETE) would trigger a referential
integrity check on table B. The current implementation uses sequential
scans to accomplish this. This limits the size of tables using Foreign Key
Arrays to ~100 records which is not practical in real life applications.
*The Proposed Solution*
Ultimately, as proposed by Tom Lane[2]/messages/by-id/28389.1351094795@sss.pgh.pa.us, we would like to replace the
sequential scan with a GIN-indexed scan which would greatly enhance the
performance.
To achieve this, introducing a number of new operators is required.
However, for the scope of the project, we will focus on the most basic case
where the Primary Keys are of pseudo-type anyelement and the Foreign Keys
are of pseudo-type anyarray, thus the main operator of concern will be
@>(anyarray,anyelement).
*Progress So Far*
The actual coding begins on 30th of May, till then I will use my time to
research, to settle the technical details of my plan.
- Collected resources about GIN indexing
- http://www.sigaev.ru/gin/README.txt
- https://wiki.postgresql.org/wiki/GIN_generalization
- src\backend\access\gin\README in the repo
- Cloned the git repo found @ https://github.com/postgres/postgres and
identified the main two files I will be concerned with. (I know I may need
to edit other files but these seem to where I will spend most of my summer)
- src/backend/commands/tablecmds.c
- src/backend/utils/ri_triggers.c
*I am yet to identify the files concerned with the GIN opclass. <-- if
anyone can help with this*
- read a little about op classes
- https://www.postgresql.org/docs/9.5/static/indexes-opclass.html
- explored the existing op classes in Postgres
*Next Step*
I still don't have a solid grasp of how I am going to approach creating an
operator, so I would like to experiment till the next report on creating a
very simple operator.
*I have attached the original proposal here.*
[1]: /messages/by-id/1343842863.5162.4.camel@greygoo.devise-it.lan
/messages/by-id/1343842863.5162.4.camel@greygoo.devise-it.lan
[2]: /messages/by-id/28389.1351094795@sss.pgh.pa.us
Best Regards,
Mark Rofail
Attachments:
On Mon, May 22, 2017 at 7:51 PM, Mark Rofail <markm.rofail@gmail.com> wrote:
Cloned the git repo found @ https://github.com/postgres/postgres and
identified the main two files I will be concerned with. (I know I may need
to edit other files but these seem to where I will spend most of my summer)src/backend/commands/tablecmds.c
src/backend/utils/ri_triggers.cI am yet to identify the files concerned with the GIN opclass. <-- if anyone
can help with this
There's not only one GIN opclass. You can get a list like this:
select oid, * from pg_opclass where opcmethod = 2742;
Actually, you probably want to look for GIN opfamilies:
rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
2745 | 2742 | array_ops | 11 | 10
3659 | 2742 | tsvector_ops | 11 | 10
4036 | 2742 | jsonb_ops | 11 | 10
4037 | 2742 | jsonb_path_ops | 11 | 10
(4 rows)
To see which SQL functions are used to implement a particular
opfamily, use the OID from the previous step in a query like this:
rhaas=# select prosrc from pg_amop, pg_operator, pg_proc where
amopfamily = 2745 and amopopr = pg_operator.oid and oprcode =
pg_proc.oid;
prosrc
----------------
array_eq
arrayoverlap
arraycontains
arraycontained
(4 rows)
Then, you can look for those in the source tree. You can also search
for the associated support functions, e.g.:
rhaas=# select distinct amprocnum, prosrc from pg_amproc, pg_proc
where amprocfamily = 2745 and amproc = pg_proc.oid order by 1, 2;
amprocnum | prosrc
-----------+-----------------------
1 | bitcmp
1 | bpcharcmp
1 | btabstimecmp
1 | btboolcmp
1 | btcharcmp
1 | btfloat4cmp
1 | btfloat8cmp
1 | btint2cmp
1 | btint4cmp
1 | btint8cmp
1 | btnamecmp
1 | btoidcmp
1 | btoidvectorcmp
1 | btreltimecmp
1 | bttextcmp
1 | bttintervalcmp
1 | byteacmp
1 | cash_cmp
1 | date_cmp
1 | interval_cmp
1 | macaddr_cmp
1 | network_cmp
1 | numeric_cmp
1 | time_cmp
1 | timestamp_cmp
1 | timetz_cmp
2 | ginarrayextract
3 | ginqueryarrayextract
4 | ginarrayconsistent
6 | ginarraytriconsistent
(30 rows)
You might want to read https://www.postgresql.org/docs/devel/static/xindex.html
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
2745 | 2742 | array_ops | 11 | 10
3659 | 2742 | tsvector_ops | 11 | 10
4036 | 2742 | jsonb_ops | 11 | 10
4037 | 2742 | jsonb_path_ops | 11 | 10
(4 rows)
I am particulary intrested in array_ops but I have failed in locating the
code behind it. Where is it reflected in the source code
Best Regards,
Mark Rofail
Hi, Mark!
On Tue, May 30, 2017 at 2:18 AM, Mark Rofail <markm.rofail@gmail.com> wrote:
rhaas=# select oid, * from pg_opfamily where opfmethod = 2742;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
2745 | 2742 | array_ops | 11 | 10
3659 | 2742 | tsvector_ops | 11 | 10
4036 | 2742 | jsonb_ops | 11 | 10
4037 | 2742 | jsonb_path_ops | 11 | 10
(4 rows)I am particulary intrested in array_ops but I have failed in locating the
code behind it. Where is it reflected in the source code
Let's look what particular opclass is consisting of. Besides records in
pg_opfamily, it also contains records in pg_opclass, pg_amproc and pg_amop.
=# select * from pg_opclass where opcfamily = 2745;
opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype |
opcdefault | opckeytype
-----------+-----------+--------------+----------+-----------+-----------+------------+------------
2742 | array_ops | 11 | 10 | 2745 | 2277 |
t | 2283
(1 row)
=# select * from pg_amproc where amprocfamily = 2745;
amprocfamily | amproclefttype | amprocrighttype | amprocnum |
amproc
--------------+----------------+-----------------+-----------+----------------------------
2745 | 2277 | 2277 | 2 |
pg_catalog.ginarrayextract
2745 | 2277 | 2277 | 3 |
ginqueryarrayextract
2745 | 2277 | 2277 | 4 |
ginarrayconsistent
2745 | 2277 | 2277 | 6 |
ginarraytriconsistent
(4 rows)
=# select * from pg_amop where amopfamily = 2745;
amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose |
amopopr | amopmethod | amopsortfamily
------------+--------------+---------------+--------------+-------------+---------+------------+----------------
2745 | 2277 | 2277 | 1 | s |
2750 | 2742 | 0
2745 | 2277 | 2277 | 2 | s |
2751 | 2742 | 0
2745 | 2277 | 2277 | 3 | s |
2752 | 2742 | 0
2745 | 2277 | 2277 | 4 | s |
1070 | 2742 | 0
(4 rows)
These records of system catalog are defined in special headers the source
code:
src/include/catalog/pg_amop.h
src/include/catalog/pg_amproc.h
src/include/catalog/pg_opclass.h
src/include/catalog/pg_opfamily.h
These records are written to system catalog during bootstrap process (see
src/backend/catalog/README).
As you can see pg_amproc records refer some procedures. Those procedures
are actually the majority of source code behind of opclass. Those
procedures are defined in src/backend/access/gin/ginarrayproc.c.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
src/include/catalog/pg_amop.h
src/include/catalog/pg_amproc.h
src/include/catalog/pg_opclass.h
src/include/catalog/pg_opfamily.h
Thanks to Alexander's reply I have been able to jump from catalog table to
table till I found the function I was looking for.
My goal is to add a new operator (@>(anyarray,anyelement)) to the (array_ops)
op class.
I am going to post the steps I took to locate the procedure, the following
is the trail of tables I followed.
pg_opfamily
pg_opfamily defines operator families.
Link to docs
<https://www.postgresql.org/docs/devel/static/catalog-pg-opfamily.html>
{
opfmethod; /* index access method opfamily is for */
opfname; /* name of this opfamily */
opfnamespace; /* namespace of this opfamily */
opfowner; /* opfamily owner */
}
gin=# select oid, * from pg_opfamily where opfmethod = 2742;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+----------------+--------------+----------
2745 | 2742 | array_ops | 11 | 10
3659 | 2742 | tsvector_ops | 11 | 10
4036 | 2742 | jsonb_ops | 11 | 10
4037 | 2742 | jsonb_path_ops | 11 | 10
(4 rows)
as this table defines operator families I won't need to modify them.
pg_opclass
pg_opclass defines index access method operator classes.
Link to docs
<https://www.postgresql.org/docs/current/static/catalog-pg-opclass.html>
{
opcmethod; /* index access method opclass is for */
opcname; /* name of this opclass */
opcnamespace; /* namespace of this opclass */
opcowner; /* opclass owner */
opcfamily; /* containing operator family */
opcintype; /* type of data indexed by opclass */
opcdefault; /* T if opclass is default for opcintype
opckeytype; /* type of data in index, or InvalidOid */
}
gin=# select * from pg_opclass where opcfamily = 2745;
opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype |
opcdefault | opckeytype
-----------+-----------+--------------+----------+----------
-+-----------+------------+------------
2742 | array_ops | 11 | 10 | 2745 | 2277 | t
| 2283
(1 row)
as this table defines operator classes I won't need to modify them.
this led me to pg_amproc
pg_amproc
pg_amproc stores information about support procedures associated with
access method operator families.
Link to docs
<https://www.postgresql.org/docs/devel/static/catalog-pg-amproc.html>
{
amprocfamily; /* the index opfamily this entry is for */
amproclefttype; /* procedure's left input data type */
amprocrighttype; /* procedure's right input data type */
amprocnum[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* support procedure index */
amproc; /* OID of the proc */
}
gin=# select * from pg_amproc where amprocfamily = 2745;
amprocfamily | amproclefttype | amprocrighttype | amprocnum |
amproc
------------------+--------------------+--------------------
-+---------------+----------------------------
2745 | 2277 | 2277
| 2 | pg_catalog.ginarrayextract
2745 | 2277 | 2277
| 3 | ginqueryarrayextract
2745 | 2277 | 2277
| 4 | ginarrayconsistent
2745 | 2277 | 2277
| 6 | ginarraytriconsistent
(4 rows)
[1]: amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>
<https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>
as this table defines support procedures I won't need to modify them.
this led me to pg_amop
pg_amop
pg_amop stores information about operators associated with access method
operator families.
Link to docs
<https://www.postgresql.org/docs/devel/static/catalog-pg-amop.html>
{
amopfamily; /* the index opfamily this entry is for */
amoplefttype; /* operator's left input data type */
amoprighttype; /* operator's right input data type */
amopstrategy; /* operator strategy number */
amoppurpose; /* is operator for 's'earch or 'o'rdering? */
amopopr; /* the operator's pg_operator OID */
amopmethod; /* the index access method this entry is for
amopsortfamily; /* ordering opfamily OID, or 0 if search op
}
=# select * from pg_amop where amopfamily = 2745;
amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose |
amopopr | amopmethod | amopsortfamily
------------+--------------+---------------+--------------+-
------------+---------+------------+----------------
2745 | 2277 | 2277 | 1 | s |
2750 | 2742 | 0
2745 | 2277 | 2277 | 2 | s |
2751 | 2742 | 0
2745 | 2277 | 2277 | 3 | s |
2752 | 2742 | 0
2745 | 2277 | 2277 | 4 | s |
1070 | 2742 | 0
(4 rows)
I will need to add a record of my new operator to this table by appending
this line to src/include/catalog/pg_amop.h
DATA(insert (2745 2277 2277 1 s 2750 2742 0 ));
This will result in the following entry
=# select * from pg_amop where amopfamily = 2745;
amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose |
amopopr | amopmethod | amopsortfamily
------------+--------------+---------------+--------------+-
------------+---------+------------+----------------
2745 | 2277 | 2283 | 1 | s |
2750 | 2742 | 0
this led me to pg_operator
pg_operator
pg_operator stores information about operators.
Link to docs
<https://www.postgresql.org/docs/devel/static/catalog-pg-operator.html>
{
oprname; /* name of operator */
oprnamespace; /* OID of namespace containing this oper */
oprowner; /* operator owner */
oprkind; /* 'l', 'r', or 'b' */
oprcanmerge; /* can be used in merge join? */
oprcanhash; /* can be used in hash join? */
oprleft; /* left arg type, or 0 if 'l' oprkind */
oprright; /* right arg type, or 0 if 'r' oprkind */
oprresult; /* result datatype */
oprcom; /* OID of commutator oper, or 0 if none */
oprnegate; /* OID of negator oper, or 0 if none */
oprcode; /* OID of underlying function */
oprrest; /* OID of restriction estimator, or 0 */
oprjoin; /* OID of join estimator, or 0 */
}
postgres=# select * from pg_operator where oid = 2751;
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash |
oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |
oprrest | oprjoin
---------+--------------+----------+---------+-------------+
------------+---------+----------+-----------+--------+-----
------+---------------+--------------+------------------
@> | 11 | 10 | b | f | f |
2277 | 2277 | 16 | 2752 | 0 | arraycontains |
arraycontsel | arraycontjoinsel
(1 row)
I will need to add a record of my new operator to this table by appending
this line to src/include/catalog/pg_operator.h
However, as this is dependent on the procedure I have yet to create there
are still uknown values
DATA(insert OID = <uniqueProcId> ( "@>" PGNSP PGUID b f f 2277 2283 16
2752 0 arraycontainselem ???? ???? ));
DESCR("contains");
#define OID_ARRAY_CONTAINS_OP <uniqueProcId>
This will lead to this entry
oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash |
oprleft | oprright | oprresult | oprcom | oprnegate | oprcode |
oprrest | oprjoin
---------+--------------+----------+---------+-------------+
------------+---------+----------+-----------+--------+-----
------+---------------+--------------+------------------
@> | 11 | 10 | b | f | f |
2277 | 2283 | 16 | 2752 | 0 | arraycontainselem |
???? | ????
(1 row)
this led me to pg_proc
pg_proc
pg_proc stores information about functions (or procedures)
Link to docs
<https://www.postgresql.org/docs/devel/static/catalog-pg-proc.html>
{
proname; /* procedure name */
pronamespace; /* OID of namespace containing this proc */
proowner; /* procedure owner */
prolang; /* OID of pg_language entry */
procost; /* estimated execution cost */
prorows; /* estimated # of rows out (if proretset) */
provariadic; /* element type of variadic array, or 0 */
protransform; /* transforms calls to it during planning */
proisagg; /* is it an aggregate? */
proiswindow; /* is it a window function? */
prosecdef; /* security definer */
proleakproof; /* is it a leak-proof function? */
proisstrict; /* strict with respect to NULLs? */
proretset; /* returns a set? */
provolatile; /* see PROVOLATILE_ categories below */
proparallel; /* see PROPARALLEL_ categories below */
pronargs; /* number of arguments */
pronargdefaults; /* number of arguments with defaults */
prorettype; /* OID of result type */
proargtypes; /* parameter types (excludes OUT params) */
proallargtypes[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* all param types (NULL if IN only) */
proargmodes[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* parameter modes (NULL if IN only) */
proargnames[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* parameter names (NULL if no names) */
proargdefaults; /* list of expression trees for argument
protrftypes[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* types for which to apply transforms */
prosrc; /* procedure source text */
probin; /* secondary procedure info (can be NULL) */
proconfig[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* procedure-local GUC settings */
proacl[1]amprocnum refers to this table <https://www.postgresql.org/docs/10/static/xindex.html#xindex-gin-support-table>; /* access permissions */
}
postgres=# select * from pg_proc where oid = 2748;
proname | pronamespace | proowner | prolang | procost | prorows |
provariadic | protransform | proisagg | proiswindow | prosecdef |
proleakproof | proisstrict | proretset | provolatile | proparallel | pron
args | pronargdefaults | prorettype | proargtypes | proallargtypes |
proargmodes | proargnames | proargdefaults | protrftypes | prosrc |
probin | proconfig | proacl
---------------+--------------+----------+---------+--------
-+---------+-------------+--------------+----------+--------
-----+-----------+--------------+-------------+-----------+-
------------+-------------+-----
-----+-----------------+------------+-------------+---------
-------+-------------+-------------+----------------+-------
------+---------------+--------+-----------+--------
arraycontains | 11 | 10 | 12 | 1 | 0 |
0 | - | f | f | f | f
| t | f | i | s |
2 | 0 | 16 | 2277 2277 | |
| | | | arraycontains |
| |
(1 row)
I have yet to study this table thoroughly.
This finally led me to the arraycontains procedure in src/backend/utils/adt/
arrayfuncs.c
Datum
arraycontains(PG_FUNCTION_ARGS)
{
AnyArrayType *array1 = PG_GETARG_ANY_ARRAY(0);
AnyArrayType *array2 = PG_GETARG_ANY_ARRAY(1);
Oid collation = PG_GET_COLLATION();
bool result;
result = array_contain_compare(array2, array1, collation, true,
&fcinfo->flinfo->fn_extra);
/* Avoid leaking memory when handed toasted input. */
AARR_FREE_IF_COPY(array1, 0);
AARR_FREE_IF_COPY(array2, 1);
PG_RETURN_BOOL(result);
}
This corrosponds to the operator @<(anyarray, anyarray) which is the
generalised form of my proposed operator @<(anyarray, anyelement).
Studying the syntax will help me produce a function that follows the
postgres style rules.
• After finding the arraycontains function, I implemented
arraycontainselem that corresponds to the operator @<(anyarray,
anyelem)
◦ Please read the attached patch file to view my progress.
• In addition to src/backend/utils/adt/arrayfuncs.c where I
implemented arraycontainselem.
◦ I also edited pg_amop (src/include/catalog/pg_amop.h) since
it stores information about operators associated with access method
operator families.
+DATA(insert ( 2745 2277 2283 2 s 2753 2742 0 ));
{
2745: Oid amopfamily; (denotes gin array_ops)
277: Oid amoplefttype; (denotes anyaray)
2283: Oid amoprighttype; (denotes anyelem)
5: int16 amopstrategy; /* operator strategy number */ (denotes the new
startegy that is yet to be created)
's': char amoppurpose; (denotes 's' for search)
2753: Oid amopopr; (denotes the new operator Oid)
2742: Oid amopmethod;(denotes gin)
0: Oid amopsortfamily; (0 since search operator)
}
◦ And pg_operator (src/include/catalog/pg_operator.h) since it
stores information about operators.
+DATA(insert OID = 2753 ( "@>" PGNSP PGUID b f f 2277 2283 16 0 0
arraycontainselem 0 0 ));
{
"@>": NameData oprname; /* name of operator */
Oid oprnamespace; /* OID of namespace containing this oper */
Oid oprowner; /* operator owner */
'b': char oprkind; /* 'l', 'r', or 'b' */ (denotes infix)
'f': bool oprcanmerge; /* can be used in merge join? */
'f': bool oprcanhash; /* can be used in hash join? */
277: Oid oprleft; (denotes anyaray)
2283: Oid oprright; (denotes anyelem)
16: Oid oprresult; (denotes boolean)
0: Oid oprcom; /* OID of commutator oper, or 0 if none */ (needs to be
revisited)
0: Oid oprnegate; /* OID of negator oper, or 0 if none */ (needs to be
revisited)
arraycontainselem: regproc oprcode; /* OID of underlying function */
0: regproc oprrest; /* OID of restriction estimator, or 0 */
0: regproc oprjoin; /* OID of join estimator, or 0 */
}
Attachments:
elemOperatorV2.patchtext/x-patch; charset=US-ASCII; name=elemOperatorV2.patchDownload+43-1
I was idly following along in GSoC 2017: Foreign Key Arrays
when I noticed this:
=# select * from pg_amproc where amprocfamily = 2745;
amprocfamily | amproclefttype | amprocrighttype | amprocnum |
amproc
--------------+----------------+-----------------+-----------+
2745 | 2277 | 2277 | 2 |
pg_catalog.ginarrayextract
2745 | 2277 | 2277 | 3 |
ginqueryarrayextract
...
where only ginarrayextract is schema-qualified. It seems to be
regproc's output procedure doing it:
=# select 2743::regproc, 2774::regproc;
regproc | regproc
----------------------------+----------------------
pg_catalog.ginarrayextract | ginqueryarrayextract
The manual says regproc "will display schema-qualified names on output
if the object would not be found in the current search path without
being qualified."
Is regproc displaying the schema in this case because there are two
overloaded flavors of ginarrayextract, though both are in pg_catalog?
Could it be searching for the object by name, ignoring the argument
signature, and just detecting that it hit one with a different OID first?
-Chap
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Chapman Flack <chap@anastigmatix.net> writes:
The manual says regproc "will display schema-qualified names on output
if the object would not be found in the current search path without
being qualified."
That's less than the full truth :-(
Is regproc displaying the schema in this case because there are two
overloaded flavors of ginarrayextract, though both are in pg_catalog?
Yes, see the test in regprocout:
* Would this proc be found (uniquely!) by regprocin? If not,
* qualify it.
Of course, in a situation like this, schema-qualification is not enough to
save the day; regprocin will still fail because the name is ambiguous.
You really need to use regprocedure not regproc if you want any guarantees
about the results.
(The fact that we have regproc at all is a bit of a historical accident,
caused by some limitations of the bootstrap mode.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
*Updates till now:*
- added a record to pg_proc (src/include/catalog/pg_proc.h)
- modified opr_sanity regression check expected results
- implemented a low-level function called `array_contains_elem` as an
equivalent to `array_contain_compare` but accepts anyelement instead of
anyarray as the right operand. This is more efficient than constructing an
array and then immediately deconstructing it.
*Questions:*
- I'd like to check that anyelem and anyarray have the same element
type. but anyelem is obtained from PG_FUNCTION_ARGS as a Datum. How can
I make such a check?
Best Regards,
Mark Rofail
Attachments:
elemOperatorV3.patchtext/x-patch; charset=US-ASCII; name=elemOperatorV3.patchDownload+114-4
On Sun, Jun 18, 2017 at 12:41 AM, Mark Rofail <markm.rofail@gmail.com>
wrote:
*Questions:*
- I'd like to check that anyelem and anyarray have the same element
type. but anyelem is obtained from PG_FUNCTION_ARGS as a Datum. How
can I make such a check?
As I know, it's implicitly checked during query analyze stage. You don't
have to implement your own check inside function implementation.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Okay, so major breakthrough.
*Updates:*
- The operator @>(anyarray, anyelement) is now functional
- The segmentation fault was due to applying PG_FREE_IF_COPY on a
datum when it should only be applied on TOASTed inputs
- The only problem now is if for example you apply the operator as
follows '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray,
anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
- Added some regression tests (src/test/regress/sql/arrays.sql) and
their results(src/test/regress/expected/arrays.out)
- wokred on the new GIN strategy, I don't think it would vary much from
GinContainsStrategy.
*What I plan to do:*
- I need to start working on the Referential Integrity code but I don't
where to start
Best Regards,
Mark Rofail
Attachments:
elemOperatorV3_1.patchtext/x-patch; charset=US-ASCII; name=elemOperatorV3_1.patchDownload+177-2
Mark Rofail wrote:
Okay, so major breakthrough.
*Updates:*
- The operator @>(anyarray, anyelement) is now functional
- The segmentation fault was due to applying PG_FREE_IF_COPY on a
datum when it should only be applied on TOASTed inputs
- The only problem now is if for example you apply the operator as
follows '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to @>(anyarray,
anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead of Text
- Added some regression tests (src/test/regress/sql/arrays.sql) and
their results(src/test/regress/expected/arrays.out)
- wokred on the new GIN strategy, I don't think it would vary much from
GinContainsStrategy.
OK, that's great.
*What I plan to do:*
- I need to start working on the Referential Integrity code but I don't
where to start
You need to study the old patch posted by Marco Nenciarini.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
*What I did:*
- read into the old patch but couldn't apply it since it's quite old. It
needs to be rebased and that's what I am working on. It's a lot of work.
- incomplete patch can be found attached here
*Bugs*
- problem with the @>(anyarray, anyelement) opertator: if for example,
you apply the operator as follows '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it
maps to @>(anyarray, anyarray) since 'AAAAAAAAAA646' is interpreted as
char[] instead of Text
*Suggestion:*
- since I needed to check if the Datum was null and its type, I had to
do it in the arraycontainselem and pass it as a parameter to the underlying
function array_contains_elem. I'm proposing to introduce a new struct like
ArrayType, but ElementType along all with brand new MACROs to make dealing
with anyelement easier in any polymorphic context.
Best Regards,
Mark Rofail
On Tue, Jun 20, 2017 at 12:19 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Show quoted text
Mark Rofail wrote:
Okay, so major breakthrough.
*Updates:*
- The operator @>(anyarray, anyelement) is now functional
- The segmentation fault was due to applying PG_FREE_IF_COPY on a
datum when it should only be applied on TOASTed inputs
- The only problem now is if for example you apply the operator as
follows '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it maps to@>(anyarray,
anyarray) since 'AAAAAAAAAA646' is interpreted as char[] instead
of Text
- Added some regression tests (src/test/regress/sql/arrays.sql) and
their results(src/test/regress/expected/arrays.out)
- wokred on the new GIN strategy, I don't think it would vary muchfrom
GinContainsStrategy.
OK, that's great.
*What I plan to do:*
- I need to start working on the Referential Integrity code but I
don't
where to start
You need to study the old patch posted by Marco Nenciarini.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
incomplete-Array-ELEMENT-foreign-key-v2-REBASED-ddb5fdc.patchtext/x-patch; charset=US-ASCII; name=incomplete-Array-ELEMENT-foreign-key-v2-REBASED-ddb5fdc.patchDownload+369-18
On Mon, Jun 26, 2017 at 2:26 AM, Mark Rofail <markm.rofail@gmail.com> wrote:
*What I did:*
- read into the old patch but couldn't apply it since it's quite old.
It needs to be rebased and that's what I am working on. It's a lot of
work.
- incomplete patch can be found attached hereHave you met any particular problem here? Or is it just a lot of
mechanical work?
*Bugs*
- problem with the @>(anyarray, anyelement) opertator: if for example,
you apply the operator as follows '{AAAAAAAAAA646'}' @> 'AAAAAAAAAA646' it
maps to @>(anyarray, anyarray) since 'AAAAAAAAAA646' is interpreted as
char[] instead of TextI don't think it is bug. When types are not specified explicitly, then
optimizer do its best on guessing them. Sometimes results are
counterintuitive to user. But that is not bug, it's probably a room for
improvement. And I don't think this improvement should be subject of this
GSoC. Anyway, array FK code should use explicit type cast, and then you
wouldn't meet this problem.
On the other hand, you could just choose another operator name for
arraycontainselem.
Then such problem probably wouldn't occur.
*Suggestion:*
- since I needed to check if the Datum was null and its type, I had to
do it in the arraycontainselem and pass it as a parameter to the underlying
function array_contains_elem. I'm proposing to introduce a new struct like
ArrayType, but ElementType along all with brand new MACROs to make dealing
with anyelement easier in any polymorphic context.You don't need to do explicit check for nulls, because arraycontainselem
is marked as strict function. Executor never pass null inputs to your
function if its declared as strict. See evaluate_function().
Also, during query planning it's checked that all polymorphic are
consistent between each other. See
https://www.postgresql.org/docs/devel/static/extend-type-system.html#extend-types-polymorphic
and check_generic_type_consistency() for details.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
Have you met any particular problem here? Or is it just a lot of
mechanical work?
Just A LOT of mechanictal work, thankfully. The patch is now rebased and
all regress tests have passed (even the element_foreign_key). Please find
the patch below !
I don't think it is bug.
That's good news !
*What I plan to do next *
- study ri_triggers.c (src/backend/utils/adt/ri_triggers.c) since this
is where the new RI code will reside
Best Regards,
Mark Rofail
Attachments:
Array-ELEMENT-foreign-key-v2-REBASED-f0256c7.patchtext/x-patch; charset=US-ASCII; name=Array-ELEMENT-foreign-key-v2-REBASED-f0256c7.patchDownload+1805-82
Mark Rofail wrote:
On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:Have you met any particular problem here? Or is it just a lot of
mechanical work?Just A LOT of mechanictal work, thankfully. The patch is now rebased and
all regress tests have passed (even the element_foreign_key). Please find
the patch below !
Great!
*What I plan to do next *
- study ri_triggers.c (src/backend/utils/adt/ri_triggers.c) since this
is where the new RI code will reside
Any news?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I was unaware that there was a ver3 patch:
/messages/by-id/28617.1351095467@sss.pgh.pa.us
I rebased this also (rebased version attached here).
There were considerable changes in syntax between v2 and v3, and different
approaches in the implementations, so I have to restudy ri_triggers.c but
at least the old patch gave me a good idea of what's going on.
As for the *limitations *of the patch:
1. Only one "ELEMENT" column allowed in a multi-column key
- - e.g. FOREIGN KEY (c1, ELEMENT c2, ELEMENT c3) REFERENCES t1 (u1,
u2, u3) will throw an error
2. Supported actions:
- - NO ACTION
- - RESTRICT
3. The use of count(distinct y) in the SQL statements if the referencing
column is an array. Since its equality operator is different from the PK
unique index equality operator this leads to a broken statement
- regression=# create table ff (f1 float8 primary key);
CREATE TABLE
regression=# create table cc (f1 numeric references ff);
CREATE TABLE
regression=# create table cc2 (f1 numeric[], foreign key(each element
of f1) references ff);
ERROR: foreign key constraint "cc2_f1_fkey" cannot be implemented
DETAIL: Key column "f1" has element type numeric which does not have
a default btree operator class that's compatible with class "float8_ops".
4. undesirable dependency on default opclass semantics in the patch,
which is that it supposes it can use array_eq() to detect whether or not
the referencing column has changed. But I think that can be fixed without
undue pain by providing a refactored version of array_eq() that can be told
which element-comparison function to use
5. fatal performance issues. If you issue any UPDATE or DELETE against
the PK table, you get a query like this for checking to see if the RI
constraint would be violated:
SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;
6. cross-type FKs are unsupported
These are the limitations I gathered from the previous mailing list:
/messages/by-id/1343842863.5162.4.camel@greygoo.devise-it.lan
I am pretty sure other limitations will arise.
I am confident that between the time the patch was implemented(2012) and
now postgres has grown considerably, the array functions are now more
robust and will help in resolving many issues.
I would like to point out that Limitation #5 is the first limitation we
should eliminate as it deems the feature unbeneficial.
I would like to thank Marco Nenciarini, Gabriele, Gianni and Tom Lane, for
their hard work in the previous patches and anyone else I forgot.
As for limitations for the anyarray @> anyelem operator's *limitations*:
1. since anyarray @< anyarray and anyarray @> anyelem have the same
symbol when a statemnt like this is executed '{AAAAAAAAAA646'}' @>
'AAAAAAAAAA646' it's mapped to anyarray @< anyarray instead of anyarray @>
anyelem
- but as Alexander pointed out
On Mon, Jun 26, 2017 at 6:44 PM, Alexander Korotkov <aekorotkov@gmail.com>
wrote:
When types are not specified explicitly, then optimizer do its best on
guessing them. Sometimes results are counterintuitive to user. But that
is not bug, it's probably a room for improvement. And I don't think this
improvement should be subject of this GSoC. Anyway, array FK code should
use explicit type cast, and then you wouldn't meet this problem.
*What I plan to do next: *
- located the SQL statements triggered at any insert or update and will
now "convert" them to use GIN. However, NO ACTION and RESTRICT are the
only actions supported right now
so that's how I will spend the next week.
Best Regards,
Mark Rofail
Attachments:
Array-ELEMENT-foreign-key-v3-REBASED-42794d6.patchtext/x-patch; charset=US-ASCII; name=Array-ELEMENT-foreign-key-v3-REBASED-42794d6.patchDownload+1760-83
To make the queries fired by the RI triggers GIN indexed. We need to ‒ as
Tom Lane has previously suggested[1]/messages/by-id/28389.1351094795@sss.pgh.pa.us ‒ to replace the query
SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x;
with
SELECT 1 FROM ONLY fktable x WHERE ARRAY[$1] <@ fkcol FOR SHARE OF x;
but since we have @<(anyarray, anyelement) it can be improved to
SELECT 1 FROM ONLY fktable x WHERE $1 @> fkcol FOR SHARE OF x;
and the piece of code responsible for all of this is ri_GenerateQual in
ri_triggers.c.
How to accomplish that is the next step. I don't know if we should hardcode
the "@>" symbol or if we just index the fk table then ri_GenerateQual would
be able to find the operator on it's own.
*What I plan to do:*
- study how to index the fk table upon its creation. I suspect this can
be done in tablecmds.c
*Questions:*
- how can you programmatically in C index a table?
[1]: /messages/by-id/28389.1351094795@sss.pgh.pa.us
Best Regards,
Mark Rofail
Attachments:
GIN-fk-RI-code-v1.patchtext/x-patch; charset=US-ASCII; name=GIN-fk-RI-code-v1.patchDownload+2-9
* What I am working on*
- since we want to create an index on the referencing column, I am
working on firing a 'CREATE INDEX' query programatically right after the
'CREATE TABLE' query
- The problem I ran into is how to specify my Strategy (
GinContainsElemStrategy) within the CREATE INDEX query. For
example: CREATE
INDEX ON fktable USING gin (fkcolumn array_ops)
Where does the strategy number fit?
- The patch is attached here, is the approach I took to creating an
index programmatically, correct?
Best Regard,
Mark Rofail
Attachments:
GIN-fk-RI-code-v1.1.patchtext/x-patch; charset=US-ASCII; name=GIN-fk-RI-code-v1.1.patchDownload+27-9
On Sun, Jul 9, 2017 at 2:35 AM, Mark Rofail <markm.rofail@gmail.com> wrote:
* What I am working on*
- since we want to create an index on the referencing column, I am
working on firing a 'CREATE INDEX' query programatically right after
the 'CREATE TABLE' query
- The problem I ran into is how to specify my Strategy (
GinContainsElemStrategy) within the CREATE INDEX query. For
example: CREATE INDEX ON fktable USING gin (fkcolumn array_ops)
Where does the strategy number fit?
- The patch is attached here, is the approach I took to creating an
index programmatically, correct?
Could you, please, specify idea of what you're implementing in more
detail? AFACS, you're going to automatically create GIN indexes on FK
array columns. However, if we don't do this for regular columns, why
should we do for array columns? For me that sounds like a separate feature
which should be implemented for both regular and array FK columns.
Regarding your questions. If you need to create index supporting given
operator, you shouldn't take care about strategy number. Strategy number
makes sense only in opclass internals. You just need to specify opclass
which support your operator. In principle, you can find all of them in
pg_amop table. Alternatively you can just stick to GIN array_ops.
In general the approach you create index looks OK. It's OK to manually
create DDL node and execute it. As you can see, this is done in many other
places of backend code.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company