simplifying foreign key/RI checks
While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]),
a few ideas were thrown around to simplify our current system of RI
checks to enforce foreign keys with the aim of reducing some of its
overheads. The two main aspects of how we do these checks that
seemingly cause the most overhead are:
* Using row-level triggers that are fired during the modification of
the referencing and the referenced relations to perform them
* Using plain SQL queries issued over SPI
There is a discussion nearby titled "More efficient RI checks - take
2" [2]/messages/by-id/1813.1586363881@antos to address this problem from the viewpoint that it is using
row-level triggers that causes the most overhead, although there are
some posts mentioning that SQL-over-SPI is not without blame here. I
decided to focus on the latter aspect and tried reimplementing some
checks such that SPI can be skipped altogether.
I started with the check that's performed when inserting into or
updating the referencing table to confirm that the new row points to a
valid row in the referenced relation. The corresponding SQL is this:
SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x
$1 is the value of the foreign key of the new row. If the query
returns a row, all good. Thanks to SPI, or rather its use of plan caching,
the query is re-planned only a handful of times before making a
generic plan that is then saved and reused, which looks like this:
QUERY PLAN
--------------------------------------
LockRows
-> Index Scan using pk_pkey on pk x
Index Cond: (a = $1)
(3 rows)
So in most cases, the trigger's function would only execute the plan
that's already there, at least in a given session. That's good but
what we realized would be even better is if we didn't have to
"execute" a full-fledged "plan" for this, that is, to simply find out
whether a row containing the key we're looking for exists in the
referenced relation and, if found, lock it. Directly scanning the index
and locking the tuple with table_tuple_lock(), as ExecLockRows()
does, gives us exactly that behavior, which seems simple enough to be
done in a not-so-long local function in ri_triggers.c. I gave that a
try and came up with the attached. It also takes care of the case
where the referenced relation is partitioned in which case its
appropriate leaf partition's index is scanned.
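In outline, the approach can be sketched like this (a much-simplified,
non-compilable sketch against the PG13-era internal APIs; it ignores
multi-column keys, NULL handling, snapshot subtleties, and the
partitioning case, and hard-codes an int8 equality proc purely for
illustration -- the function and variable names here are mine, not the
patch's):

```c
/*
 * Sketch of an SPI-free referenced-key check: scan the unique index
 * directly and lock the matching row with table_tuple_lock(), the way
 * ExecLockRows() would for FOR KEY SHARE.  Illustrative only.
 */
static bool
referenced_key_exists(Relation pk_rel, Relation idx_rel, Datum key_val)
{
    ScanKeyData skey;
    IndexScanDesc scan;
    TupleTableSlot *slot;
    Snapshot    snap = GetLatestSnapshot();
    bool        found = false;

    /* Equality condition on the first (only) index column. */
    ScanKeyInit(&skey, 1, BTEqualStrategyNumber,
                F_INT8EQ,       /* really: the opclass's equality proc */
                key_val);

    slot = table_slot_create(pk_rel, NULL);
    scan = index_beginscan(pk_rel, idx_rel, snap, 1, 0);
    index_rescan(scan, &skey, 1, NULL, 0);

    if (index_getnext_slot(scan, ForwardScanDirection, slot))
    {
        TM_FailureData tmfd;
        TM_Result   res;

        /* Lock the found row with FOR KEY SHARE semantics. */
        res = table_tuple_lock(pk_rel, &slot->tts_tid, snap, slot,
                               GetCurrentCommandId(false),
                               LockTupleKeyShare, LockWaitBlock,
                               0, &tmfd);
        found = (res == TM_Ok);
    }

    index_endscan(scan);
    ExecDropSingleTupleTableSlot(slot);
    return found;
}
```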
The patch results in ~2x improvement in the performance of inserts and
updates on referencing tables:
create table p (a numeric primary key);
insert into p select generate_series(1, 1000000);
create table f (a bigint references p);
-- unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6340.733 ms (00:06.341)
update f set a = a + 1;
UPDATE 1000000
Time: 7490.906 ms (00:07.491)
-- patched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3340.808 ms (00:03.341)
update f set a = a + 1;
UPDATE 1000000
Time: 4178.171 ms (00:04.178)
The improvement is even more dramatic when the referenced table (that
we're no longer querying over SPI) is partitioned. Here are the
numbers when the PK relation has 1000 hash partitions.
Unpatched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 35898.783 ms (00:35.899)
update f set a = a + 1;
UPDATE 1000000
Time: 37736.294 ms (00:37.736)
Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 5633.377 ms (00:05.633)
update f set a = a + 1;
UPDATE 1000000
Time: 6345.029 ms (00:06.345)
That's over ~5x improvement!
While the above case seemed straightforward enough for skipping SPI,
it seems a bit hard to do the same for other cases where we query the
*referencing* relation during an operation on the referenced table
(for example, checking if the row being deleted is still referenced),
because the plan in those cases is not predictably an index scan.
Also, the filters in those queries are more than likely to not match
the partition key of a partitioned referencing relation, so all
partitions will have to be scanned. I have left those cases as future
work.
The patch seems simple enough to consider for inclusion in v14 unless
of course we stumble into some dealbreaker(s). I will add this to the
March CF.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CADkLM=cTt_8Fg1Jtij5j+QEBOxz9Cuu4DiMDYOwdtktDAKzuLw@mail.gmail.com
[2]: /messages/by-id/1813.1586363881@antos
Hi,
I was looking at this statement:
insert into f select generate_series(1, 2000000, 2);
Since certain generated values (the second half) are not in table p,
wouldn't insertion for those values fail?
I tried a scaled down version (1000th) of your example:
yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR: insert or update on table "f" violates foreign key constraint
"f_a_fkey"
DETAIL: Key (a)=(1001) is not present in table "p".
For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
+ * Collect partition key values from the unique key.
At the end of the nested loop, should there be an assertion
that partkey->partnatts partition key values have been found ?
This can be done by using a counter (initialized to 0) which is incremented
when a match is found by the inner loop.
Cheers
On Mon, Jan 18, 2021 at 4:40 AM Amit Langote <amitlangote09@gmail.com>
wrote:
On Mon, 18 Jan 2021 at 13:40, Amit Langote <amitlangote09@gmail.com> wrote:
What is the performance when the referenced table is small? A lot of
codebooks are small, between 1000 and 10K rows.
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
What is the performance when the referenced table is small? A lot of
codebooks are small, between 1000 and 10K rows.
I see the same ~2x improvement.
create table p (a numeric primary key);
insert into p select generate_series(1, 1000);
create table f (a bigint references p);
Unpatched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 5461.377 ms (00:05.461)
Patched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2357.440 ms (00:02.357)
That's expected because the overhead of using SPI to check the PK
table, which the patch gets rid of, is the same no matter the size of
the index to be scanned.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
I was looking at this statement:
insert into f select generate_series(1, 2000000, 2);
Since certain generated values (the second half) are not in table p,
wouldn't insertion for those values fail?
I tried a scaled down version (1000th) of your example:
yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey"
DETAIL: Key (a)=(1001) is not present in table "p".
Sorry, that was a bad copy-paste on my part. Try this:
create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);
-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)
update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)
-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)
update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)
For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch :
+ * Collect partition key values from the unique key.
At the end of the nested loop, should there be an assertion that partkey->partnatts partition key values have been found ?
This can be done by using a counter (initialized to 0) which is incremented when a match is found by the inner loop.
I've updated the patch to add the Assert. Thanks for taking a look.
--
Amit Langote
EDB: http://www.enterprisedb.com
Thanks for the quick response.
+ if (mapped_partkey_attnums[i] == pk_attnums[j])
+ {
+ partkey_vals[i] = pk_vals[j];
+ partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+ i++;
+ break;
The way the counter (i) is incremented is not what I expected.
In the rare case where some i doesn't have a corresponding pk_attnums[j],
wouldn't there be an infinite loop?
I think the goal of adding the assertion should be to avoid looping
infinitely even if the invariant is not satisfied.
I guess a counter other than i would be better for this purpose.
Cheers
On Mon, Jan 18, 2021 at 6:45 PM Amit Langote <amitlangote09@gmail.com>
wrote:
On Tue, 19 Jan 2021 at 10:45, Amit Langote <amitlangote09@gmail.com> wrote:
After applying the v2 patches, here are some warnings:
In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
do { \
^
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
^~~~~~~~~~~~~~
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
ereport(elevel, errmsg_internal(__VA_ARGS__))
^~~~~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
elog(ERROR, "unexpected table_tuple_lock status: %u", res);
^~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
default:
^~~~~~~
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Tue, 19 Jan 2021 at 3:08, Amit Langote <amitlangote09@gmail.com> wrote:
It looks very good.
Regards
Pavel
I also get this warning. Adding a "break;" at line 418 resolves the warning.
On Tue, Jan 19, 2021 at 2:56 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks, will fix it.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Jan 18, 2021 at 9:45 PM Amit Langote <amitlangote09@gmail.com>
wrote:
The v2 patch applies and passes make check and make check-world. Perhaps,
given that the missing break at line 418 caused no test failures, we could
add another regression test if we want 100% code path coverage. As it is, I
think the compiler warning was a sufficient alert.
The code is easy to read, and the comments touch on the major points of
what complexities arise from partitioned tables.
A somewhat pedantic complaint I have brought up off-list is that this patch
continues the pattern of the variable and function names making the
assumption that the foreign key is referencing the primary key of the
referenced table. Foreign key constraints need only reference a unique
index; it doesn't have to be the primary key. Granted, that unique index is
behaving exactly as a primary key would, so conceptually it is very
similar, but keeping with the existing naming (pk_rel, pk_type, etc) can
lead a developer to think that it would be just as correct to find the
referenced relation and get the primary key index from there, which would
not always be correct. This patch correctly grabs the index from the
constraint itself, so no problem there.
I like that this patch changes the absolute minimum of the code in order to
get a very significant performance benefit. It does so in a way that should
reduce resource pressure found in other places [1]. This will in turn
reduce the performance penalty of "doing the right thing" in terms of
defining enforced foreign keys. It seems to get a clearer performance boost
than was achieved with previous efforts at statement level triggers.
This patch completely sidesteps the DELETE case, which has more insidious
performance implications, but is also far less common, and whose solution
will likely be very different.
[1]: /messages/by-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@mail.gmail.com
On Tue, Jan 19, 2021 at 3:46 PM Corey Huinker <corey.huinker@gmail.com> wrote:
v2 patch applies and passes make check and make check-world. Perhaps, given the missing break at line 418 without any tests failing, we could add another regression test if we're into 100% code path coverage. As it is, I think the compiler warning was a sufficient alert.
Thanks for the review. I will look into checking the coverage.
The code is easy to read, and the comments touch on the major points of what complexities arise from partitioned tables.
A somewhat pedantic complaint I have brought up off-list is that this patch continues the pattern of the variable and function names making the assumption that the foreign key is referencing the primary key of the referenced table. Foreign key constraints need only reference a unique index, it doesn't have to be the primary key. Granted, that unique index is behaving exactly as a primary key would, so conceptually it is very similar, but keeping with the existing naming (pk_rel, pk_type, etc) can lead a developer to think that it would be just as correct to find the referenced relation and get the primary key index from there, which would not always be correct. This patch correctly grabs the index from the constraint itself, so no problem there.
I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file. Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.
I like that this patch changes the absolute minimum of the code in order to get a very significant performance benefit. It does so in a way that should reduce resource pressure found in other places [1]. This will in turn reduce the performance penalty of "doing the right thing" in terms of defining enforced foreign keys. It seems to get a clearer performance boost than was achieved with previous efforts at statement level triggers.
[1] /messages/by-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@mail.gmail.com
Thanks. I hadn't noticed [1] before today, but after looking it over,
it seems that what is being proposed there can still be of use. As
long as SPI is used in ri_triggers.c, it makes sense to consider any
tweaks addressing its negative impact, especially if they are not very
invasive. There's this patch too from the last month:
https://commitfest.postgresql.org/32/2930/
This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.
Yeah, we should continue looking into the ways to make referenced-side
RI checks be less bloated.
I've attached the updated patch.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Jan 19, 2021 at 12:00 PM Zhihong Yu <zyu@yugabyte.com> wrote:
+ if (mapped_partkey_attnums[i] == pk_attnums[j])
+ {
+ partkey_vals[i] = pk_vals[j];
+ partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+ i++;
+ break;
The way the counter (i) is incremented is not what I expected.
In the rare case where some i doesn't have a corresponding pk_attnums[j],
wouldn't there be an infinite loop? I think the goal of adding the
assertion should be to avoid looping infinitely even if the invariant is
not satisfied.
I guess a counter other than i would be better for this purpose.
I have done that in v3. Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file. Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.
I agree with leaving the existing terminology where it is for this patch.
Changing the function name is probably enough to alert the reader that the
things that are called pks may not be precisely that.
I think that's a nice compromise, it makes the reader aware of the concept.
I've attached the updated patch.
Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.
Patch applies to an as-of-today master, passes make check and check world.
No additional regression tests required, as no new functionality is
introduced.
No docs required, as there is nothing user-facing.
Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed; is
the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map; should there be a free_attrmap()?
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks for the review.
Questions:
1. There's a palloc for mapped_partkey_attnums which is never freed; is the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap; should there be a free_attrmap()?
I hadn't checked, but yes, the prevailing context is
AfterTriggerTupleContext, a short-lived one that is reset for every
trigger event tuple. I'm still inclined to explicitly free those
objects, so changed like that. While at it, I also changed
mapped_partkey_attnums to root_partattrs for readability.
Attached v4.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi,
+    for (i = 0; i < riinfo->nkeys; i++)
+    {
+        Oid         eq_opr = eq_oprs[i];
+        Oid         typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+        RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+        if (pk_nulls[i] != 'n' &&
+            OidIsValid(entry->cast_func_finfo.fn_oid))
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment
to the three local variables. That way, ri_HashCompareOp wouldn't be called
when pk_nulls[i] == 'n'.
+            case TM_Updated:
+                if (IsolationUsesXactSnapshot())
...
+            case TM_Deleted:
+                if (IsolationUsesXactSnapshot())
It seems the handling for TM_Updated and TM_Deleted is the same. The cases
for these two values can be put next to each other (saving one block of
code).
Cheers
On Fri, Jan 22, 2021 at 11:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
I'll pause on reviewing v4 until you've addressed the suggestions above.
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <corey.huinker@gmail.com> wrote:
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
Thanks for the review.
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment to the three local variables. That way, ri_HashCompareOp wouldn't be called when pk_nulls[i] == 'n'.

Good idea, so done. Although, there can't be nulls right now.
It seems the handling for TM_Updated and TM_Deleted is the same. The cases for these two values can be put next to each other (saving one block of code).
Ah, yes. The TM_Updated case used to be handled a bit differently in
earlier unposted versions of the patch, though at some point I
concluded that the special handling was unnecessary, but didn't
realize what you just pointed out. Fixed.
I'll pause on reviewing v4 until you've addressed the suggestions above.
Here's v5.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm: we *don't* free the RI_CompareHashEntry because it points
to an entry in a hash table kept in TopMemoryContext, i.e., for the
lifetime of the session, correct?
Anybody else want to look this patch over before I mark it Ready For
Committer?