Insert query performance

Started by sudover 1 year ago7 messagesgeneral
Jump to latest
#1sud
suds1434@gmail.com

Hello All,

In a version 15.4 postgres database, Is it possible that , if we have two
big range partition tables with foreign key relationships between them,
insert into the child table can cause slowness if we don't have foreign key
index present in the child table? Basically it need to make sure the new
row already added to parent partition table or not.

And if there is any possible way(example query tracing etc) to get the
underlying system queries which gets triggered as part of the main insert
query? For example in above scenario, postgres must be executing some query
to check if the incoming row to the child table already exists in the
parent table or not?

Regards

Sud

#2Muhammad Ikram
mmikram@gmail.com
In reply to: sud (#1)
Re: Insert query performance

Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Regards,
Ikram

On Mon, Aug 19, 2024 at 12:48 PM sud <suds1434@gmail.com> wrote:

Hello All,

In a version 15.4 postgres database, Is it possible that , if we have two
big range partition tables with foreign key relationships between them,
insert into the child table can cause slowness if we don't have foreign key
index present in the child table? Basically it need to make sure the new
row already added to parent partition table or not.

And if there is any possible way(example query tracing etc) to get the
underlying system queries which gets triggered as part of the main insert
query? For example in above scenario, postgres must be executing some query
to check if the incoming row to the child table already exists in the
parent table or not?

Regards

Sud

--
Muhammad Ikram

#3David Rowley
dgrowleyml@gmail.com
In reply to: sud (#1)
Re: Insert query performance

On Mon, 19 Aug 2024 at 19:48, sud <suds1434@gmail.com> wrote:

In a version 15.4 postgres database, Is it possible that , if we have two big range partition tables with foreign key relationships between them, insert into the child table can cause slowness if we don't have foreign key index present in the child table? Basically it need to make sure the new row already added to parent partition table or not.

Having an index on the referencing columns is only useful for DELETEs
and UPDATEs affecting the foreign key column(s). For INSERTs to the
referencing table, technically having indexes there would only slow
down inserts due to the additional overhead of having to maintain the
index, however, the overhead of having the index might be fairly
minuscule when compared to performing a CASCADE UPDATE or DELETE to
the referencing table when the DDL is performed on the referenced
table.

And if there is any possible way(example query tracing etc) to get the underlying system queries which gets triggered as part of the main insert query? For example in above scenario, postgres must be executing some query to check if the incoming row to the child table already exists in the parent table or not?

EXPLAIN ANALYZE will list the time it took to execute the foreign key
trigger in the "Trigger for constraint" section.

David

#4sud
suds1434@gmail.com
In reply to: David Rowley (#3)
Re: Insert query performance

On Mon, Aug 19, 2024 at 4:33 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Mon, 19 Aug 2024 at 19:48, sud <suds1434@gmail.com> wrote:

In a version 15.4 postgres database, Is it possible that , if we have

two big range partition tables with foreign key relationships between them,
insert into the child table can cause slowness if we don't have foreign key
index present in the child table? Basically it need to make sure the new
row already added to parent partition table or not.

Having an index on the referencing columns is only useful for DELETEs
and UPDATEs affecting the foreign key column(s). For INSERTs to the
referencing table, technically having indexes there would only slow
down inserts due to the additional overhead of having to maintain the
index, however, the overhead of having the index might be fairly
minuscule when compared to performing a CASCADE UPDATE or DELETE to
the referencing table when the DDL is performed on the referenced
table.

And if there is any possible way(example query tracing etc) to get the

underlying system queries which gets triggered as part of the main insert
query? For example in above scenario, postgres must be executing some query
to check if the incoming row to the child table already exists in the
parent table or not?

EXPLAIN ANALYZE will list the time it took to execute the foreign key
trigger in the "Trigger for constraint" section.

David

Thank you so much David.

If I get it correct , the index on the foreign key mainly helps improve the
deletes/updates performance of the parent table , if the same FK column
gets impacted from the parent table. (This might be the reason why our
detach partition in the parent table runs long and never completes as we
have no foreign key indexed).

However, my initial understanding of "*having the FK index will improve the
insert performance in the child table*" is not accurate it seems. Rather as
you mentioned it may negatively impact the loading/insert performance
because it has to now update the additional index in each insert. In case
of insert into child table, to ensure if the child row is already present
in the parent , it just scans the parent by the Primary key of the parent
table (which is be default indexed) and thus it doesn't need an index in
the child table foreign keys or having an index in the foreign key in the
child table won't help the constraint validation faster. Please correct me
if my understanding is wrong here.

Additionally as you mentioned "explain analyze" will show a section on how
much time it really takes for the constraint validation , I can see that
section now. But it seems it will really need that INSERT statement to be
executed and that we can't really do in production as that will physically
insert data into the table. So do you mean to just do the "explain analyze"
for the INSERT query and capture the plan and then do the rollback? And in
our case it's a row by row insert happening , so we will see if we can
club/sum that "constraint validation" time for a handful if insert somehow
to get a better idea on the percentage of time we really spent in the
constraint validation.

#5sud
suds1434@gmail.com
In reply to: Muhammad Ikram (#2)
Re: Insert query performance

On Mon, Aug 19, 2024 at 1:25 PM Muhammad Ikram <mmikram@gmail.com> wrote:

Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Will this put all the internal sql query or the recursive query entries in
the pg_stats_statement view which we can analyze then? And also to debug
issues in the production will it be a good idea to set it for a few times
and then turn it off or it can have significant performance overhead.

#6Muhammad Ikram
mmikram@gmail.com
In reply to: sud (#5)
Re: Insert query performance

It will record all statements in logs. If you are concerned about query
times then you may use pg_stat_statements.

Muhammad Ikram

On Tue, 20 Aug 2024 at 12:19, sud <suds1434@gmail.com> wrote:

Show quoted text

On Mon, Aug 19, 2024 at 1:25 PM Muhammad Ikram <mmikram@gmail.com> wrote:

Hi Sud,

Please make following change in your postgresql.conf file

log_statement = 'all'

Will this put all the internal sql query or the recursive query entries in
the pg_stats_statement view which we can analyze then? And also to debug
issues in the production will it be a good idea to set it for a few times
and then turn it off or it can have significant performance overhead.

#7David Rowley
dgrowleyml@gmail.com
In reply to: sud (#4)
Re: Insert query performance

On Tue, 20 Aug 2024 at 19:09, sud <suds1434@gmail.com> wrote:

However, my initial understanding of "having the FK index will improve the insert performance in the child table" is not accurate it seems. Rather as you mentioned it may negatively impact the loading/insert performance because it has to now update the additional index in each insert. In case of insert into child table, to ensure if the child row is already present in the parent , it just scans the parent by the Primary key of the parent table (which is be default indexed) and thus it doesn't need an index in the child table foreign keys or having an index in the foreign key in the child table won't help the constraint validation faster. Please correct me if my understanding is wrong here.

If you think about what must happen when you insert into the
referencing table, the additional validation that the foreign key must
do is check that a corresponding record exists in the referenced
table. An index on the referencing table does not help speed that up.

Additionally as you mentioned "explain analyze" will show a section on how much time it really takes for the constraint validation , I can see that section now. But it seems it will really need that INSERT statement to be executed and that we can't really do in production as that will physically insert data into the table. So do you mean to just do the "explain analyze" for the INSERT query and capture the plan and then do the rollback? And in our case it's a row by row insert happening , so we will see if we can club/sum that "constraint validation" time for a handful if insert somehow to get a better idea on the percentage of time we really spent in the constraint validation.

I'd recommend performing a schema-only dump of your production
database and experimenting well away from production. See pg_dump
--schema-only.

I also recommend not leaving performance to chance and testing the
impact of index vs no index away from production with some data loaded
that is representative of your production data (or use the production
data if it's available and small enough to manage). Use pgbench to
see what impact having the index on the referencing table has on
performance on inserts into that table vs what improvements you gain
from having the index when there's cascading delete from the
referenced table.

You might also want to look into auto_explain [1]https://www.postgresql.org/docs/15/auto-explain.html. You can load this
into a single session and set auto_explain.log_min_duration = 0,
auto_explain.log_analyze = on and auto_explain.log_nested_statements =
on. That should give you the plan for the cascade DELETE query that's
executed by the trigger when you perform the DELETE on the referenced
table. (Also see the note about auto_explain.log_timing)

David

[1]: https://www.postgresql.org/docs/15/auto-explain.html