Performance question
Hello,
I would like to ask from your experience which would be the best "generic"
method for checking if row sets of a certain condition exists in a PLPGSQL
function.
I know of 4 methods so far (please feel free to add if I missed out any
others)
1) get a count (my previous experience with ORCL shaped this option)
select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG
2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches
3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Seems to remove the above (item 2) issues (if any)
4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if
My question is what would be the best (in terms of performance) method to
use? My gut feeling is to use option 4 for PG. Am I right or is there any
other method?
Thanks in advance
Anil
Hi Anil:
On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon <gakmenon@gmail.com> wrote:
Cons : It seems doing a count(*) is not the best option for PG
For this and some of the following options, if you are going to just test
for existence, I would consider adding a limit 1 somewehere on the query,
to let the optimizer know you only need one and it will abort the scan on
first hit. Probabley not needed if you are going to give a query which uses
an unique index, but it shouldn't hurt.
Francisco Olarte.
On 11/19/2014 08:26 AM, Anil Menon wrote:
Hello,
I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.I know of 4 methods so far (please feel free to add if I missed out any
others)1) get a count (my previous experience with ORCL shaped this option)
select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG
Well that would depend on the table size, whether it was 100 rows vs
1,000,000 rows
2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches
plpgsql is fairly tightly coupled to SQL, so I have not really seen any
problems. But then I am not working on large datasets.
3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end ifSeems to remove the above (item 2) issues (if any)
AFAIK, you cannot do the above as written. PERFORM does not return a result:
It would have to be more like:
perform primary_key_Col from table where <<condition>>
4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end ifMy question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?
All of the above is context specific. To know for sure you will need to
test on actual data.
Thanks in advance
Anil
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Thanks Adrian
On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 11/19/2014 08:26 AM, Anil Menon wrote:
Hello,
I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.I know of 4 methods so far (please feel free to add if I missed out any
others)1) get a count (my previous experience with ORCL shaped this option)
select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PGWell that would depend on the table size, whether it was 100 rows vs
1,000,000 rows
The table is estimated/guesstimated to be ~900 million rows (~30Ma day,
90 days history, though initially it would be ~30M), though the <<where>>
part of the query would return between 0 and ~2 rows
2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switchesplpgsql is fairly tightly coupled to SQL, so I have not really seen any
problems. But then I am not working on large datasets.
I think that ~900M rows would constitute a large data set most likely
3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end ifSeems to remove the above (item 2) issues (if any)
AFAIK, you cannot do the above as written. PERFORM does not return a
result:http://www.postgresql.org/docs/9.3/interactive/plpgsql-
statements.html#PLPGSQL-STATEMENTS-SQL-NORESULTIt would have to be more like:
perform primary_key_Col from table where <<condition>>
You are absolutely right - my bad.
4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end ifMy question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?All of the above is context specific. To know for sure you will need to
test on actual data.
Absolutely right, just that I want to ensure that I follow the most
optimal method before the DB goes into production, after which priorities
change on what needs to be changed.
--
Adrian Klaver
adrian.klaver@aklaver.com
I guess the best answer would be "its very context specific", but picking
the brains of experienced resources helps :-)
Thanks again
Anil
Anil Menon wrote:
I would like to ask from your experience which would be the best "generic" method for checking if row
sets of a certain condition exists in a PLPGSQL function.I know of 4 methods so far (please feel free to add if I missed out any others)
[...]
Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?
It may be that rows are added or removed between the check and the
corresponding action.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Thanks Laurenz, very good point!
Luckily (phew!) the business scenario is such that race conditions cannot
occur (and the transaction table is append only). There is business
workflow to address duplicates but
1) it occurs extremely rarely (it would be a deliberate sabotage if it
occurs)
2) there is no impact on business
Yours
Anil
On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:
Show quoted text
Anil Menon wrote:
I would like to ask from your experience which would be the best
"generic" method for checking if row
sets of a certain condition exists in a PLPGSQL function.
I know of 4 methods so far (please feel free to add if I missed out any
others)
[...]Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?It may be that rows are added or removed between the check and the
corresponding action.Yours,
Laurenz Albe