Performance question

Started by Anil Menonover 11 years ago6 messagesgeneral
Jump to latest
#1Anil Menon
gakmenon@gmail.com

Hello,

I would like to ask from your experience which would be the best "generic"
method for checking if row sets of a certain condition exists in a PLPGSQL
function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG

2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches

3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if

Seems to remove the above (item 2) issues (if any)

4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if

My question is what would be the best (in terms of performance) method to
use? My gut feeling is to use option 4 for PG. Am I right or is there any
other method?

Thanks in advance
Anil

#2Francisco Olarte
folarte@peoplecall.com
In reply to: Anil Menon (#1)
Re: Performance question

Hi Anil:

On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon <gakmenon@gmail.com> wrote:

Cons : It seems doing a count(*) is not the best option for PG

For this and some of the following options, if you are going to just test
for existence, I would consider adding a limit 1 somewehere on the query,
to let the optimizer know you only need one and it will abort the scan on
first hit. Probabley not needed if you are going to give a query which uses
an unique index, but it shouldn't hurt.

Francisco Olarte.

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Anil Menon (#1)
Re: Performance question

On 11/19/2014 08:26 AM, Anil Menon wrote:

Hello,

I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG

Well that would depend on the table size, whether it was 100 rows vs
1,000,000 rows

2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches

plpgsql is fairly tightly coupled to SQL, so I have not really seen any
problems. But then I am not working on large datasets.

3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if

Seems to remove the above (item 2) issues (if any)

AFAIK, you cannot do the above as written. PERFORM does not return a result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

perform primary_key_Col from table where <<condition>>

4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if

My question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?

All of the above is context specific. To know for sure you will need to
test on actual data.

Thanks in advance
Anil

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Anil Menon
gakmenon@gmail.com
In reply to: Adrian Klaver (#3)
Re: Performance question

Thanks Adrian

On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/19/2014 08:26 AM, Anil Menon wrote:

Hello,

I would like to ask from your experience which would be the best
"generic" method for checking if row sets of a certain condition exists
in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where <<condition>>
if vcnt >0 then
do X
else
do y
end if
Cons : It seems doing a count(*) is not the best option for PG

Well that would depend on the table size, whether it was 100 rows vs
1,000,000 rows

​The table is estimated/guesstimated to be ~900 million rows (~30Ma day​,
90 days history, though initially it would be ~30M), though the <<where>>
part of the query would return between 0 and ~2 rows

2) Use a non-count option
select primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if
Cons :Some people seems not to prefer this as (AFAIU) it causes a
plpgsql->sql->plpgsql switches

plpgsql is fairly tightly coupled to SQL, so I have not really seen any
problems. But then I am not working on large datasets.

​I think that ~900M rows would constitute a large data set most likely

3) using perform
perform primary_key_Col into vcnt
from table
where <<condition>>
if found then
do X
else
do y
end if

Seems to remove the above (item 2) issues (if any)

AFAIK, you cannot do the above as written. PERFORM does not return a
result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-
statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

perform primary_key_Col from table where <<condition>>

​You are absolutely right - my bad​.

4) using exists
if exists ( select 1 from table where <<condition>> ) then
do x
else
do y
end if

My question is what would be the best (in terms of performance) method
to use? My gut feeling is to use option 4 for PG. Am I right or is there
any other method?

All of the above is context specific. To know for sure you will need to
test on actual data.

​Absolutely right, just that I want to ensure that I follow the most
optimal method before the DB goes into production, after which priorities
change on what needs to be changed.​

--
Adrian Klaver
adrian.klaver@aklaver.com

​I guess the best answer would be "its very context specific​", but picking
the brains of experienced resources helps :-)

Thanks again
Anil​

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Anil Menon (#1)
Re: Performance question

Anil Menon wrote:

I would like to ask from your experience which would be the best "generic" method for checking if row
sets of a certain condition exists in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any others)

[...]

Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?

It may be that rows are added or removed between the check and the
corresponding action.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Anil Menon
gakmenon@gmail.com
In reply to: Laurenz Albe (#5)
Re: Performance question

Thanks Laurenz, very good point!

Luckily (phew!) the business scenario is such that race conditions cannot
occur (and the transaction table is append only). There is business
workflow to address duplicates but
1) it occurs extremely rarely (it would be a deliberate sabotage if it
occurs)
2) there is no impact on business

Yours
Anil

On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:

Show quoted text

Anil Menon wrote:

I would like to ask from your experience which would be the best

"generic" method for checking if row

sets of a certain condition exists in a PLPGSQL function.

I know of 4 methods so far (please feel free to add if I missed out any

others)
[...]

Are you aware that all of these methods have a race condition unless
you use isolation level READ STABILITY or better?

It may be that rows are added or removed between the check and the
corresponding action.

Yours,
Laurenz Albe