What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

Started by Moshe Jacobsonover 13 years ago14 messagesgeneral
Jump to latest
#1Moshe Jacobson
moshe@neadwerx.com

I am working on an audit logging trigger that gets called for every row
inserted, updated or deleted on any table.
For this, I need to store a couple of temporary session variables such as
the ID of the user performing the change, which can be set at the start of
the session.
Until now I have been using a permanent table to store the session
variables, but it has been difficult to wipe the data properly at the end
of the session.
So I have decided to try to implement them using temporary tables.

The problem now is that for every row now, I need to check for the
existence of the temporary table before I access it, in order to avoid
exceptions.
Either I can do all such accesses within a BEGIN...EXCEPTION block, or I
can precede any such accesses with CREATE TEMP TABLE IF NOT EXISTS.
Is one of these much faster than the other? Will I be slowing things down
inordinately by doing this for every row?

Thanks.

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Moshe Jacobson (#1)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Mon, Oct 1, 2012 at 8:36 AM, Moshe Jacobson <moshe@neadwerx.com> wrote:

I am working on an audit logging trigger that gets called for every row
inserted, updated or deleted on any table.
For this, I need to store a couple of temporary session variables such as
the ID of the user performing the change, which can be set at the start of
the session.
Until now I have been using a permanent table to store the session
variables, but it has been difficult to wipe the data properly at the end of
the session.
So I have decided to try to implement them using temporary tables.

The problem now is that for every row now, I need to check for the existence
of the temporary table before I access it, in order to avoid exceptions.
Either I can do all such accesses within a BEGIN...EXCEPTION block, or I can
precede any such accesses with CREATE TEMP TABLE IF NOT EXISTS.
Is one of these much faster than the other? Will I be slowing things down
inordinately by doing this for every row?

Couple points:
*) Functions without exception blocks are faster than those with.
*) Therefore, CREATE/IF NOT EXISTS is probably faster (test to be sure)
*) Carefully consider if you you will ever in the future introduce
connection pooling. If you do, relying on session scoped objects like
temp tables is probably not a good idea.
*) You can rig permanent tables around pg_backend_pid(). On session
login, clear session private records that have your pid (if any).
Transaction temporary data can be similarly rigged around
txid_current() with an even simpler maintenance process.

merlin

#3Moshe Jacobson
moshe@neadwerx.com
In reply to: Merlin Moncure (#2)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

Merlin,

On Mon, Oct 1, 2012 at 10:28 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Couple points:
*) Functions without exception blocks are faster than those with.

Clearly.

*) Therefore, CREATE/IF NOT EXISTS is probably faster (test to be sure)

I don't think that can be assumed by your premise above. Essentially we are
comparing the price of starting an exception block against checking the
catalog for a table.

*) Carefully consider if you you will ever in the future introduce
connection pooling. If you do, relying on session scoped objects like
temp tables is probably not a good idea.

We already use connection pooling with pgbouncer, but upon disconnect, it
issues a DISCARD ALL statement, which should take care of this.

*) You can rig permanent tables around pg_backend_pid(). On session
login, clear session private records that have your pid (if any).
Transaction temporary data can be similarly rigged around
txid_current() with an even simpler maintenance process.

We currently do use permanent tables using pg_backend_pid(). It's because
of the connection pooling specifically that we are having problems with
stale data. I have been unable to find a way to automatically clear that
data upon start or end of a session, or at least check if it's been set in
this session or not.

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Moshe Jacobson (#3)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Mon, Oct 1, 2012 at 10:21 AM, Moshe Jacobson <moshe@neadwerx.com> wrote:

Merlin,

On Mon, Oct 1, 2012 at 10:28 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Couple points:
*) Functions without exception blocks are faster than those with.

Clearly.

*) Therefore, CREATE/IF NOT EXISTS is probably faster (test to be sure)

I don't think that can be assumed by your premise above. Essentially we are
comparing the price of starting an exception block against checking the
catalog for a table.

A vanilla create table has to scan the catalogs also.

*) Carefully consider if you you will ever in the future introduce
connection pooling. If you do, relying on session scoped objects like
temp tables is probably not a good idea.

We already use connection pooling with pgbouncer, but upon disconnect, it
issues a DISCARD ALL statement, which should take care of this.

Especially if you're using pgbouncer transaction mode, using temporary
tables is probably not a good idea. When you DISCARD them, it
invalidates all your function plans which is going to be painful if
you have a lot of pl/pgsql (test any non-trivial pl/pgsql routine and
you'll see it's much slower on the first invocation). Also, if you
need to share data between transactions, it's not clear how you're
organizing such that different application sessions are going to tromp
over each other's data.

*) You can rig permanent tables around pg_backend_pid(). On session
login, clear session private records that have your pid (if any).
Transaction temporary data can be similarly rigged around
txid_current() with an even simpler maintenance process.

We currently do use permanent tables using pg_backend_pid(). It's because of
the connection pooling specifically that we are having problems with stale
data. I have been unable to find a way to automatically clear that data upon
start or end of a session, or at least check if it's been set in this
session or not.

IMO the right way to do it is to generate a unique application token
(sequence is ok if you're not worried about it being guessed) when
your application session logs in. That token should be passed into
*all* your session specific backend functions and can be used to
organize session specific temporary data in your permanent tables.

To deal with ungraceful application client exit, you can consider
implementing an on_proc_exit handler to close the session down so that
it can be appropriately cleaned up (there are severe limits to the SQL
you can execute in the handler but you can make dblink calls). If
some of them still sneak through, periodic sweep on stale pids
against pg_stat_activity should take care of them.

Note, if your users have some type of unique identifier (like a login
or an email) and if they are only allowed to have one active session
at a time, you can organize your session data around that instead of
generating a token.

merlin

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#4)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Mon, Oct 1, 2012 at 11:22 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

We currently do use permanent tables using pg_backend_pid(). It's because of
the connection pooling specifically that we are having problems with stale
data. I have been unable to find a way to automatically clear that data upon
start or end of a session, or at least check if it's been set in this
session or not.

IMO the right way to do it is to generate a unique application token
(sequence is ok if you're not worried about it being guessed) when
your application session logs in. That token should be passed into
*all* your session specific backend functions and can be used to
organize session specific temporary data in your permanent tables.

To deal with ungraceful application client exit, you can consider
implementing an on_proc_exit handler to close the session down so that

actually, you can't do that (on_proc_exit or scan for pids) if you're
using transaction mode connection pooling. In our case, we modified
pgbouncer to pass async notifications and would have used that to
periodically scan connected clients if we didn't have the luxury of
one client/session only.

merlin

#6Moshe Jacobson
moshe@neadwerx.com
In reply to: Merlin Moncure (#4)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Mon, Oct 1, 2012 at 12:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) Functions without exception blocks are faster than those with.
*) Therefore, CREATE/IF NOT EXISTS is probably faster (test to be sure)

I don't think that can be assumed by your premise above. Essentially we

are

comparing the price of starting an exception block against checking the
catalog for a table.

A vanilla create table has to scan the catalogs also.

Yes but that is irrelevant to the discussion. I am comparing the speed of
repeated table existence checks with the speed of repeated exception blocks
that access said table.

We already use connection pooling with pgbouncer, but upon disconnect, it

issues a DISCARD ALL statement [...]

Especially if you're using pgbouncer transaction mode, using temporary
tables is probably not a good idea.

We are using it in session mode, so none of that is relevant to my
situation.

*) You can rig permanent tables around pg_backend_pid(). [...]

We currently do use permanent tables using pg_backend_pid(). It's

because of

the connection pooling specifically that we are having problems with

stale

data. I have been unable to find a way to automatically clear that data

upon

start or end of a session, or at least check if it's been set in this
session or not.

IMO the right way to do it is to generate a unique application token
[...] when your application session logs in. That token should be passed
into
*all* your session specific backend functions [...]

No, this will not work because the backend functions are trigger functions,
so they cannot be passed this data.

Thanks.

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

#7Merlin Moncure
mmoncure@gmail.com
In reply to: Moshe Jacobson (#6)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Mon, Oct 1, 2012 at 3:58 PM, Moshe Jacobson <moshe@neadwerx.com> wrote:

On Mon, Oct 1, 2012 at 12:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

*) Functions without exception blocks are faster than those with.
*) Therefore, CREATE/IF NOT EXISTS is probably faster (test to be sure)

I don't think that can be assumed by your premise above. Essentially we
are
comparing the price of starting an exception block against checking the
catalog for a table.

A vanilla create table has to scan the catalogs also.

Yes but that is irrelevant to the discussion. I am comparing the speed of
repeated table existence checks with the speed of repeated exception blocks
that access said table.

Both approaches have to do a catalog scan (even if you've established
an exception block the server still internally has to do a catalog
scan in order to raise an appropriate error). The exception block has
the additional overhead of a subtransaction.

We already use connection pooling with pgbouncer, but upon disconnect,
it
issues a DISCARD ALL statement [...]

Especially if you're using pgbouncer transaction mode, using temporary
tables is probably not a good idea.

We are using it in session mode, so none of that is relevant to my
situation.

OK. (but I don't like solutions that prevent stateless connection
pooling). In just about all cases where scalability was a concern and
I used session scoped objects I ended up regretting it somewhere down
the line. A more stateless approach has a lot of advantages besides
supporting more aggressive connection pooling -- for example you can
restart the server and all your connected clients wont lose local
temporary data. Anyways, enough lecturing -- I'm sure you've got
things pretty well figured out :-).

*) You can rig permanent tables around pg_backend_pid(). [...]

We currently do use permanent tables using pg_backend_pid(). It's
because of
the connection pooling specifically that we are having problems with
stale
data. I have been unable to find a way to automatically clear that data
upon
start or end of a session, or at least check if it's been set in this
session or not.

IMO the right way to do it is to generate a unique application token
[...] when your application session logs in. That token should be passed
into
*all* your session specific backend functions [...]

No, this will not work because the backend functions are trigger functions,
so they cannot be passed this data.

Sure they can...there are a number of ways to do it (although they all
require A. an extra round trip to establish the ID for later
statements in the transaction to read or B. a wrapping function that
handles the work on the server side). That said, if you're super
duper sure you'll never use transaction mode pooling, Temp tables are
ok to use unless your sessions are quite short (in which case all the
plan invalidation flying around will start to hurt).

merlin

#8Moshe Jacobson
moshe@neadwerx.com
In reply to: Merlin Moncure (#7)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Tue, Oct 2, 2012 at 9:18 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

Yes but that is irrelevant to the discussion. I am comparing the speed of
repeated table existence checks with the speed of repeated exception

blocks

that access said table.

Both approaches have to do a catalog scan (even if you've established
an exception block the server still internally has to do a catalog
scan in order to raise an appropriate error). The exception block has
the additional overhead of a subtransaction.

OK this makes sense. Thanks :-) I will go for the CREATE TABLE approach.

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

#9Ivan Voras
ivoras@freebsd.org
In reply to: Moshe Jacobson (#1)
Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On 01/10/2012 15:36, Moshe Jacobson wrote:

I am working on an audit logging trigger that gets called for every row
inserted, updated or deleted on any table.
For this, I need to store a couple of temporary session variables such as
the ID of the user performing the change, which can be set at the start of
the session.
Until now I have been using a permanent table to store the session
variables, but it has been difficult to wipe the data properly at the end
of the session.

Do you know about session variables? I did something similar to what you
are describing and it ended up much simpler than using tables, temporary
or not.

You need to configure them in postgresql.conf, e.g.:

custom_variable_classes = 'myapp'

Then in the application code:

SET myapp.uid = 42;

And in the pl/pgsql function:

CREATE OR REPLACE FUNCTION dblog() RETURNS TRIGGER AS $$
DECLARE
uid INTEGER;
BEGIN
BEGIN
SELECT current_setting('myapp.uid') INTO uid;
EXCEPTION
WHEN undefined_object THEN
uid = null;
WHEN data_exception THEN
uid = null;
END;
...
END;
$$ LANGUAGE plpgsql;

The major benefit here is that it doesn't touch the table engines,
temporary or not.

#10Moshe Jacobson
moshe@neadwerx.com
In reply to: Ivan Voras (#9)
Re: Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Thu, Oct 4, 2012 at 6:12 AM, Ivan Voras <ivoras@freebsd.org> wrote:

On 01/10/2012 15:36, Moshe Jacobson wrote:

I am working on an audit logging trigger that gets called for every row
inserted, updated or deleted on any table.
For this, I need to store a couple of temporary session variables such as
the ID of the user performing the change, which can be set at the start

of

the session.

Do you know about session variables?
The major benefit here is that it doesn't touch the table engines,
temporary or not.

This sounds incredibly useful. Why have I not heard of this until today??
In your example you still had to use a BEGIN...EXCEPTION block. Is that
faster than a create temp table?

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com

#11Ivan Voras
ivoras@freebsd.org
In reply to: Moshe Jacobson (#10)
Re: Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On 5 October 2012 04:53, Moshe Jacobson <moshe@neadwerx.com> wrote:

On Thu, Oct 4, 2012 at 6:12 AM, Ivan Voras <ivoras@freebsd.org> wrote:

On 01/10/2012 15:36, Moshe Jacobson wrote:

I am working on an audit logging trigger that gets called for every row
inserted, updated or deleted on any table.
For this, I need to store a couple of temporary session variables such
as
the ID of the user performing the change, which can be set at the start
of
the session.

Do you know about session variables?
The major benefit here is that it doesn't touch the table engines,
temporary or not.

This sounds incredibly useful. Why have I not heard of this until today??
In your example you still had to use a BEGIN...EXCEPTION block. Is that
faster than a create temp table?

I think I can make a fairly educated guess that catching exceptions
while dealing with session variables should be much, much faster than
creating any kind of a table :)

Besides, from what you said, you will ensure on the app level that the
session variable is set sometime close to when you open a connection
to Pg, so the "catch" part of the exception block will probably not
run at all.

#12Merlin Moncure
mmoncure@gmail.com
In reply to: Ivan Voras (#11)
Re: Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Fri, Oct 5, 2012 at 3:09 AM, Ivan Voras <ivoras@freebsd.org> wrote:

I think I can make a fairly educated guess that catching exceptions
while dealing with session variables should be much, much faster than
creating any kind of a table :)

That is true, but it's not clear how using session variables keeps you
from having to create the table. If the table is already there, a
session variable guarding the table construction shouldn't be any
faster/better than a simple 'create if not exists'. A catalog scan is
basically boils down to a query. So I'm not sure that actually helps.

On the other hand, if the temp table can be completely ditched for a
session variable or two, then yeah, that would be much better since
you'd avoid the overhead of creating the table completely.

merlin

#13Ivan Voras
ivoras@freebsd.org
In reply to: Merlin Moncure (#12)
Re: Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On 5 October 2012 15:55, Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Oct 5, 2012 at 3:09 AM, Ivan Voras <ivoras@freebsd.org> wrote:

I think I can make a fairly educated guess that catching exceptions
while dealing with session variables should be much, much faster than
creating any kind of a table :)

On the other hand, if the temp table can be completely ditched for a
session variable or two, then yeah, that would be much better since
you'd avoid the overhead of creating the table completely.

Yes, this is what I was aiming at, based on the OP mentioning he only
has a limited amount of data to manage in this way.

#14Moshe Jacobson
moshe@neadwerx.com
In reply to: Ivan Voras (#13)
Re: Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?

On Fri, Oct 5, 2012 at 11:13 AM, Ivan Voras <ivoras@freebsd.org> wrote:

On the other hand, if the temp table can be completely ditched for a
session variable or two, then yeah, that would be much better since
you'd avoid the overhead of creating the table completely.

Yes, this is what I was aiming at, based on the OP mentioning he only
has a limited amount of data to manage in this way.

Yup, I'm going to see how this goes using session variables. I think it is
exactly what I need. Thanks!

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe@neadwerx.com | www.neadwerx.com