Prepared statements performance
Hi!
My reading to date suggests that prepared statements should be faster to
execute than issuing the same statement multiple times. However, issuing
100'000 INSERTs turned out to be more than ten times faster than executing
the same prepared statement 100'000 times when executed via pgAdmin. The
table was:
CREATE TABLE test
(
one date,
two boolean,
three character varying,
four integer,
five numeric(18,5),
id serial NOT NULL --note the index here
)
The prepared statement test lasting ~160 seconds was:
TRUNCATE test;
BEGIN;
PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3, $4, $5);
EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more executes...
END;
The insertion test lasting ~12 seconds was:
TRUNCATE test;
BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
'three', 4, 5.5);
-- 99'999 more inserts...
END;
I assume, then, that I've done something wrong.
Many thanks,
Dan.
Hi again,
I did a follow-up test using 'multi-value' inserts, which turned out to be
three times faster than multiple inserts:
TRUNCATE test;
BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
'three', 4, 5.5)
,('2011-01-01', true, 'three', 4, 5.5)
-- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
END;
This is the kind of speed increase I was hoping for from prepared
statements (which makes sense, because in this multi-value insert the query
is only planned once?).
Thanks,
Dan.
P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.
Hello
2012/5/10 Daniel McGreal <daniel.mcgreal@redbite.com>:
Hi again,
I did a follow up test using 'multi-value' inserts which is three times
faster than multiple inserts thusly:
If you need speed, use a COPY statement - it should be 10x faster than INSERTs.
Pavel
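A minimal sketch of the COPY approach (the CSV path is a hypothetical
example; the file would hold one row per line, matching the column list):

```sql
TRUNCATE test;
-- Server-side COPY reads the file on the database host:
COPY test (one, two, three, four, five) FROM '/tmp/test.csv' WITH (FORMAT csv);
-- From psql, \copy runs the equivalent command client-side, so the
-- file only needs to be readable by the client, not the server:
-- \copy test (one, two, three, four, five) FROM '/tmp/test.csv' WITH (FORMAT csv)
```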
Hi,
Unfortunately, those are the experimental conditions. The circumstances of
the intended application are such that my two options are prepared
statements or many inserts. I included the multi-value inserts because I was
curious as to why prepared statements would be slower, given that they only
plan the query once (as, I assume, does the multi-value insert).
It turns out though that the results are skewed by using pgAdmin. Executing
my scripts from the command line gives much more appropriate results.
Thanks,
Dan.
On 10 May 2012 11:30, Daniel McGreal <daniel.mcgreal@redbite.com> wrote:
I put the multi-value inserts in as I was
curious as to why prepared statements would be slower given they only plan
the query once (as also does the multi-value insert, I assume).
That's a common misconception.
The reason that prepared statements are often slower is exactly
_because_ they only plan the query once. Since the query plan is
stored when the query gets prepared, the same plan gets used for every
combination of query parameters, so it has to be a fairly generic
query plan.
OTOH, the multi-value insert knows exactly what combinations of
"parameters" will be used in the query and the query planner can
optimise the query for those parameters. It wouldn't surprise me if it
would re-evaluate plan branch choices based on which row of values is
currently being inserted.
I think it's safe to say that prepared statements are only efficient
when you're dealing with repeated complicated queries, where preparing
the query plan takes a significant amount of time. Preparing will also
shave some time off queries that are inefficient regardless of how you
execute them (for example, because the query always needs to perform a
sequential scan).
Prepared statements will also be faster on database servers with a
slower query planner than the one in Postgres.
In most (all?) other cases, executing the query directly is probably faster.
Of course there are other benefits to prepared statements, such as a
natural immunity to SQL injection.
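For what it's worth, the plan a prepared statement actually uses can be
inspected with EXPLAIN EXECUTE and compared against the plan of the direct
statement (a sketch reusing the foo statement from the original post; for
this INSERT both plans are trivial):

```sql
PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3, $4, $5);

-- Plan used when executing the prepared statement:
EXPLAIN EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);

-- Plan for the same statement executed directly:
EXPLAIN INSERT INTO test (one, two, three, four, five)
VALUES ('2011-01-01', true, 'three', 4, 5.5);
```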
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
On Thu, 10 May 2012 13:52:29 +0200, Alban Hertroys wrote:
May I ask what kind of planning may occur during an insert?
Regards,
Radek
On Thu, May 10, 2012 at 6:52 AM, Alban Hertroys <haramrae@gmail.com> wrote:
That can often be true, but for simple inserts there is no plan to get
wrong. Prepared statements can knock about 30-50% off statement
latency in such cases if you're not I/O bound.
Prepared statements are definitely a headache though, and I rarely use them.
merlin
On 10 May 2012 15:05, Radosław Smogura <rsmogura@softperience.eu> wrote:
May I ask what kind of planning may occur during insert?
Well, for example, if there's a unique constraint on the table then
the database will have to check that the newly inserted values don't
conflict with values that are already in the table. It needs to plan
an efficient strategy for that, which depends on the values being
inserted.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Alban Hertroys <haramrae@gmail.com> writes:
On 10 May 2012 15:05, Radosław Smogura <rsmogura@softperience.eu> wrote:
May I ask what kind of planning may occur during insert?
Well, for example, if there's a unique constraint on the table then
the database will have to check that the newly inserted values don't
conflict with values that are already in the table. It needs to plan
an efficient strategy for that, which depends on the values being
inserted.
There is no planning associated with checking unique constraints; that's
just a matter for the index mechanisms.
I think the real point here is that a simple INSERT/VALUES has such a
trivial plan that there is hardly any gain to be had by avoiding the
planning stage. Then the other overhead of a prepared statement
(looking up the saved plan, checking it's not stale, etc) outweighs
that. Or at least it could. 3x slower seems a bit fishy; I wonder
whether there's some client-side inefficiency involved in that.
Doing performance measurements with pgAdmin seems pretty questionable
in the first place ...
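For reference, psql will report per-statement timings itself, which avoids
whatever overhead pgAdmin adds (the script file names here are hypothetical):

```sql
-- \timing is a psql meta-command, not SQL; it prints the elapsed
-- time for each statement that follows:
\timing on
\i prepared_test.sql   -- hypothetical file holding the PREPARE/EXECUTE run
\i insert_test.sql     -- hypothetical file holding the plain INSERT run
```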
regards, tom lane
Doing the same tests from psql gives:
1. ~2.5 seconds for INSERT/VALUES
2. ~10 seconds for prepared statement executes
3. ~15 seconds for multiple INSERTs
Dan.