Trouble with plan statistics behaviour for a query.
We are having trouble with a particular query being slow in a strange manner.
The query is a join over two large tables that are suitably indexed.
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE,
CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by
CG.CREATED asc, CI.ID asc
For some tasks we run this particular query a very large number of times and it has a significant performance impact
when it runs slowly.
If we run ANALYSE over the CHANGEITEM table then the performance picks up by a factor of 5 or more. The problem is that
a day later the performance will have dropped back to its previously slow state.
The reason this is so hard to understand is that the activity on this table is very low, with no updates and only a
relatively small number of inserts each day, < 0.1% of the table size.
Explain output:
Sort (cost=86.90..86.93 rows=11 width=118) (actual time=0.086..0.087 rows=14 loops=1)
Sort Key: cg.created, ci.id
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.00..86.71 rows=11 width=118) (actual time=0.022..0.061 rows=14 loops=1)
-> Index Scan using chggroup_issue on changegroup cg (cost=0.00..17.91 rows=8 width=33) (actual
time=0.012..0.015 rows=7 loops=1)
Index Cond: (issueid = 81001::numeric)
-> Index Scan using chgitem_chggrp on changeitem ci (cost=0.00..8.58 rows=2 width=91) (actual
time=0.005..0.005 rows=2 loops=7)
Index Cond: (groupid = cg.id)
Total runtime: 0.116 ms
The explain output always seems the same even when the performance is poor, but I can't be sure of that.
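Since it is hard to capture the plan of the genuinely slow executions by hand, one option is the contrib module auto_explain (shipped with 9.1). A minimal sketch, assuming superuser access in the session; the 100 ms threshold is an arbitrary example value:

```sql
-- Load auto_explain for this session only; to cover the application's
-- sessions it would instead go into shared_preload_libraries.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 100;  -- log plans of statements slower than 100 ms
SET auto_explain.log_analyze = true;      -- include actual row counts and timings
```

This would log the real plan used by the slow runs to the server log, removing the need to guess whether it matches the hand-run EXPLAIN.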
Overall it seems like PostgreSQL just forgets about the statistics it has gathered after a short while.
Schema details:
CREATE TABLE changegroup
(
id numeric(18,0) NOT NULL,
issueid numeric(18,0),
author character varying(255),
created timestamp with time zone,
CONSTRAINT pk_changegroup PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chggroup_issue
ON changegroup
USING btree
(issueid );
CREATE TABLE changeitem
(
id numeric(18,0) NOT NULL,
groupid numeric(18,0),
fieldtype character varying(255),
field character varying(255),
oldvalue text,
oldstring text,
newvalue text,
newstring text,
CONSTRAINT pk_changeitem PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
CREATE INDEX chgitem_chggrp
ON changeitem
USING btree
(groupid );
CREATE INDEX chgitem_field
ON changeitem
USING btree
(field COLLATE pg_catalog."default" );
Table sizes
changegroup : 2,000,000 rows
changeitem : 2,500,000 rows
The changegroup table has on average about 4 rows per issueid value, which is the query parameter.
We run autovacuum and autoanalyse, but as the activity in the table is low these are rarely if ever invoked on these tables.
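One way to check this assumption is to look at when statistics were actually last refreshed and, if needed, lower the analyse threshold for just these tables. A sketch; the 0.001 scale factor is an arbitrary example value:

```sql
-- When were the planner statistics last refreshed for these tables?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('changegroup', 'changeitem');

-- If autoanalyse never fires because daily churn is far below the default
-- threshold, the per-table threshold can be lowered:
ALTER TABLE changeitem SET (autovacuum_analyze_scale_factor = 0.001);
```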
Environment:
Testing using PostgreSQL 9.1.3 on x86_64-redhat-linux-gnu, although this is a problem across a variety of postgres
versions.
On Thu, May 31, 2012 at 3:29 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=? order by CG.CREATED asc, CI.ID asc
This has an unbound variable '?' in it.
What's the exact SQL you used to get this ... did you use a specific
CG.ISSUEID to run your test? If that's the case, this EXPLAIN ANALYZE
won't be the same as the one generated for your actual application.
Craig
On 01/06/12 08:55, Craig James wrote:
This has an unbound variable '?' in it.
These queries are being run from a Java application using JDBC, and when run the variable is bound to a long integer value. While trying to investigate the problem, I have just been hard-coding a value in the statement.
On Thu, May 31, 2012 at 4:01 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
These queries are being run from a Java application using JDBC, and when run the variable is bound to a long integer value. While trying to investigate the problem, I have just been hard-coding a value in the statement.
I use Perl, not JDBC, but this thread may be relevant to your problem.
http://postgresql.1045698.n5.nabble.com/Slow-statement-when-using-JDBC-td3368379.html
Craig
Thanks Craig, that certainly leads down the right path.
The following is all done in pgAdmin3:
Using an actual value, I get the plan I expect:
explain analyze select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE,
CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=10006 order by
CG.CREATED asc, CI.ID asc
"Sort (cost=106.18..106.22 rows=13 width=434) (actual time=0.115..0.115 rows=12 loops=1)"
" Sort Key: cg.created, ci.id"
" Sort Method: quicksort Memory: 29kB"
" -> Nested Loop (cost=0.00..105.94 rows=13 width=434) (actual time=0.019..0.067 rows=12 loops=1)"
" -> Index Scan using chggroup_issue on changegroup cg (cost=0.00..19.73 rows=10 width=29) (actual
time=0.009..0.013 rows=10 loops=1)"
" Index Cond: (issueid = 10006::numeric)"
" -> Index Scan using chgitem_chggrp on changeitem ci (cost=0.00..8.58 rows=3 width=411) (actual
time=0.004..0.005 rows=1 loops=10)"
" Index Cond: (groupid = cg.id)"
"Total runtime: 0.153 ms"
Using a prepared statement with a variable, I get a poor plan requiring a sequential scan:
prepare t2(real) as
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING,
CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=$1 order by
CG.CREATED asc, CI.ID asc;
explain analyze execute t2 (10006);
"Sort (cost=126448.89..126481.10 rows=12886 width=434) (actual time=1335.615..1335.616 rows=12 loops=1)"
" Sort Key: cg.created, ci.id"
" Sort Method: quicksort Memory: 29kB"
" -> Nested Loop (cost=0.00..125569.19 rows=12886 width=434) (actual time=0.046..1335.556 rows=12 loops=1)"
" -> Seq Scan on changegroup cg (cost=0.00..44709.26 rows=10001 width=29) (actual time=0.026..1335.460 rows=10
loops=1)"
" Filter: ((issueid)::double precision = $1)"
" -> Index Scan using chgitem_chggrp on changeitem ci (cost=0.00..8.05 rows=3 width=411) (actual
time=0.007..0.008 rows=1 loops=10)"
" Index Cond: (groupid = cg.id)"
"Total runtime: 1335.669 ms"
Using a prepared statement with a cast of the variable to the right type, I get the good plan back:
prepare t2(real) as
select CG.ID, CG.ISSUEID, CG.AUTHOR, CG.CREATED, CI.ID, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE, CI.OLDSTRING,
CI.NEWVALUE, CI.NEWSTRING
from PUBLIC.CHANGEGROUP CG inner join PUBLIC.CHANGEITEM CI on CG.ID = CI.GROUPID where CG.ISSUEID=cast($1 as
numeric) order by CG.CREATED asc, CI.ID asc;
explain analyze execute t2 (10006);
"Sort (cost=106.19..106.22 rows=13 width=434) (actual time=0.155..0.156 rows=12 loops=1)"
" Sort Key: cg.created, ci.id"
" Sort Method: quicksort Memory: 29kB"
" -> Nested Loop (cost=0.00..105.95 rows=13 width=434) (actual time=0.048..0.111 rows=12 loops=1)"
" -> Index Scan using chggroup_issue on changegroup cg (cost=0.00..19.73 rows=10 width=29) (actual
time=0.031..0.042 rows=10 loops=1)"
" Index Cond: (issueid = ($1)::numeric)"
" -> Index Scan using chgitem_chggrp on changeitem ci (cost=0.00..8.58 rows=3 width=411) (actual
time=0.006..0.006 rows=1 loops=10)"
" Index Cond: (groupid = cg.id)"
"Total runtime: 0.203 ms"
Now the challenge is to get Java/JDBC to do this correctly. We make a big effort to ensure we always use prepared statements and variable bindings to help protect against SQL injection vulnerabilities.
On Thu, May 31, 2012 at 4:34 PM, Trevor Campbell <tcampbell@atlassian.com> wrote:
Now the challenge is to get Java/JDBC to do this correctly. We make a big effort to ensure we always use prepared statements and variable bindings to help protect against SQL injection vulnerabilities.
JDBC has some features that are supposed to be convenient (automatic
preparing based on a number-of-executions threshold) that strike me as
misguided. It's one thing to hide irrelevant details from the app, and
another thing entirely to cause a huge change in the exact SQL that's sent
to the server ... which is what JDBC seems to do.
I think the trick is that if you use JDBC prepared statements, you have to understand how it's trying to be tricky and circumvent it, so that you're always in full control of what it's doing.
Craig
If I am correct, JDBC uses a named portal only on the 5th time you use a PreparedStatement (configurable). Before that it uses an unnamed statement, which should behave as if you had embedded the value. So the solution is to recreate the PreparedStatement each time (you will still have no problems with SQL injection). Note that "smart" pools may detect this situation and reuse PreparedStatements for identical query texts internally. If so, you may need to switch this off.
In case you still have problems, I'd recommend asking on the postgresql-jdbc mailing list.
Also, I've heard that somewhere around 9.2 the PostgreSQL server may replan such cases each time.
Best regards, Vitalii Tymchyshyn
If I am correct, JDBC uses a named portal only on the 5th time you use a PreparedStatement (configurable). Before that it uses an unnamed statement, which should behave as if you had embedded the value.
If this is due to the difference in parameter type information, this
doesn't have anything to do with named portals.
My guess is that the driver has one idea about parameter types (based
on either the specific setTypeFoo call or the Java type of the
parameter passed to setObject), and the server another (based on the
type information of the CHANGEGROUP.ISSUEID column). Actually, I'm
rather surprised to see 'real' there: if you're using setObject with a
Long, I would imagine that turns into a bigint (which I believe the
server knows how to coerce to numeric). Can you show us your JDBC
code?
Hi all,
I have two complementary functions that do opposite things.
1 - CombineStrings(stringarray character varying[]) RETURNS character varying
This one takes an array of strings as a parameter and returns a string with some formatted information inside.
2 - SplitString2Array(stringtosplit character varying) RETURNS character varying[]
This one takes a formatted string as a parameter and returns an array of strings.
The following works just fine:
select SplitString2Array(CombineStrings(ARRAY['abba', 'queen']))
will return {'abba', 'queen'}
Now, if I want to do the following:
select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']) from "SomeTable"
I get the following error:
array value must start with "{" or dimension information
What am I doing wrong? I feel I still don't get the array fundamentals. My goal is to add my new string 'New string to add' to the formatted information inside the column "SomeTextColumn", in the same manner as if I had used the following:
Insert into "SomeTable"("SomeTextColumn") values (CombineStrings(ARRAY['abba', 'queen', 'New string to add']))
Thank you in advance,
Danny
On Sat, 2012-06-02 at 10:05 -0700, idc danny wrote:
Now, if I want to do the following:
select CombineStrings(ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']) from "SomeTable"
I get the following error:
array value must start with "{" or dimension information
This discussion is better suited to another list, like -general, so I'm
moving it there.
In the fragment:
ARRAY[SplitString2Array("SomeTextColumn"), 'New string to add']
The first array element is itself an array of strings, but the second is
a plain string. Array elements must all be the same type.
What you want to do is replace that fragment with something more like:
array_append(SplitString2Array("SomeTextColumn"), 'New string to add')
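To illustrate, a minimal sketch using a literal array in place of SplitString2Array's output (its exact return value is an assumption here, since the function source isn't shown):

```sql
-- ARRAY[...] requires all elements to share one type, so mixing a varchar[]
-- with a plain string fails; array_append adds a single element to an array.
SELECT array_append(ARRAY['abba', 'queen']::varchar[], 'New string to add');
-- yields {abba,queen,"New string to add"}
```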
If that still doesn't work, we'll need to see the exact definitions of
your functions.
Also, as a debugging strategy, I recommend that you look at the pieces
that do work, and slowly build up the fragments until it doesn't work.
That will allow you to see the inputs to each function, and it makes it
easier to see why it doesn't work.
Regards,
Jeff Davis
Thanks for all your help so far. I have been away for a couple of days so my apologies for not replying earlier.
We are using a third party library to run our SQL via JDBC (not one of the common ones like Hibernate etc), but I have
been able to dig out the exact statements run in the scenario we are experiencing.
We always run a prepare and then execute a query as follows; we never reuse the prepared statement:
_ps = _connection.prepareStatement(sql, resultSetType, resultSetConcurrency);
// The values here are:
// sql = SELECT CG.ID, CG.issueid, CG.AUTHOR, CG.CREATED, CI.ID, CI.groupid, CI.FIELDTYPE, CI.FIELD, CI.OLDVALUE,
CI.OLDSTRING, CI.NEWVALUE, CI.NEWSTRING
// FROM public.changegroup CG INNER JOIN public.changeitem CI ON CG.ID = CI.groupid
// WHERE CG.issueid=? ORDER BY CG.CREATED ASC, CI.ID ASC
// resultSetType = ResultSet.TYPE_FORWARD_ONLY
// resultSetConcurrency = ResultSet.CONCUR_READ_ONLY
_ps.setLong(1, field);
_rs = _ps.executeQuery();
Actually, I'm rather surprised to see 'real' there: if you're using setObject with a Long, I would imagine that turns
into a bigint (which I believe the server knows how to coerce to numeric).
The (real) is just my fault in testing. I copy/pasted it from elsewhere and it is not what is coming from the JDBC driver; I don't know how to see what the driver is actually sending.
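On seeing what the driver actually sends: one server-side option, sketched here assuming superuser access ("jiradb" is a placeholder for the real database name), is to enable statement logging, which records the extended-protocol parse/bind/execute steps and the bound parameter values in the server log:

```sql
-- Log every statement for the application's database; new sessions
-- pick this setting up ("jiradb" is a placeholder database name).
ALTER DATABASE jiradb SET log_statement = 'all';
```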
On Sat, Jun 2, 2012 at 1:05 PM, idc danny <idcdanny@yahoo.com> wrote:
What am I doing wrong? I feel I still don't get the array fundamentals.
It sounds like one or both of your functions have a bug in them, but
without knowing what they're supposed to do or seeing the source code,
it's pretty hard to guess what it might be.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company