Strange discrepancy in query performance...

Started by Jason L. Buberelover 18 years ago5 messagesgeneral
Jump to latest
#1Jason L. Buberel
jason@buberel.org

I'm hoping that someone on the list can help me understand an apparent
discrepancy in the performance information that I'm collecting on a
particularly troublesome query.

The configuration: pg-8.2.4 on RHEL4. log_min_duration_statement = 1m.

In my syslog output, I see entries indicating that the
JDBC-driver-originated query on a table named 'city_summary' are taking
upwards of 300 seconds:

Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG: duration: 307077.037 ms execute S_42: select * from city_summary
where state = $1 and city_master_id = $2 and res_type =
'single_family' and date = $3
and range = 90 and zip = $4 and quartile = '__ALL'
DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that
the Java application is running, but from the psql command line, it
takes only 0.37 seconds:

time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d

altos_research -c 'select fact_id from city_summary where state =
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile =
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date =
\'2007-09-28\';'

fact_id
----------
46624925
(1 row)

0.00user 0.00system 0:00.37elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+285minor)pagefaults 0swaps

The output of 'explain' seems to indicate that the right index is being
used:

QUERY PLAN
Index Scan using city_summary_pkey on city_summary (cost=0.00..12.27
rows=1 width=2721)
Index Cond: ((date = '2007-09-28'::text) AND
(state = 'CA'::text) AND
(city_master_id = 334::bigint) AND
(quartile = '__ALL'::text) AND
(range = '90'::text))
Filter: ((zip = '__ALL'::text) AND ((res_type)::text =
'single_family'::text))
(3 rows)

The index looks like this:

# \d city_summary_pkey
Index "public.city_summary_pkey"
Column | Type
--------------------+---------
date | text
state | text
city_master_id | bigint
zip_master_id | integer
res_type_master_id | bigint
quartile | text
range | text
primary key, btree, for table "public.city_summary"

Any ideas on why I am seeing such a big difference between the two
measurements (JDBC/syslog vs. command line)?

Thanks,
Jason

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jason L. Buberel (#1)
Re: Strange discrepancy in query performance...

"Jason L. Buberel" <jason@buberel.org> writes:

In my syslog output, I see entries indicating that the
JDBC-driver-originated query on a table named 'city_summary' are taking
upwards of 300 seconds:

Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG: duration: 307077.037 ms execute S_42: select * from city_summary
where state = $1 and city_master_id = $2 and res_type =
'single_family' and date = $3
and range = 90 and zip = $4 and quartile = '__ALL'
DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that
the Java application is running, but from the psql command line, it
takes only 0.37 seconds:

time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d

altos_research -c 'select fact_id from city_summary where state =
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile =
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date =
\'2007-09-28\';'

This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.

regards, tom lane

#3Jason L. Buberel
jason@buberel.org
In reply to: Tom Lane (#2)
Re: Strange discrepancy in query performance...

Tom-right-as-usual:

Yep - you were right about the query plan for the prepared statement (a
sequential scan of the table) differed a bit from the directly-executed
version :)

For reference, when using JasperReports .jrxml files as the basis for
the query, I only had to do to the following to 'force' postgres to
treat the jasper report parameter as a number and not text, thereby
allowing the correct index to be used:

select * from city summary where city_master_id =
$P{city_master_id}::bigint ...

Query times went from 300+ seconds back down to ~100ms.

-jason

Tom Lane wrote:

Show quoted text

"Jason L. Buberel" <jason@buberel.org> writes:

In my syslog output, I see entries indicating that the
JDBC-driver-originated query on a table named 'city_summary' are taking
upwards of 300 seconds:

Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG: duration: 307077.037 ms execute S_42: select * from city_summary
where state = $1 and city_master_id = $2 and res_type =
'single_family' and date = $3
and range = 90 and zip = $4 and quartile = '__ALL'
DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that
the Java application is running, but from the psql command line, it
takes only 0.37 seconds:

time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d

altos_research -c 'select fact_id from city_summary where state =
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile =
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date =
\'2007-09-28\';'

This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jason L. Buberel (#3)
Re: Strange discrepancy in query performance...

"Jason L. Buberel" <jason@buberel.org> writes:

For reference, when using JasperReports .jrxml files as the basis for
the query, I only had to do to the following to 'force' postgres to
treat the jasper report parameter as a number and not text, thereby
allowing the correct index to be used:

select * from city summary where city_master_id =
$P{city_master_id}::bigint ...

Query times went from 300+ seconds back down to ~100ms.

Hmm ... if Postgres were just given the parameter symbol with no type
information, I believe it would have assumed it was bigint (or in
general, the same type as what it's being compared to). So your
problem suggests that Jasper is deliberately telling the backend that
that parameter is of type text. If that's coming from something you
did in your code, you probably ought to change the code. If not,
it seems like a bug/omission in Jasper.

regards, tom lane

#5Jason L. Buberel
jason@buberel.org
In reply to: Tom Lane (#4)
Re: Strange discrepancy in query performance...

I agree that this is a bug in JasperReports. I've been stepping throgh
their code to determine where the paramter type is set to
'java.lang.String', but have not yet figured out how their Java API will
allow me to override that with 'java.lang.Integer' or something more
appropriate.

If I figure something out, I'll post to the list.

Regards,
jason

Tom Lane wrote:

Show quoted text

Hmm ... if Postgres were just given the parameter symbol with no type
information, I believe it would have assumed it was bigint (or in
general, the same type as what it's being compared to). So your
problem suggests that Jasper is deliberately telling the backend that
that parameter is of type text. If that's coming from something you
did in your code, you probably ought to change the code. If not,
it seems like a bug/omission in Jasper.

regards, tom lane