check sql progress

Started by Johnson, Shaunnabout 24 years ago3 messagesgeneral
Jump to latest
#1Johnson, Shaunn
SJohnson6@bcbsm.com

Howdy:

I'm running Postgres 7.1.3 on RedHat Linux 7.2 kernel
version 2.4.7 rel 10.

Basic SQL questions (I think). I'm trying to write
a query:

[script]

--explain
drop table t_bp_stage9;
create table t_bp_stage9 as
select
lpad (a.member_id::text,10,'0') as bp_member_id,
b.bp_disease_id,
b.bp_measure_id,
b.bp_end_period_id,
b.bp_leg_cust_memb_id,
b.bp_source_code
from
t_bp_test1 a, t_bp_stage8 b
where
cast(a.member_id_legacy_cust as char) = b.bp_leg_cust_memb_id
;
grant select on t_bp_stage9 to public;

[/script]

When I do and explain, I get this:

[explain]

psql:./test2:17: NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..32520.00 rows=10000 width=84)
-> Seq Scan on t_bp_test1 a (cost=0.00..20.00 rows=1000 width=24)
-> Seq Scan on t_bp_stage8 b (cost=0.00..20.00 rows=1000 width=60)

EXPLAIN

[/explain]

I started the job from last night (about 6pm) and had to
kill it when I got in this morning (8am). I know the
data is large (t_bp_stage8 = 8183745 records;
t_bp_test1 =1677375 records) but what I haven't seen
is the partition space where the database lives get smaller
(when ever I create large tables, I check the partition
and, sure enough, the file system loses a bit of space).

My questions: How can I check to see if this query is actually
doing something? Is my query even right? Even though there's
temporary, should I index them and drop them later?

Suggestions?

Thanks!

-X

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Johnson, Shaunn (#1)
Re: check sql progress

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:

When I do and explain, I get this:

Nested Loop (cost=0.00..32520.00 rows=10000 width=84)
-> Seq Scan on t_bp_test1 a (cost=0.00..20.00 rows=1000 width=24)
-> Seq Scan on t_bp_stage8 b (cost=0.00..20.00 rows=1000 width=60)

I started the job from last night (about 6pm) and had to
kill it when I got in this morning (8am). I know the
data is large (t_bp_stage8 = 8183745 records;
t_bp_test1 =1677375 records)

The planner evidently doesn't know that. Have you ever done a VACUUM
(perhaps better, VACUUM ANALYZE) on these tables?

Also, I'd suggest simplifying the where clause:

where cast(a.member_id_legacy_cust as char) = b.bp_leg_cust_memb_id;

Can't that be just

where a.member_id_legacy_cust = b.bp_leg_cust_memb_id;

? I think that the added cast will probably prevent the planner from
using any intelligent query plans, like merge or hash join.

regards, tom lane

#3Johnson, Shaunn
SJohnson6@bcbsm.com
In reply to: Tom Lane (#2)
Re: check sql progress

--Thanks for the reply:

--I will try the 'vacuum analyze' on the tables now.

--As far as:

[snip code]

where cast(a.member_id_legacy_cust as char) = b.bp_leg_cust_memb_id;

[/snip]

--The reason I did that was because "a.member_id_legacy_cust" is something
from an MS Access table and the definitions of the columns were different:

[table of t_bp_test1]

Table "t_bp_test1"
Attribute | Type | Modifier
--------------------------+------------------------+----------
prod_id | integer |
member_id | character varying(254) |
member_id_consist | character varying(254) |
member_id_legacy | character varying(254) |
member_id_legacy_consist | character varying(254) |
member_id_legacy_cust | character varying(254) |
members_consist | integer

[/table]

[table of t_bp_stage8]

Table "t_bp_stage8"
Attribute | Type | Modifier
---------------------+-----------+----------
bp_disease_id | text |
bp_measure_id | text |
bp_end_period_id | text |
bp_leg_cust_memb_id | character |
bp_source_code | text |

[/table]

--I just thought it would be easier to CAST to char.
Maybe that's what's the holdup ... but if I didn't have
cast, then I get an error:

[error without using CAST]

psql:./test2:17: ERROR: Unable to identify an operator '='
for types 'varchar' and 'bpchar'

You will have to retype this query using an explicit cast

[/error]

--Maybe I'm going about this the wrong way - could I join
or intersect and get the right results? I wonder ...

--Or, maybe create a new table with new column definitions ...
but that seems to be counterproductive ... I guess ...

--Thanks again!

-X

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:

When I do and explain, I get this:

Nested Loop (cost=0.00..32520.00 rows=10000 width=84)
-> Seq Scan on t_bp_test1 a (cost=0.00..20.00 rows=1000 width=24)
-> Seq Scan on t_bp_stage8 b (cost=0.00..20.00 rows=1000 width=60)

I started the job from last night (about 6pm) and had to
kill it when I got in this morning (8am). I know the
data is large (t_bp_stage8 = 8183745 records;
t_bp_test1 =1677375 records)

The planner evidently doesn't know that. Have you ever done a VACUUM
(perhaps better, VACUUM ANALYZE) on these tables?

Also, I'd suggest simplifying the where clause:

where cast(a.member_id_legacy_cust as char) = b.bp_leg_cust_memb_id;

Can't that be just

where a.member_id_legacy_cust = b.bp_leg_cust_memb_id;

? I think that the added cast will probably prevent the planner from
using any intelligent query plans, like merge or hash join.

regards, tom lane