Select time jump after adding filter; please help me figure out what I'm doing wrong.

Started by Andrew Edsonover 18 years ago9 messagesgeneral
Jump to latest
#1Andrew Edson
cheighlund@yahoo.com

I've been working on a db project intended to perform modifications to one db based on the data stored in another one. Specifically, I'm supposed to check a pair of tables based on two criteria; an id field, and a timestamp. This will be crossreferenced with the second database; the timestamps will be used to update timestamps on anything which resides in the table of interest there.

I was running the sequence through perl; with 76 records in the test copy of the second database, I was getting a four, five minute run. Not really bad, but I'm expecting the number of records in the second db to hit several thousand in production, so I thought I'd see if I could speed things up by adding an index on what I'm searching for.

After about an hour of run time, I killed the program and started looking into things.

The following is a copy of my index creation statement, index name, and explain and explain analyze output on the statement I was trying to run. Would someone please help me figure out what I'm doing wrong here?

attest=# create index ptrans_cid_trandt_idx on ptrans(cntrct_id, tran_dt) where rcrd_cd = '0A';

"ptrans_cid_trandt_idx" btree (cntrct_id, tran_dt) WHERE rcrd_cd = '0A'::bpchar

attest=# explain select cntrct_id, tran_dt from ptrans where rcrd_cd = 0;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on ptrans (cost=0.00..426034.67 rows=82443 width=21)
Filter: ((rcrd_cd)::text = '0'::text)
(2 rows)

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = 0;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on ptrans (cost=0.00..426034.67 rows=82443 width=21) (actual time=60585.740..60585.740 rows=0 loops=1)
Filter: ((rcrd_cd)::text = '0'::text)
Total runtime: 60585.797 ms
(3 rows)

---------------------------------
Sick sense of humor? Visit Yahoo! TV's Comedy with an Edge to see what's on, when.

#2Richard Huxton
dev@archonet.com
In reply to: Andrew Edson (#1)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Andrew Edson wrote:

The following is a copy of my index creation statement, index name, and explain and explain analyze output on the statement I was trying to run. Would someone please help me figure out what I'm doing wrong here?

attest=# create index ptrans_cid_trandt_idx on ptrans(cntrct_id, tran_dt) where rcrd_cd = '0A';

attest=# explain select cntrct_id, tran_dt from ptrans where rcrd_cd = 0;

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = 0;

The index has a where clause that doesn't match your query. It wouldn't
use the index anyway - you're not filtering or sorting on it.

--
Richard Huxton
Archonet Ltd

#3Richard Huxton
dev@archonet.com
In reply to: Richard Huxton (#2)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Don't forget to CC: the list.

Andrew Edson wrote:

I apologize; You are correct in that I mistyped my original structure. Here is the information for the correct explain and explain analyze statements.

attest=# explain select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';

No need for the simple explain - explain analyse includes all the
information.

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ptrans (cost=1223.86..149853.23 rows=85741 width=21) (actual time=2302.363..70321.838 rows=6701655 loops=1)
-> Bitmap Index Scan on ptrans_cid_trandt_idx (cost=0.00..1223.86 rows=85741 width=0) (actual time=2269.064..2269.064 rows=204855 loops=1)
Total runtime: 89854.843 ms

Well, it's taking 90 seconds to return 6.7 million rows. Depending on
your system and memory settings, that might not be unreasonable.

It *is* getting the estimate of returned rows wrong (it thinks 85,741
will match) which is hugely out of line. Is there something odd with
this table/column or haven't you analysed recently? How many unique
values does rcrd_cd have, and how many rows does the table have?

I don't know that you'll get this down to sub-second responses though,
not if you're trying to return 6 million rows from an even larger table.

--
Richard Huxton
Archonet Ltd

#4Andrew Edson
cheighlund@yahoo.com
In reply to: Richard Huxton (#3)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

I apologize about the CC; I thought I had done so.

There are fourteen (14) distinct values in rcrd_cd. And I don't know if this counts as something odd, but I got the following values by doing a vacuum full analyze, then running the set with index, dropping index, running set without.

Values with index:

attest=# select count(*) from ptrans;
16488578

attest=# select count(*) from ptrans where rcrd_cd = '0A';
6701655

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = ' > 0A';
Bitmap Heap Scan on ptrans (cost=1223.86..151183.39 rows=87439 width=21) (actu > al time=2255.640..70371.304 rows=6701655 loops=1)
-> Bitmap Index Scan on ptrans_cid_trandt_idx (cost=0.00..1223.86 rows=8743 > 9 width=0) (actual time=2216.856..2216.856 rows=204855 loops=1)
Total runtime: 89964.419 ms

Values without index:

attest=# select count(*) from ptrans;
16488578

attest=# select count(*) from ptrans where rcrd_cd = '0A';
204855

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';
Seq Scan on ptrans (cost=0.00..384813.22 rows=87439 width=21) (actual time=20.286..65330.049 rows=204855 loops=1)
Filter: (rcrd_cd = '0A'::bpchar)
Total runtime: 65945.160 ms

Again, someone mind pointing out to me where I've managed to mess this one up?

Richard Huxton <dev@archonet.com> wrote:
Don't forget to CC: the list.

Andrew Edson wrote:

I apologize; You are correct in that I mistyped my original structure. Here is the information for the correct explain and explain analyze statements.

attest=# explain select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';

No need for the simple explain - explain analyse includes all the
information.

attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ptrans (cost=1223.86..149853.23 rows=85741 width=21) (actual time=2302.363..70321.838 rows=6701655 loops=1)
-> Bitmap Index Scan on ptrans_cid_trandt_idx (cost=0.00..1223.86 rows=85741 width=0) (actual time=2269.064..2269.064 rows=204855 loops=1)
Total runtime: 89854.843 ms

Well, it's taking 90 seconds to return 6.7 million rows. Depending on
your system and memory settings, that might not be unreasonable.

It *is* getting the estimate of returned rows wrong (it thinks 85,741
will match) which is hugely out of line. Is there something odd with
this table/column or haven't you analysed recently? How many unique
values does rcrd_cd have, and how many rows does the table have?

I don't know that you'll get this down to sub-second responses though,
not if you're trying to return 6 million rows from an even larger table.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

---------------------------------
Ready for the edge of your seat? Check out tonight's top picks on Yahoo! TV.

#5Richard Huxton
dev@archonet.com
In reply to: Andrew Edson (#4)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Andrew Edson wrote:

I apologize about the CC; I thought I had done so.

no problem

There are fourteen (14) distinct values in rcrd_cd. And I don't know
if this counts as something odd, but I got the following values by
doing a vacuum full analyze, then running the set with index,
dropping index, running set without.

Might want to do ALTER TABLE ... ALTER COLUMN rcrd_cd SET STATISTICS =
14 (or a few more than that if you think it might be useful) - won't
help you with this though.

So - are you saying that with these two queries...

attest=# select count(*) from ptrans where rcrd_cd = '0A';
6701655

attest=# select count(*) from ptrans where rcrd_cd = '0A';
204855

...the only difference is that you've dropped an index?

Because that's just strange - and I don't think it's anything you're doing.

Do you have other partial indexes for different values of rcrd_cd, and
do they have similar problems? If this can be reproduced it might point
to something odd with bitmap scans.

Oh, remind me what version of PostgreSQL you're running?

--
Richard Huxton
Archonet Ltd

#6Andrew Edson
cheighlund@yahoo.com
In reply to: Richard Huxton (#5)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Yes. The only difference between the two selects was that the index on the table in question was dropped. As far as I know, that was the only partial index on there, although since it's a test db, I could probably go in and experiment on a few more if needed.

This problem may have already been solved; I'm using an older version of Postgres; 8.1.3. My boss has requested that it not be upgraded just yet, however, so I'm stuck with it for the moment.

Richard Huxton <dev@archonet.com> wrote:
Andrew Edson wrote:

I apologize about the CC; I thought I had done so.

no problem

There are fourteen (14) distinct values in rcrd_cd. And I don't know
if this counts as something odd, but I got the following values by
doing a vacuum full analyze, then running the set with index,
dropping index, running set without.

Might want to do ALTER TABLE ... ALTER COLUMN rcrd_cd SET STATISTICS =
14 (or a few more than that if you think it might be useful) - won't
help you with this though.

So - are you saying that with these two queries...

attest=# select count(*) from ptrans where rcrd_cd = '0A';
6701655

attest=# select count(*) from ptrans where rcrd_cd = '0A';
204855

...the only difference is that you've dropped an index?

Because that's just strange - and I don't think it's anything you're doing.

Do you have other partial indexes for different values of rcrd_cd, and
do they have similar problems? If this can be reproduced it might point
to something odd with bitmap scans.

Oh, remind me what version of PostgreSQL you're running?

--
Richard Huxton
Archonet Ltd

---------------------------------
Moody friends. Drama queens. Your life? Nope! - their life, your story.
Play Sims Stories at Yahoo! Games.

#7Richard Huxton
dev@archonet.com
In reply to: Andrew Edson (#6)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Andrew Edson wrote:

Yes. The only difference between the two selects was that the index
on the table in question was dropped. As far as I know, that was the
only partial index on there, although since it's a test db, I could
probably go in and experiment on a few more if needed.

This problem may have already been solved; I'm using an older version
of Postgres; 8.1.3. My boss has requested that it not be upgraded
just yet, however, so I'm stuck with it for the moment.

You should probably upgrade to 8.1.9 - there are a couple of bugfixes
that mention bitmap scans, although nothing that describes your problem
in-depth.

http://www.postgresql.org/docs/8.1/static/release.html#RELEASE-8-1-9

You don't need to dump/restore the database, it's just a matter of
upgrading the binaries within the 8.1 series. You should plan on doing
point-release updates fairly frequently.

I'd put the partial index back, see if you can reproduce it and if so
tell your boss you should upgrade. If you can't talk him around, give
him the URL for the release notes along with the three bugs you think
are most dangerous. If that doesn't work, I'm sure we can get someone
from the core developers to tell you the same thing I just have, and we
can forward that to your boss :-)

--
Richard Huxton
Archonet Ltd

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Edson (#6)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Andrew Edson <cheighlund@yahoo.com> writes:

This problem may have already been solved; I'm using an older
version of Postgres; 8.1.3.

Ah. I think your result is explained by this 8.1.4 bug fix:

2006-05-18 14:57 tgl

* src/backend/optimizer/plan/createplan.c (REL8_1_STABLE): When a
bitmap indexscan is using a partial index, it is necessary to
include the partial index predicate in the scan's "recheck
condition". Otherwise, if the scan becomes lossy for lack of
bitmap memory, we would fail to enforce that returned rows satisfy
the predicate. Noted while studying bug #2441 from Arjen van der
Meijden.

IOW, once the bitmap goes lossy, we'd return *every* row on any page
that the index fingered as having *any* relevant row.

My boss has requested that it not be
upgraded just yet, however, so I'm stuck with it for the moment.

Better press him a little harder. There are a lot of known bugs in
8.1.3, and not any very good reason not to update to a later 8.1.x.

As a really short-term measure, you could possibly avoid this bug by
increasing work_mem enough that the bitmap doesn't get compressed.

regards, tom lane

#9Andrew Edson
cheighlund@yahoo.com
In reply to: Tom Lane (#8)
Re: Select time jump after adding filter; please help me figure out what I'm doing wrong.

Tom Lane <tgl@sss.pgh.pa.us> wrote: Ah. I think your result is explained by this 8.1.4 bug fix:

2006-05-18 14:57 tgl

* src/backend/optimizer/plan/createplan.c (REL8_1_STABLE): When a
bitmap indexscan is using a partial index, it is necessary to
include the partial index predicate in the scan's "recheck
condition". Otherwise, if the scan becomes lossy for lack of
bitmap memory, we would fail to enforce that returned rows satisfy
the predicate. Noted while studying bug #2441 from Arjen van der
Meijden.

IOW, once the bitmap goes lossy, we'd return *every* row on any page
that the index fingered as having *any* relevant row.

Better press him a little harder. There are a lot of known bugs in
8.1.3, and not any very good reason not to update to a later 8.1.x.

As a really short-term measure, you could possibly avoid this bug by
increasing work_mem enough that the bitmap doesn't get compressed.

regards, tom lane
I'll press as I can...thanks for pointing that out to me. How would I determine how much to increase work_mem by (or to) until then?

---------------------------------
Got a little couch potato?
Check out fun summer activities for kids.