Query not using index

Started by Ryan Mahoneyalmost 25 years ago15 messagesgeneral
Jump to latest
#1Ryan Mahoney
ryan@paymentalliance.net

Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;

Here's the schema:

Attribute | Type |
Modifier
-----------------------+--------------------------+---------------------------------------------------------------------
cart_row_id | integer | not null default
product_id | integer | not null
color | character varying(100) |
size | character varying(100) |
style | character varying(100) |
order_id | integer | not null
time_added | timestamp with time zone |
voided_date | timestamp with time zone |
voided_by | integer |
expired_date | timestamp with time zone |
item_state | character varying(50) | default
'in_basket'
received_back_date | timestamp with time zone |
price_charged | numeric(30,6) |
price_refunded | numeric(30,6) |
shipping_charged | numeric(30,6) |
shipping_refunded | numeric(30,6) |
price_tax_charged | numeric(30,6) |
price_tax_refunded | numeric(30,6) |
shipping_tax_charged | numeric(30,6) |
shipping_tax_refunded | numeric(30,6) |
price_name | character varying(30) |
refund_id | integer |
cs_comments | text |
price | numeric(30,6) |
ship_group_id | integer |
ship_package_id | integer |
delivery_date | date |
sentiment | text |
vendor_id | integer |
linkshare_sent | bit(1) |
mapped_to_address | character(1) |
product_name | character varying(200) |
Indices: delivery_date_pa_shopping_cart_,
pa_cart_by_item_state,
pa_cart_by_order,
pa_cart_by_product,
pa_cart_row_order,
pa_item_map_to_addr,
pa_shop_cart_prod_ord_idx,
pa_shopping_cart_pkey,
ship_package_id_pa_shopping_car,
vendor_id_pa_shopping_cart_key

There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

Thanks for any help!

-r

#2Ryan Mahoney
ryan@paymentalliance.net
In reply to: Ryan Mahoney (#1)
Re: Query not using index

I vacuum every half hour! Here is the output from EXPLAIN:

NOTICE: QUERY PLAN:

Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)

EXPLAIN

Thanks!

On Thu, 10 May 2001 18:19:16 +0000 (UTC),
sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:

Show quoted text

Have you vacuum analyzed recently and what does
explain show for the query?

On Thu, 10 May 2001 ryan@paymentalliance.net wrote:

Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#3Ryan Mahoney
ryan@paymentalliance.net
In reply to: Ryan Mahoney (#2)
Re: Query not using index

No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.

-r

On Thu, 10 May 2001 19:47:32 +0000 (UTC), mitch@venux.net ("Mitch
Vincent") wrote:

Show quoted text

Does that query really return 9420 rows ? If so, a sequential scan is
probably better/faster than an index scan..

-Mitch

----- Original Message -----
From: <ryan@paymentalliance.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, May 10, 2001 9:22 AM
Subject: Re: Query not using index

I vacuum every half hour! Here is the output from EXPLAIN:

NOTICE: QUERY PLAN:

Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)

EXPLAIN

Thanks!

On Thu, 10 May 2001 18:19:16 +0000 (UTC),
sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:

Have you vacuum analyzed recently and what does
explain show for the query?

On Thu, 10 May 2001 ryan@paymentalliance.net wrote:

Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#4Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Ryan Mahoney (#1)
Re: Query not using index

Have you vacuum analyzed recently and what does
explain show for the query?

On Thu, 10 May 2001 ryan@paymentalliance.net wrote:

Show quoted text

Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

#5Ryan Mahoney
ryan@paymentalliance.net
In reply to: Stephan Szabo (#4)
Re: Query not using index

You and Stephan hit it right on the nose - our table has been
maliciously propagated with thousands of faulty values - once gone
index are in use and DB is SPEEDING along 8)

Thanks for your help!!!

-r

On Thu, 10 May 2001 21:49:28 +0000 (UTC), in
comp.databases.postgresql.general you wrote:

Show quoted text

--w2JjAQZceEVGylhD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, May 10, 2001 at 05:22:07PM -0400, Tom Lane wrote:

ryan@paymentalliance.net writes:

No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.

=20
Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.

I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders. The planner (or
statistics gatherer, or something) apparently takes notice of the
distribution of non-NULL values.

Chris

--=20
chris@mt.sri.com -----------------------------------------------------
Chris Jones SRI International, Inc.
www.sri.com

--w2JjAQZceEVGylhD
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (NetBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjr7CVoACgkQ4nX8TnrnU2+p0ACaAoSNEtwIlibMlh+H9ehJecmy
lBcAnjI0TYJubbSIwgzi8DuRxoos4OwT
=edfw
-----END PGP SIGNATURE-----

--w2JjAQZceEVGylhD
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: binary
MIME-Version: 1.0

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

--w2JjAQZceEVGylhD--

#6Patrick Welche
prlw1@newn.cam.ac.uk
In reply to: Ryan Mahoney (#2)
Re: Re: Query not using index

On Thu, May 10, 2001 at 01:22:56PM +0000, ryan@paymentalliance.net wrote:

I vacuum every half hour! Here is the output from EXPLAIN:

NOTICE: QUERY PLAN:

Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)

EXPLAIN

Thanks!

Then try

set enable_seqscan to off;
explain select ...

and see what the cost is. Compare it with the value above.

Cheers,

Patrick

#7Mitch Vincent
mitch@venux.net
In reply to: Stephan Szabo (#4)
Re: Query not using index

Does that query really return 9420 rows ? If so, a sequential scan is
probably better/faster than an index scan..

-Mitch

----- Original Message -----
From: <ryan@paymentalliance.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, May 10, 2001 9:22 AM
Subject: Re: Query not using index

Show quoted text

I vacuum every half hour! Here is the output from EXPLAIN:

NOTICE: QUERY PLAN:

Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)

EXPLAIN

Thanks!

On Thu, 10 May 2001 18:19:16 +0000 (UTC),
sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:

Have you vacuum analyzed recently and what does
explain show for the query?

On Thu, 10 May 2001 ryan@paymentalliance.net wrote:

Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Mahoney (#3)
Re: Re: Query not using index

ryan@paymentalliance.net writes:

No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.

Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.

7.2 will probably do better with this sort of data distribution, but for
now it's a difficult problem.

regards, tom lane

#9Chris Jones
chris@mt.sri.com
In reply to: Tom Lane (#8)
Re: Re: Query not using index

On Thu, May 10, 2001 at 05:22:07PM -0400, Tom Lane wrote:

ryan@paymentalliance.net writes:

No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.

Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.

I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders. The planner (or
statistics gatherer, or something) apparently takes notice of the
distribution of non-NULL values.

Chris

--
chris@mt.sri.com -----------------------------------------------------
Chris Jones SRI International, Inc.
www.sri.com

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Jones (#9)
Re: Re: Query not using index

Chris Jones <chris@mt.sri.com> writes:

Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.

I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders.

Yes, if you have a lot of "dummy" values it's a good idea to represent
them as NULLs rather than some arbitrarily-chosen regular data value.
The planner does keep track of NULLs separately from everything else.

regards, tom lane

#11Richard Huxton
dev@archonet.com
In reply to: Mitch Vincent (#7)
Re: Re: Query not using index

From: "Tom Lane" <tgl@sss.pgh.pa.us>

I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders.

Yes, if you have a lot of "dummy" values it's a good idea to represent
them as NULLs rather than some arbitrarily-chosen regular data value.
The planner does keep track of NULLs separately from everything else.

Is there a good reason why rdbms don't just keep a cache of decisions on
this stuff. I realise SQL is supposed to be ad-hoc but in reality, it's the
old 90:10 rule where a handful of queries get run consistently and where
performance is important.

Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result. I'm
guessing there's a bloody good reason (TM) for it since query planning has
got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
long out of college).

- Richard Huxton

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Huxton (#11)
Re: Re: Query not using index

"Richard Huxton" <dev@archonet.com> writes:

Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result.

?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.

The real problem here is simply that PG hasn't been keeping adequately
detailed statistics. I'm currently working on improving that for 7.2...
see discussions over in pghackers if you are interested.

regards, tom lane

#13Bruce Momjian
bruce@momjian.us
In reply to: Richard Huxton (#11)
Re: Re: Query not using index

Is there a good reason why rdbms don't just keep a cache of decisions on
this stuff. I realise SQL is supposed to be ad-hoc but in reality, it's the
old 90:10 rule where a handful of queries get run consistently and where
performance is important.

Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result. I'm
guessing there's a bloody good reason (TM) for it since query planning has
got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
long out of college).

I have asked about this before. Decisions about sequential/index scans
could be theoretically fed from the executor back to the optimizer for
later user.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#14Richard Huxton
dev@archonet.com
In reply to: Mitch Vincent (#7)
Re: Re: Query not using index

From: "Tom Lane" <tgl@sss.pgh.pa.us>

"Richard Huxton" <dev@archonet.com> writes:

Why doesn't PG (or any other system afaik) just have a first guess, run

the

query and then if the costs are horribly wrong cache the right result.

?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.

Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
the 3000 you were expecting the only alternative is to try an index?

The real problem here is simply that PG hasn't been keeping adequately
detailed statistics. I'm currently working on improving that for 7.2...
see discussions over in pghackers if you are interested.

Thinking about it (along with Bruce's reply posted to the list) I guess the
difference is whether you gather the statistics up-front during a vacuum, or
build them as queries are used. You're always going to need *something* to
base your first guess on anyway - the "learning" would only help you in
those cases where the distribution of values wasn't a normal curve.

Anyway, given that I'm up to my neck in work at the moment and I don't
actually know what I'm talking about, I'll shut up and get back to keeping
clients happy :-)

- Richard Huxton

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Huxton (#14)
Re: Re: Query not using index

"Richard Huxton" <dev@archonet.com> writes:

?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.

Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
the 3000 you were expecting the only alternative is to try an index?

But if the next query is "where x=2", what do you do? Keep in mind that
the data distributions people have been having trouble with are
irregular: you can't conclude anything very reliable about x=2 based on
what you know about x=1.

Thinking about it (along with Bruce's reply posted to the list) I guess the
difference is whether you gather the statistics up-front during a vacuum, or
build them as queries are used.

Stats gathered as a byproduct of individual queries might be useful if
you happen to get the exact same queries over again, but I doubt that
a succession of such results should be expected to build up a picture
that's complete enough to extrapolate to other queries. Stats gathered
by ANALYZE have the merit that they come from a process that's designed
specifically to give you a good statistical picture.

regards, tom lane