Query not using index
Here's the query:
SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
Here's the schema:
Attribute             | Type                     | Modifier
-----------------------+--------------------------+---------------------------------------------------------------------
cart_row_id | integer | not null default
product_id | integer | not null
color | character varying(100) |
size | character varying(100) |
style | character varying(100) |
order_id | integer | not null
time_added | timestamp with time zone |
voided_date | timestamp with time zone |
voided_by | integer |
expired_date | timestamp with time zone |
item_state | character varying(50) | default 'in_basket'
received_back_date | timestamp with time zone |
price_charged | numeric(30,6) |
price_refunded | numeric(30,6) |
shipping_charged | numeric(30,6) |
shipping_refunded | numeric(30,6) |
price_tax_charged | numeric(30,6) |
price_tax_refunded | numeric(30,6) |
shipping_tax_charged | numeric(30,6) |
shipping_tax_refunded | numeric(30,6) |
price_name | character varying(30) |
refund_id | integer |
cs_comments | text |
price | numeric(30,6) |
ship_group_id | integer |
ship_package_id | integer |
delivery_date | date |
sentiment | text |
vendor_id | integer |
linkshare_sent | bit(1) |
mapped_to_address | character(1) |
product_name | character varying(200) |
Indices: delivery_date_pa_shopping_cart_,
pa_cart_by_item_state,
pa_cart_by_order,
pa_cart_by_product,
pa_cart_row_order,
pa_item_map_to_addr,
pa_shop_cart_prod_ord_idx,
pa_shopping_cart_pkey,
ship_package_id_pa_shopping_car,
vendor_id_pa_shopping_cart_key
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart_row_id
I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.
Thanks for any help!
-r
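[For anyone reproducing this, a sketch of the first diagnostic step the replies below suggest, assuming a psql session on the database in question; table and column names are taken from the post above. Note that on 7.x a plain VACUUM does not refresh statistics; VACUUM ANALYZE does.]

```sql
-- Refresh the planner's statistics, then ask for the plan.
VACUUM ANALYZE pa_shopping_cart;

EXPLAIN
SELECT cart_row_id
FROM pa_shopping_cart
WHERE order_id = 20;
```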
I vacuum every half hour! Here is the output from EXPLAIN:
NOTICE: QUERY PLAN:
Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)
EXPLAIN
Thanks!
On Thu, 10 May 2001 18:19:16 +0000 (UTC),
sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:
Have you vacuum analyzed recently and what does
explain show for the query?
On Thu, 10 May 2001 ryan@paymentalliance.net wrote:
Here's the query:
SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart_row_id
I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.
-r
On Thu, 10 May 2001 19:47:32 +0000 (UTC), mitch@venux.net ("Mitch
Vincent") wrote:
Does that query really return 9420 rows ? If so, a sequential scan is
probably better/faster than an index scan..
-Mitch
Have you vacuum analyzed recently and what does
explain show for the query?
On Thu, 10 May 2001 ryan@paymentalliance.net wrote:
Here's the query:
SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;
[ ... ]
There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart_row_id
I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.
You and Stephan hit it right on the nose - our table had been
maliciously populated with thousands of faulty values - once those were
gone the indexes are in use and the DB is SPEEDING along 8)
Thanks for your help!!!
-r
On Thu, 10 May 2001 21:49:28 +0000 (UTC), in
comp.databases.postgresql.general you wrote:
I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders.
On Thu, May 10, 2001 at 01:22:56PM +0000, ryan@paymentalliance.net wrote:
I vacuum every half hour! Here is the output from EXPLAIN:
NOTICE: QUERY PLAN:
Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)
EXPLAIN
Thanks!
Then try
set enable_seqscan to off;
explain select ...
and see what the cost is. Compare it with the value above.
Cheers,
Patrick
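[Patrick's experiment spelled out might look like this; a sketch only, using the query from the original post. The SET affects just the current session, so it is safe to try and then undo.]

```sql
-- Temporarily forbid sequential scans so the planner is forced
-- to cost the index-scan plan, then compare its estimate with
-- the seq-scan cost shown above.
SET enable_seqscan TO off;

EXPLAIN
SELECT cart_row_id
FROM pa_shopping_cart
WHERE order_id = 20;

-- Restore the default afterwards.
SET enable_seqscan TO on;
```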
Does that query really return 9420 rows ? If so, a sequential scan is
probably better/faster than an index scan..
-Mitch
----- Original Message -----
From: <ryan@paymentalliance.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, May 10, 2001 9:22 AM
Subject: Re: Query not using index
I vacuum every half hour! Here is the output from EXPLAIN:
NOTICE: QUERY PLAN:
Seq Scan on pa_shopping_cart (cost=0.00..7237.94 rows=9420 width=296)
EXPLAIN
Thanks!
ryan@paymentalliance.net writes:
No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.
Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.
7.2 will probably do better with this sort of data distribution, but for
now it's a difficult problem.
regards, tom lane
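[One way to check for the over-represented values Tom suspects - a sketch, not from the thread - is a simple frequency count on the indexed column:]

```sql
-- Show the most common order_id values; a handful of huge counts
-- here would explain the planner's 9420-row estimate.
SELECT order_id, count(*) AS occurrences
FROM pa_shopping_cart
GROUP BY order_id
ORDER BY occurrences DESC
LIMIT 10;
```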
On Thu, May 10, 2001 at 05:22:07PM -0400, Tom Lane wrote:
ryan@paymentalliance.net writes:
No the query usually returns between 0 and 5 rows. Usually not zero -
most often 1.
Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.
I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders. The planner (or
statistics gatherer, or something) apparently takes notice of the
distribution of non-NULL values.
Chris
--
chris@mt.sri.com -----------------------------------------------------
Chris Jones SRI International, Inc.
www.sri.com
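[Chris's fix might be sketched like this - hypothetical table and column names, since his schema isn't shown. Run VACUUM ANALYZE afterwards so the planner sees the new distribution.]

```sql
-- Replace dummy zeroes with NULLs so they stop dominating the
-- planner's most-common-value statistics; NULLs are tracked
-- separately from regular values.
UPDATE some_table SET some_column = NULL WHERE some_column = 0;
VACUUM ANALYZE some_table;
```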
Chris Jones <chris@mt.sri.com> writes:
Ah. You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.
I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders.
Yes, if you have a lot of "dummy" values it's a good idea to represent
them as NULLs rather than some arbitrarily-chosen regular data value.
The planner does keep track of NULLs separately from everything else.
regards, tom lane
From: "Tom Lane" <tgl@sss.pgh.pa.us>
I had a similar situation, where I had a lot of rows with 0's in
them. Changing those to NULLs worked wonders.
Yes, if you have a lot of "dummy" values it's a good idea to represent
them as NULLs rather than some arbitrarily-chosen regular data value.
The planner does keep track of NULLs separately from everything else.
Is there a good reason why RDBMSs don't just keep a cache of decisions on
this stuff? I realise SQL is supposed to be ad-hoc, but in reality it's the
old 90:10 rule, where a handful of queries get run consistently and where
performance is important.
Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result. I'm
guessing there's a bloody good reason (TM) for it since query planning has
got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
long out of college).
- Richard Huxton
"Richard Huxton" <dev@archonet.com> writes:
Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result.
?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.
The real problem here is simply that PG hasn't been keeping adequately
detailed statistics. I'm currently working on improving that for 7.2...
see discussions over in pghackers if you are interested.
regards, tom lane
Is there a good reason why rdbms don't just keep a cache of decisions on
this stuff. I realise SQL is supposed to be ad-hoc but in reality, it's the
old 90:10 rule where a handful of queries get run consistently and where
performance is important.
Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result. I'm
guessing there's a bloody good reason (TM) for it since query planning has
got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
long out of college).
I have asked about this before. Decisions about sequential/index scans
could be theoretically fed from the executor back to the optimizer for
later use.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From: "Tom Lane" <tgl@sss.pgh.pa.us>
"Richard Huxton" <dev@archonet.com> writes:
Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result.
?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.
Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
the 3000 you were expecting the only alternative is to try an index?
The real problem here is simply that PG hasn't been keeping adequately
detailed statistics. I'm currently working on improving that for 7.2...
see discussions over in pghackers if you are interested.
Thinking about it (along with Bruce's reply posted to the list) I guess the
difference is whether you gather the statistics up-front during a vacuum, or
build them as queries are used. You're always going to need *something* to
base your first guess on anyway - the "learning" would only help you in
those cases where the distribution of values wasn't a normal curve.
Anyway, given that I'm up to my neck in work at the moment and I don't
actually know what I'm talking about, I'll shut up and get back to keeping
clients happy :-)
- Richard Huxton
"Richard Huxton" <dev@archonet.com> writes:
?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.
Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
the 3000 you were expecting the only alternative is to try an index?
But if the next query is "where x=2", what do you do? Keep in mind that
the data distributions people have been having trouble with are
irregular: you can't conclude anything very reliable about x=2 based on
what you know about x=1.
Thinking about it (along with Bruce's reply posted to the list) I guess the
difference is whether you gather the statistics up-front during a vacuum, or
build them as queries are used.
Stats gathered as a byproduct of individual queries might be useful if
you happen to get the exact same queries over again, but I doubt that
a succession of such results should be expected to build up a picture
that's complete enough to extrapolate to other queries. Stats gathered
by ANALYZE have the merit that they come from a process that's designed
specifically to give you a good statistical picture.
regards, tom lane