can't win

Started by Jeff Rogersalmost 22 years ago4 messagesgeneral
Jump to latest
#1Jeff Rogers
jrogers@findlaw.com

I have a query that it seems is destined to be slow one way or another. I
have a table of around 30k articles, categorized by topic and ordered by date:

create table articles (
topic varchar(50),
created date,
data text
);

create index articles_topic_idx on articles(topic);
create index articles_created_idx on articles(created);

If I want to get the 5 most recent articles in a topic, I get a nice query
plan and a fast query:
# explain select * from articles where topic = 'Example' order by created desc
limit 5 ;
QUERY PLAN

-------------------------------------------------------------------------------
-----------------------------
Limit (cost=0.00..646.71 rows=5 width=828)
-> Index Scan Backward using articles_created_idx on articles
(cost=0.00..85202.16 rows=659 width=828)
Filter: (topic = 'Example'::character varying)
(3 rows)

# select * from articles where topic = 'Example' order by created desc limit 5
;
[.....]
Time: 18.42 ms

However, if the topic happens to not exist, this query takes a very long time:
# select * from articles where topic = 'NO-Example' order by created desc
limit 5 ;
[.....]
Time: 1075.36 ms

If I drop the date index or get more articles (so it doesn't do the backward
scan on articles_created_idx), then the situation is reversed: getting the
most recent articles for a topic that exists takes a fair amount of time,
while getting a topic that does not exist is nearly instantaneous.

Is there any way I can get the best of both worlds?

-J

#2Dann Corbit
DCorbit@connx.com
In reply to: Jeff Rogers (#1)
Re: can't win

It might be worthwhile to experiment with 2 new indexes:

Create UNIQUE index articles_created_topic_idx on articles(created,
topic);
Create UNIQUE index articles_topic_created_idx on articles(topic,
created);

Probably, one of the two should become your primary key.

That will give the optimizer some new choices for plans.

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jeff Rogers
Sent: Friday, June 18, 2004 12:28 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] can't win

I have a query that it seems is destined to be slow one way
or another. I
have a table of around 30k articles, categorized by topic and
ordered by date:

create table articles (
topic varchar(50),
created date,
data text
);

create index articles_topic_idx on articles(topic);
create index articles_created_idx on articles(created);

If I want to get the 5 most recent articles in a topic, I get
a nice query
plan and a fast query:
# explain select * from articles where topic = 'Example'
order by created desc
limit 5 ;
QUERY PLAN

--------------------------------------------------------------
-----------------
-----------------------------
Limit (cost=0.00..646.71 rows=5 width=828)
-> Index Scan Backward using articles_created_idx on articles
(cost=0.00..85202.16 rows=659 width=828)
Filter: (topic = 'Example'::character varying)
(3 rows)

# select * from articles where topic = 'Example' order by
created desc limit 5
;
[.....]
Time: 18.42 ms

However, if the topic happens to not exist, this query takes
a very long time: # select * from articles where topic =
'NO-Example' order by created desc
limit 5 ;
[.....]
Time: 1075.36 ms

If I drop the date index or get more articles (so it doesn't
do the backward
scan on articles_created_idx), then the situation is
reversed: getting the
most recent articles for a topic that exists takes a fair
amount of time,
while getting a topic that does not exist is nearly instantaneous.

Is there any way I can get the best of both worlds?

-J

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so
that your
message can get through to the mailing list cleanly

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Jeff Rogers (#1)
Re: can't win

On Fri, 18 Jun 2004, Jeff Rogers wrote:

create table articles (
topic varchar(50),
created date,
data text
);

create index articles_topic_idx on articles(topic);
create index articles_created_idx on articles(created);

If I want to get the 5 most recent articles in a topic, I get a nice query
plan and a fast query:
# explain select * from articles where topic = 'Example' order by created desc
limit 5 ;
QUERY PLAN

-------------------------------------------------------------------------------
-----------------------------
Limit (cost=0.00..646.71 rows=5 width=828)
-> Index Scan Backward using articles_created_idx on articles
(cost=0.00..85202.16 rows=659 width=828)
Filter: (topic = 'Example'::character varying)
(3 rows)

# select * from articles where topic = 'Example' order by created desc limit 5
;
[.....]
Time: 18.42 ms

However, if the topic happens to not exist, this query takes a very long time:
# select * from articles where topic = 'NO-Example' order by created desc
limit 5 ;
[.....]
Time: 1075.36 ms

If I drop the date index or get more articles (so it doesn't do the backward
scan on articles_created_idx), then the situation is reversed: getting the
most recent articles for a topic that exists takes a fair amount of time,
while getting a topic that does not exist is nearly instantaneous.

Is there any way I can get the best of both worlds?

Hmm, how about a multiple column index on (topic, created)? You might
need to use
select * from articles where topic='Example' order by topic desc, created
desc limit 5;
to get it to use the index though.

#4Jeff Rogers
jrogers@findlaw.com
In reply to: Stephan Szabo (#3)
Re: can't win

Is there any way I can get the best of both worlds?

Hmm, how about a multiple column index on (topic, created)? You might
need to use
select * from articles where topic='Example' order by topic desc, created
desc limit 5;
to get it to use the index though.

That works beautifully, thanks!

-J