what's going on here?

Started by Ben · about 25 years ago · 11 messages · general
#1Ben
bench@silentmedia.com

We have two postgres-driven web pages, and we've noticed in the last few
days that one is loading about 10 times faster than the other. Looking
into it, we found out that postgres has very different execution paths for
what is essentially the same query, and this makes no sense to me.

Sorry for the deluge of info here, but maybe it'll explain why this is
happening...

First, here are the explains. Note the only parameter that changes is
playlist.stream, and yet for some reason the second query makes use of
indices which are unrelated to playlist.stream.

music=# explain select users.name as username, playlist.id as id,
paths.len as len, paths.path as path, timestamp(playlist.added) as added,
timestamp(playlist.played) as played from users, playlist, paths where
paths.md5=files.md5 and files.song=playlist.songid and
playlist.userid=users.id and playlist.played is null and playlist.stream=1
order by playlist.added, playlist.id;
NOTICE: QUERY PLAN:

Sort  (cost=10845.28..10845.28 rows=1857 width=80)
  ->  Hash Join  (cost=10007.93..10744.44 rows=1857 width=80)
        ->  Merge Join  (cost=10006.44..10668.17 rows=1857 width=64)
              ->  Sort  (cost=3754.03..3754.03 rows=2321 width=36)
                    ->  Hash Join  (cost=306.62..3624.28 rows=2321 width=36)
                          ->  Seq Scan on files  (cost=0.00..1072.13 rows=55413 width=16)
                          ->  Hash  (cost=300.81..300.81 rows=2321 width=20)
                                ->  Seq Scan on playlist  (cost=0.00..300.81 rows=2321 width=20)
              ->  Sort  (cost=6252.41..6252.41 rows=50617 width=28)
                    ->  Seq Scan on paths  (cost=0.00..1677.17 rows=50617 width=28)
        ->  Hash  (cost=1.39..1.39 rows=39 width=16)
              ->  Seq Scan on users  (cost=0.00..1.39 rows=39 width=16)

EXPLAIN
music=# explain select users.name as username, playlist.id as id,
paths.len as len, paths.path as path, timestamp(playlist.added) as added,
timestamp(playlist.played) as played from users, playlist, paths where
paths.md5=files.md5 and files.song=playlist.songid and
playlist.userid=users.id and playlist.played is null and playlist.stream=2
order by playlist.added, playlist.id;
NOTICE: QUERY PLAN:

Sort  (cost=1467.44..1467.44 rows=164 width=80)
  ->  Hash Join  (cost=1.49..1461.40 rows=164 width=80)
        ->  Nested Loop  (cost=0.00..1452.85 rows=164 width=64)
              ->  Nested Loop  (cost=0.00..771.76 rows=205 width=36)
                    ->  Seq Scan on playlist  (cost=0.00..300.81 rows=205 width=20)
                    ->  Index Scan using files_song_btree_key on files  (cost=0.00..2.28 rows=1 width=16)
              ->  Index Scan using paths_md5_btree_key on paths  (cost=0.00..3.31 rows=1 width=28)
        ->  Hash  (cost=1.39..1.39 rows=39 width=16)
              ->  Seq Scan on users  (cost=0.00..1.39 rows=39 width=16)

EXPLAIN

Okay, so now the relevant schema:

Table "users"
Attribute | Type | Modifier
-------------+-----------+------------------------------------------------
id | integer | not null default nextval('users_id_seq'::text)
name | text | not null
Indices: users_name_key,
users_pkey

Index "users_name_key"
Attribute | Type
-----------+------
name | text
unique btree

Index "users_pkey"
Attribute | Type
-----------+---------
id | integer
unique btree (primary key)

Table "playlist"
Attribute | Type | Modifier
-----------+---------+---------------------------------------------------
id | integer | not null default nextval('playlist_id_seq'::text)
songid | integer |
userid | integer |
added | integer |
played | integer |
stream | integer |
Indices: playlist_added_key,
playlist_pkey,
playlist_played_key,
playlist_songid_key

Index "playlist_added_key"
Attribute | Type
-----------+---------
added | integer
btree

Index "playlist_pkey"
Attribute | Type
-----------+---------
id | integer
unique btree (primary key)

Index "playlist_played_key"
Attribute | Type
-----------+---------
played | integer
btree

Index "playlist_songid_key"
Attribute | Type
-----------+---------
songid | integer
btree

Table "paths"
Attribute | Type | Modifier
-----------+-----------+----------
md5 | char(32) | not null
path | text | not null
len | integer |
Indices: paths_md5_btree_key,
paths_md5_key,
paths_path_key

Index "paths_md5_btree_key"
Attribute | Type
-----------+----------
md5 | char(32)
btree

Index "paths_md5_key"
Attribute | Type
-----------+----------
md5 | char(32)
hash

Index "paths_path_key"
Attribute | Type
-----------+------
path | text
btree

Table "files"
Attribute | Type | Modifier
-----------+----------+----------
song | integer | not null
md5 | char(32) | not null
Indices: files_md5_btree_key,
files_md5_key,
files_song_btree_key,
files_song_key

Index "files_md5_btree_key"
Attribute | Type
-----------+----------
md5 | char(32)
btree

Index "files_md5_key"
Attribute | Type
-----------+----------
md5 | char(32)
hash

Index "files_song_btree_key"
Attribute | Type
-----------+---------
song | integer
btree

Index "files_song_key"
Attribute | Type
-----------+---------
song | integer
hash

Interestingly, the sequential scan on playlist claims to be returning 2000
results for stream=1 and only 200 for stream=2. I'm not sure which part of
the where clause this guess comes from, because the playlist table has
equal numbers of entries for both streams.

#2Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Ben (#1)
Re: what's going on here?

Has vacuum analyze been run on both recently? What is the maximum number
of rows with a particular stream value, and how many does each of 1 and 2
actually have?

Interestingly, the sequential scan on playlist claims to be returning 2000
results for stream=1 and only 200 for stream=2. I'm not sure which part of
the where clause this guess comes from, because the playlist table has
equal numbers of entries for both streams.

#3Ben
bench@silentmedia.com
In reply to: Stephan Szabo (#2)
Re: what's going on here?

Every night. There are 6223 rows with stream=2 and 7041 rows with
stream=1. At any given time, there will be between 1 and 30 rows with
played=null for both values.

On Fri, 9 Mar 2001, Stephan Szabo wrote:

Has vacuum analyze been run on both recently? What is the maximum number
of rows with a particular stream value, and how many does each of 1 and 2
actually have?

Interestingly, the sequential scan on playlist claims to be returning 2000
results for stream=1 and only 200 for stream=2. I'm not sure which part of
the where clause this guess comes from, because the playlist table has
equal numbers of entries for both streams.

#4Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Ben (#3)
Re: what's going on here?

Hmm, I also notice that it's getting very different numbers for
rows from files as well.

On Fri, 9 Mar 2001, Ben wrote:

Every night. There are 6223 rows with stream=2 and 7041 rows with
stream=1. At any given time, there will be between 1 to 30 rows with
played=null for both values.

On Fri, 9 Mar 2001, Stephan Szabo wrote:

Has vacuum analyze been run on both recently? What is the maximum number
of rows with a particular stream value, and how many does each of 1 and 2
actually have?

Interestingly, the sequential scan on playlist claims to be returning 2000
results for stream=1 and only 200 for stream=2. I'm not sure which part of
the where clause this guess comes from, because the playlist table has
equal numbers of entries for both streams.

#5Ben
bench@silentmedia.com
In reply to: Stephan Szabo (#4)
Re: what's going on here?

I think that's because the stream=1 query is doing a sequential scan on
files, so it has to look at everything, whereas the index scan in the
stream=2 query only looks at the important rows. (Which in this case was
just 1 row.)

I think the question is: why is one using an index scan and the other
using a sequential scan?

On Fri, 9 Mar 2001, Stephan Szabo wrote:

Hmm, I also notice that it's getting very different numbers for
rows from files as well.

On Fri, 9 Mar 2001, Ben wrote:

Every night. There are 6223 rows with stream=2 and 7041 rows with
stream=1. At any given time, there will be between 1 to 30 rows with
played=null for both values.

On Fri, 9 Mar 2001, Stephan Szabo wrote:

Has vacuum analyze been run on both recently? What is the maximum number
of rows with a particular stream value, and how many does each of 1 and 2
actually have?

Interestingly, the sequential scan on playlist claims to be returning 2000
results for stream=1 and only 200 for stream=2. I'm not sure which part of
the where clause this guess comes from, because the playlist table has
equal numbers of entries for both streams.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#1)
Re: what's going on here?

Ben <bench@silentmedia.com> writes:

music=# explain ... where ... playlist.stream=1
-> Seq Scan on playlist (cost=0.00..300.81 rows=2321 width=20)

music=# explain ... where ... playlist.stream=2
-> Seq Scan on playlist (cost=0.00..300.81 rows=205 width=20)

I am betting that 1 is the most common value in playlist.stream, or at
least is being chosen as the most common value by VACUUM ANALYZE's
not-totally-accurate estimation process. The 2321 rowcount estimate
then falls out of the stored statistic for the most common value's
frequency. In the second case, the estimator knows that 2 is *not* the
most common value, but it has absolutely no statistical basis on which
to guess what the frequency really is. I think it uses 1/10th of the
most common frequency for anything that's not the most common value
(look in utils/adt/selfuncs.c to be sure). There's probably also some
contribution from the "playlist.played is null" clause, else the row
count estimate would be exactly 1/10th as much. However, I don't
believe that the thing currently makes any serious effort to gauge the
selectivity of IS NULL, which is a shame because that would critically
affect the results here. (You did say some thousands of rows matching
the stream=N clause, but only a few matching IS NULL, right?)

Given the fundamental difference in this initial row count estimate,
the large difference in the subsequent join plan structure is not too
surprising.

In short: another demonstration of the limitations of our current
statistics about data frequencies.

BTW, you didn't actually say which plan was faster. Since the second
one was closer to the true statistic (only a few rows returned from
playlist), I'm hoping it was faster...

regards, tom lane

#7Ben
bench@silentmedia.com
In reply to: Tom Lane (#6)
Re: what's going on here?

On Fri, 9 Mar 2001, Tom Lane wrote:

Ben <bench@silentmedia.com> writes:

music=# explain ... where ... playlist.stream=1
-> Seq Scan on playlist (cost=0.00..300.81 rows=2321 width=20)

music=# explain ... where ... playlist.stream=2
-> Seq Scan on playlist (cost=0.00..300.81 rows=205 width=20)

I am betting that 1 is the most common value in playlist.stream, or at
least is being chosen as the most common value by VACUUM ANALYZE's
not-totally-accurate estimation process. The 2321 rowcount estimate
then falls out of the stored statistic for the most common value's
frequency. In the second case, the estimator knows that 2 is *not* the

Yes, stream=1 is the most common value, but 2321? There are ~7000 rows
where stream=1.

most common value, but it has absolutely no statistical basis on which
to guess what the frequency really is. I think it uses 1/10th of the
most common frequency for anything that's not the most common value
(look in utils/adt/selfuncs.c to be sure). There's probably also some
contribution from the "playlist.played is null" clause, else the row
count estimate would be exactly 1/10th as much. However, I don't
believe that the thing currently makes any serious effort to gauge the
selectivity of IS NULL, which is a shame because that would critically
affect the results here. (You did say some thousands of rows matching
the stream=N clause, but only a few matching IS NULL, right?)

This is not good news. Most of the reason I have an index on
playlist.played is to figure out which are null, to limit the results this
query has to look at. Is this (probably very hard to fix) shortcoming
addressed in version 7.1, by chance?

Given the fundamental difference in this initial row count estimate,
the large difference in the subsequent join plan structure is not too
surprising.

This I don't follow. The stream=1 query is returning 10 times the rows -
shouldn't that make the use of an index even *more* likely?

BTW, you didn't actually say which plan was faster. Since the second
one was closer to the true statistic (only a few rows returned from
playlist), I'm hoping it was faster...

Sorry. Yes, the second one was far faster, thanks to the indices it used.

#8Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Ben (#7)
Re: what's going on here?

This I don't follow. The stream=1 query is returning 10 times the rows -
shouldn't that make the use of an index even *more* likely?

No, because if you're getting a large % of the rows of the table it'll be
faster to sequentially scan the heap rather than do so in random access by
the index order. (You have to go into the heap file for the info on
whether or not the tuple is valid, and you'd potentially be seeking around
the heap file which can end up being more expensive).
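The tradeoff Stephan describes can be sketched with a toy cost comparison (the constants and page counts below are illustrative assumptions, not PostgreSQL's actual cost model):

```python
# Toy cost model: a seq scan reads every heap page once, sequentially; an
# index scan pays a random heap fetch per matching row, because tuple
# validity can only be checked in the heap itself.
SEQ_PAGE_COST = 1.0      # cost of one sequential page read (assumed)
RANDOM_PAGE_COST = 4.0   # cost of one random page read (assumed)
HEAP_PAGES = 1000        # size of the table's heap file (assumed)

def seq_scan_cost() -> float:
    return HEAP_PAGES * SEQ_PAGE_COST

def index_scan_cost(matching_rows: int) -> float:
    return matching_rows * RANDOM_PAGE_COST

# A handful of matching rows: the index wins easily.
print(index_scan_cost(30) < seq_scan_cost())    # True
# A large fraction of the table: the seq scan is cheaper.
print(index_scan_cost(2000) < seq_scan_cost())  # False
```

So an estimate of ~2000 matching rows, accurate or not, is enough to push the planner away from the index.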

As a question, does an index on stream, played (both in the same
index) help at all?

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#7)
Re: what's going on here?

Ben <bench@silentmedia.com> writes:

I am betting that 1 is the most common value in playlist.stream, or at
least is being chosen as the most common value by VACUUM ANALYZE's
not-totally-accurate estimation process. The 2321 rowcount estimate
then falls out of the stored statistic for the most common value's
frequency. In the second case, the estimator knows that 2 is *not* the

Yes, stream=1 is the most common value, but 2321? There are ~7000 rows
where stream=1.

Well, I did say not totally accurate ;-). But [digs in code] there is
a hardwired selectivity of 0.333 for the IS NULL clause, so it's really
estimating about 6960 rows matching the stream=1 clause. That's closer
than I'd have expected, actually, given the inaccuracies in VACUUM's
method of obtaining this estimate. So the real problem here is the lack
of estimation for IS NULL.
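The arithmetic above can be checked with a quick back-of-the-envelope sketch (the exact logic lives in utils/adt/selfuncs.c; the stored MCV frequency here is inferred from the plan, not read from the catalog):

```python
# Rough reconstruction of the 7.0-era estimate for the playlist scans.
total_rows = 7041 + 6223          # stream=1 rows + stream=2 rows (per Ben)
mcv_fraction = 6963 / total_rows  # inferred stored frequency of the MCV (1)
is_null_sel = 0.333               # hardwired IS NULL selectivity

# stream = 1 matches the most common value, so use its stored frequency:
est_stream1 = total_rows * mcv_fraction * is_null_sel       # ~2319; plan says 2321
# stream = 2 is not the MCV, so guess 1/10 of the MCV frequency:
est_stream2 = total_rows * mcv_fraction / 10 * is_null_sel  # ~232; plan says 205
```

The stream=2 figure lands near, but not exactly at, the plan's 205, so the real code evidently folds in a bit more than this sketch does.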

This is not good news. Most of the reason I have an index on
playlist.played is to figure out which are null, to limit the results this
query has to look at.

We don't currently use indexes for IS NULL at all.

Is this (probably very hard to fix) shortcoming
addressed in version 7.1, by chance?

No. The fact that the planner doesn't derive an estimate could be fixed
with not too much work (but it won't happen for 7.1). Making IS NULL an
indexable condition would affect considerably more code, and it's not
really what's making the difference here anyway --- although it would
allow us to replace the seqscan on playlist with an indexscan, which'd
be even faster.

This I don't follow. The stream=1 query is returning 10 times the rows -
shouldn't that make the use of an index even *more* likely?

No. Your second query is using nested loop with inner indexscan joins.
That's fine for *small* numbers of rows returned by the outer subquery,
but it goes to hell in a handbasket for large numbers of rows. The
planner is doing the right thing to switch to a heavier-duty plan with
more startup overhead ... or it would be if it had the right idea about
how many rows are going to come out of playlist, that is.

The real issue here is the lack of estimation of IS NULL selectivity.
I was aware that that was broken, but I don't recall having seen a
real-life example before where it really mattered. I'll try to move
it up a notch or two on my to-do list. Or if someone else wants to
dive in, feel free...

regards, tom lane

#10Ben
bench@silentmedia.com
In reply to: Stephan Szabo (#8)
Re: what's going on here?

Unfortunately, no.

On Fri, 9 Mar 2001, Stephan Szabo wrote:

As a question, does an index on stream, played (both in the same
index) help at all?

#11Ben
bench@silentmedia.com
In reply to: Tom Lane (#9)
Re: what's going on here?

On Fri, 9 Mar 2001, Tom Lane wrote:

No. The fact that the planner doesn't derive an estimate could be fixed
with not too much work (but it won't happen for 7.1). Making IS NULL an
indexable condition would affect considerably more code, and it's not
really what's making the difference here anyway --- although it would
allow us to replace the seqscan on playlist with an indexscan, which'd
be even faster.

It would be nice if postgres could index on IS NULL, or if it were
documented that indexes don't get used here. Otherwise I would have used
played = 0 to mean the same thing that I'm trying to get out of played IS
NULL... doing that lets explain give a very accurate picture.
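The sentinel workaround can be sketched like so, using SQLite purely for illustration (the table and values are hypothetical; the 7.0-era limitation being worked around is PostgreSQL-specific):

```python
import sqlite3

# Store 0 instead of NULL for "not yet played", so the filter becomes an
# ordinary equality that an index can serve and statistics can estimate.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist (id INTEGER PRIMARY KEY, stream INTEGER, played INTEGER)"
)
conn.execute("CREATE INDEX playlist_played_key ON playlist (played)")
conn.executemany(
    "INSERT INTO playlist (stream, played) VALUES (?, ?)",
    [(1, 0), (1, 984050000), (2, 0), (2, 984050001)],  # 0 = unplayed sentinel
)
unplayed = conn.execute(
    "SELECT id FROM playlist WHERE played = 0 AND stream = 2"
).fetchall()
print(len(unplayed))  # 1
```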

No. Your second query is using nested loop with inner indexscan joins.
That's fine for *small* numbers of rows returned by the outer subquery,
but it goes to hell in a handbasket for large numbers of rows. The
planner is doing the right thing to switch to a heavier-duty plan with
more startup overhead ... or it would be if it had the right idea about
how many rows are going to come out of playlist, that is.

Ahhh, I see. Thanks for the explanations.

The real issue here is the lack of estimation of IS NULL selectivity.
I was aware that that was broken, but I don't recall having seen a
real-life example before where it really mattered. I'll try to move
it up a notch or two on my to-do list. Or if someone else wants to
dive in, feel free...

Yeah, like I said above, being able to use an accurate index for
estimating the size of the inner loop makes quite a difference. But at
least in this case, I had a value I could use in place of NULL, so it's
hardly a high priority.

Thanks again.