CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'

Started by Alexander Farberover 6 years ago2 messagesgeneral
Jump to latest
#1Alexander Farber
alexander.farber@gmail.com

Good evening,

in PostgreSQL 11 I have a table holding player moves (could be: 'play',
'swap', 'skip', ...) in a word game:

# \d words_moves;
Table "public.words_moves"
Column | Type | Collation | Nullable |
Default
---------+--------------------------+-----------+----------+------------------------------------------
mid | bigint | | not null |
nextval('words_moves_mid_seq'::regclass)
action | text | | not null |
gid | integer | | not null |
uid | integer | | not null |
played | timestamp with time zone | | not null |
tiles | jsonb | | |
score | integer | | |
letters | text | | |
hand | text | | |
puzzle | boolean | | not null | false
Indexes:
"words_moves_pkey" PRIMARY KEY, btree (mid)
"words_moves_gid_played_idx" btree (gid, played DESC)
"words_moves_uid_action_played_idx" btree (uid, action, played)
"words_moves_uid_idx" btree (uid)
Check constraints:
"words_moves_score_check" CHECK (score >= 0)
Foreign-key constraints:
"words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON
DELETE CASCADE
"words_moves_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON
DELETE CASCADE
Referenced by:
TABLE "words_scores" CONSTRAINT "words_scores_mid_fkey" FOREIGN KEY
(mid) REFERENCES words_moves(mid) ON DELETE CASCADE

When I search for "interesting moves" with score higher than 90 or all 7
tiles played, then the query takes a bit longer:

EXPLAIN ANALYZE
SELECT
TO_CHAR(played, 'Mon YYYY'),
COUNT(NULLIF(puzzle, FALSE)) OVER (PARTITION BY TO_CHAR(played, 'Mon
YYYY')),
puzzle,
mid,
MD5(mid || 'cookie'),
gid,
score
FROM words_moves
WHERE action = 'play'
AND LENGTH(hand) = 7
AND (LENGTH(letters) = 7 OR score > 90)
AND played > CURRENT_TIMESTAMP - interval '1 year'
AND played < CURRENT_TIMESTAMP - interval '3 day'
ORDER BY played DESC;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=168533.19..168533.30 rows=44 width=97) (actual
time=2126.433..2126.541 rows=1036 loops=1)
Sort Key: played DESC
Sort Method: quicksort Memory: 194kB
-> WindowAgg (cost=168530.56..168531.99 rows=44 width=97) (actual
time=2122.991..2125.593 rows=1036 loops=1)
-> Sort (cost=168530.56..168530.67 rows=44 width=57) (actual
time=2122.934..2123.049 rows=1036 loops=1)
Sort Key: (to_char(played, 'Mon YYYY'::text))
Sort Method: quicksort Memory: 129kB
-> Gather (cost=1000.00..168529.36 rows=44 width=57)
(actual time=287.461..2121.445 rows=1036 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on words_moves
(cost=0.00..167524.96 rows=18 width=57) (actual time=207.234..2115.686
rows=345 loops=3)
Filter: ((action = 'play'::text) AND
(length(hand) = 7) AND ((length(letters) = 7) OR (score > 90)) AND (played

(CURRENT_TIMESTAMP - '1 year'::interval)) AND (played <

(CURRENT_TIMESTAMP - '3 days'::interval)))
Rows Removed by Filter: 1088073
Planning Time: 0.383 ms
Execution Time: 2126.922 ms
(15 rows)

Here the link: https://explain.depesz.com/s/s3HF

So I add an index with -

CREATE INDEX ON words_moves(length(letters), score, action);

That is because I have several such queries looking for interesting 'play'
moves with high score or all 7 tiles played.

This helps a bit:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=126487.34..126487.45 rows=44 width=97) (actual
time=219.649..219.744 rows=1036 loops=1)
Sort Key: played DESC
Sort Method: quicksort Memory: 194kB
-> WindowAgg (cost=126484.71..126486.14 rows=44 width=97) (actual
time=216.180..218.680 rows=1036 loops=1)
-> Sort (cost=126484.71..126484.82 rows=44 width=57) (actual
time=216.125..216.226 rows=1036 loops=1)
Sort Key: (to_char(played, 'Mon YYYY'::text))
Sort Method: quicksort Memory: 129kB
-> Bitmap Heap Scan on words_moves
(cost=83892.69..126483.50 rows=44 width=57) (actual time=209.447..214.879
rows=1036 loops=1)
Recheck Cond: (((length(letters) = 7) AND (action =
'play'::text)) OR ((score > 90) AND (action = 'play'::text)))
Filter: ((length(hand) = 7) AND (played >
(CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP
- '3 days'::interval)))
Rows Removed by Filter: 468
Heap Blocks: exact=1495
-> BitmapOr (cost=83892.69..83892.69 rows=15280
width=0) (actual time=209.083..209.083 rows=0 loops=1)
-> Bitmap Index Scan on
words_moves_length_score_action_idx (cost=0.00..419.69 rows=14838 width=0)
(actual time=2.040..2.040 rows=912 loops=1)
Index Cond: ((length(letters) = 7) AND
(action = 'play'::text))
-> Bitmap Index Scan on
words_moves_length_score_action_idx (cost=0.00..83472.98 rows=442 width=0)
(actual time=207.038..207.038 rows=608 loops=1)
Index Cond: ((score > 90) AND (action =
'play'::text))
Planning Time: 1.102 ms
Execution Time: 220.676 ms
(19 rows)

Here the resulting link: https://explain.depesz.com/s/Pwbt

Then I drop that index and create another one:

CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play';

And it seems to have a better performance on the query:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=97687.61..97687.72 rows=44 width=97) (actual
time=175.832..175.927 rows=1036 loops=1)
Sort Key: played DESC
Sort Method: quicksort Memory: 194kB
-> WindowAgg (cost=97684.98..97686.41 rows=44 width=97) (actual
time=172.443..174.937 rows=1036 loops=1)
-> Sort (cost=97684.98..97685.09 rows=44 width=57) (actual
time=172.390..172.490 rows=1036 loops=1)
Sort Key: (to_char(played, 'Mon YYYY'::text))
Sort Method: quicksort Memory: 129kB
-> Bitmap Heap Scan on words_moves
(cost=55092.96..97683.78 rows=44 width=57) (actual time=165.420..171.164
rows=1036 loops=1)
Recheck Cond: (((length(letters) = 7) AND (action =
'play'::text)) OR ((score > 90) AND (action = 'play'::text)))
Filter: ((length(hand) = 7) AND (played >
(CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP
- '3 days'::interval)))
Rows Removed by Filter: 468
Heap Blocks: exact=1495
-> BitmapOr (cost=55092.96..55092.96 rows=15280
width=0) (actual time=165.036..165.036 rows=0 loops=1)
-> Bitmap Index Scan on
words_moves_length_score_idx (cost=0.00..275.71 rows=14838 width=0)
(actual time=0.620..0.620 rows=912 loops=1)
Index Cond: (length(letters) = 7)
-> Bitmap Index Scan on
words_moves_length_score_idx (cost=0.00..54817.23 rows=442 width=0)
(actual time=164.413..164.413 rows=608 loops=1)
Index Cond: (score > 90)
Planning Time: 0.948 ms
Execution Time: 177.604 ms
(19 rows)

Here the resulting link: https://explain.depesz.com/s/pmCw

I still wonder, what am I missing, what could be improved there?

Does anybody please have a hint?

Thanks
Alex

#2Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Alexander Farber (#1)
Re: CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play'

On 2019-12-07 16:20:44 +0100, Alexander Farber wrote:

in PostgreSQL 11 I have a table holding player moves (could be: 'play', 'swap',
'skip', ...) in a word game:

# \d words_moves;
                                      Table "public.words_moves"
 Column  |           Type           | Collation | Nullable |                
Default                  
---------+--------------------------+-----------+----------+------------------------------------------
 mid     | bigint                   |           | not null | nextval
('words_moves_mid_seq'::regclass)
 action  | text                     |           | not null |
 gid     | integer                  |           | not null |
 uid     | integer                  |           | not null |
 played  | timestamp with time zone |           | not null |
 tiles   | jsonb                    |           |          |
 score   | integer                  |           |          |
 letters | text                     |           |          |
 hand    | text                     |           |          |
 puzzle  | boolean                  |           | not null | false

[...]

When I search for "interesting moves" with score higher than 90 or all 7 tiles
played, then the query takes a bit longer:

EXPLAIN ANALYZE 
SELECT 

[...]

FROM words_moves 
 WHERE action = 'play'  
AND LENGTH(hand) = 7 
AND (LENGTH(letters) = 7 OR score > 90) 
AND played > CURRENT_TIMESTAMP - interval '1 year' 
AND played < CURRENT_TIMESTAMP - interval '3 day'  
 ORDER BY played DESC;
                                                                               

[...]

Then I drop that index and create another one:

CREATE INDEX ON words_moves(length(letters), score) WHERE action = 'play';

And it seems to have a better performance on the query:

                                                                             
QUERY PLAN                                                                    
           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=97687.61..97687.72 rows=44 width=97) (actual time=175.832..175.927
rows=1036 loops=1)
   Sort Key: played DESC
   Sort Method: quicksort  Memory: 194kB
   ->  WindowAgg  (cost=97684.98..97686.41 rows=44 width=97) (actual time= > 172.443..174.937 rows=1036 loops=1)
         ->  Sort  (cost=97684.98..97685.09 rows=44 width=57) (actual time= > 172.390..172.490 rows=1036 loops=1)
               Sort Key: (to_char(played, 'Mon YYYY'::text))
               Sort Method: quicksort  Memory: 129kB
               ->  Bitmap Heap Scan on words_moves  (cost=55092.96..97683.78 > rows=44 width=57) (actual time=165.420..171.164 rows=1036 loops=1)
                     Recheck Cond: (((length(letters) = 7) AND (action = > 'play'::text)) OR ((score > 90) AND (action = 'play'::text)))
                     Filter: ((length(hand) = 7) AND (played > > (CURRENT_TIMESTAMP - '1 year'::interval)) AND (played < (CURRENT_TIMESTAMP - '3 > days'::interval)))
                     Rows Removed by Filter: 468
                     Heap Blocks: exact=1495
                     ->  BitmapOr  (cost=55092.96..55092.96 rows=15280 width=0) > (actual time=165.036..165.036 rows=0 loops=1)
                           ->  Bitmap Index Scan on > words_moves_length_score_idx  (cost=0.00..275.71 rows=14838 width=0) (actual > time=0.620..0.620 rows=912 loops=1)
                                 Index Cond: (length(letters) = 7)
                           ->  Bitmap Index Scan on > words_moves_length_score_idx  (cost=0.00..54817.23 rows=442 width=0) (actual > time=164.413..164.413 rows=608 loops=1)
                                 Index Cond: (score > 90)
 Planning Time: 0.948 ms
 Execution Time: 177.604 ms
(19 rows)

Here the resulting link: https://explain.depesz.com/s/pmCw

I still wonder, what am I missing, what could be improved there?

Several ideas:

1. You are only interested in moves from the last year or so. How much
history do have stored in your table? If it's much more than one
year, then an index on played will help. However, it looks like that
filter removes at most 468 of ~ 1500 rows, so that doesn't seem to
be case (yet).

2. You spend now most of the time scanning words_moves_length_score_idx
for scores > 90. This is basically a full index scan since that
index is ordered by length(letters). I'm surprised that PostgreSQL
can even do that :-). This is a separate index scan than the one for
length(letters) = 7, so separating the indexes should be no worse
and probably a lot better. Create an index on score (possibly
conditional on action = 'play').

3. A lot of the conditions is fixed. So you might want to move them into
the condition of a partial index:

create index on words_moves(played)
where action = 'play' and LENGTH(hand) = 7 and (LENGTH(letters) = 7 OR score > 90);

Then the planner should recognize that it can use this index and the
index contains only interesting moves - no logical operations needed
at query time at all, only an index scan to find recent moves.

Be warned though that creating such ultra-specific indexes comes at
a cost: You may end up with a lot of them if you have many different
queries and maintaining them may make inserts noticably slower.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"