Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Started by Alexander Farberabout 8 years ago13 messagesgeneral
Jump to latest
#1Alexander Farber
alexander.farber@gmail.com

Hello,

in a 2 player game I store all games in the following PostgreSQL 10.2 table:

CREATE TABLE words_games (
gid SERIAL PRIMARY KEY,

created timestamptz NOT NULL,
finished timestamptz,

player1 integer REFERENCES words_users(uid) ON DELETE CASCADE NOT
NULL CHECK (player1 <> player2),
player2 integer REFERENCES words_users(uid) ON DELETE CASCADE,

played1 timestamptz,
played2 timestamptz,

state1 text, -- tie, winning, losing, draw, won, lost
state2 text, -- tie, winning, losing, draw, won, lost
reason text, -- regular, resigned, expired, banned

score1 integer NOT NULL CHECK (score1 >= 0),
score2 integer NOT NULL CHECK (score2 >= 0),

hand1 char[7] NOT NULL,
hand2 char[7] NOT NULL,
pile char[116] NOT NULL,

letters char[15][15] NOT NULL,
values integer[15][15] NOT NULL,
bid integer NOT NULL REFERENCES words_boards ON DELETE CASCADE
);

In the 3 text columns state1, state2 and reason I store all possible
game/player states.

For example: player1 has failed to make her move in time, so that would
result in:
state1 = 'lost',
state2 = 'won',
reason = 'expired',

On an advice I've got from this mailing list I am explicitly not using
enums (in case I need to add unforseen states).

The purpose of these 3 text columns is for me to display player stats
later, by quering the columns.

As you can imagine, mostly I perform SELECT on the words_games table - to
send update to the game clients (PC and mobile).

And in more seldom cases I update these 3 text columns - when a move is
performed or a game gets finished or expires.

My question please:

Should I add the 3 indices as in:

CREATE INDEX words_games_state1_index on words_games(state1);
CREATE INDEX words_games_state2_index on words_games(state2);
CREATE INDEX words_games_reason_index on words_games(reason);

I am asking, because as an unexperienced database user I fail to see any
difference when I run EXPLAIN:

words=> select gid, state1, state2 from words_games where state1='won' or
state2='won';
gid | state1 | state2
-----+--------+--------
146 | lost | won
144 | lost | won
145 | lost | won
150 | won | lost
..........
256 | won | lost
255 | won | lost
35 | lost | won
(100 rows)

words=> explain select gid, state1, state2 from words_games where
state1='won' or state2='won';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on words_games (cost=0.00..109.85 rows=96 width=12)
Filter: ((state1 = 'won'::text) OR (state2 = 'won'::text))
(2 rows)

words=> CREATE INDEX words_games_state1_index on words_games(state1);
CREATE INDEX

words=> CREATE INDEX words_games_state2_index on words_games(state2);
CREATE INDEX

words=> explain select gid, state1, state2 from words_games where
state1='won' or state2='won';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on words_games (cost=0.00..109.85 rows=96 width=12)
Filter: ((state1 = 'won'::text) OR (state2 = 'won'::text))
(2 rows)

Thank you for any insights
Alex

#2Martin Moore
martin.moore@avbrief.com
In reply to: Alexander Farber (#1)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

I’m no expert but I’d think it unlikely an index would be considered for a table with only 100 rows in. Also I’m pretty sure only one index per table is used, so you’d want to put state1 and state2 in one index.

You may wish to consider normalising too – so any field with a 1 or 2 at the end is moved to a separate table linked by gid. This would also help the indexing.

Martin.

From: Alexander Farber <alexander.farber@gmail.com>
Date: Wednesday, 21 February 2018 at 12:16
To: pgsql-general <pgsql-general@postgresql.org>
Subject: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Hello,

in a 2 player game I store all games in the following PostgreSQL 10.2 table:

CREATE TABLE words_games (
gid SERIAL PRIMARY KEY,

created timestamptz NOT NULL,
finished timestamptz,

player1 integer REFERENCES words_users(uid) ON DELETE CASCADE NOT NULL CHECK (player1 <> player2),
player2 integer REFERENCES words_users(uid) ON DELETE CASCADE,

played1 timestamptz,
played2 timestamptz,

state1 text, -- tie, winning, losing, draw, won, lost
state2 text, -- tie, winning, losing, draw, won, lost
reason text, -- regular, resigned, expired, banned

score1 integer NOT NULL CHECK (score1 >= 0),
score2 integer NOT NULL CHECK (score2 >= 0),

hand1 char[7] NOT NULL,
hand2 char[7] NOT NULL,
pile char[116] NOT NULL,

letters char[15][15] NOT NULL,
values integer[15][15] NOT NULL,
bid integer NOT NULL REFERENCES words_boards ON DELETE CASCADE
);

In the 3 text columns state1, state2 and reason I store all possible game/player states.

For example: player1 has failed to make her move in time, so that would result in:
state1 = 'lost',
state2 = 'won',
reason = 'expired',

On an advice I've got from this mailing list I am explicitly not using enums (in case I need to add unforseen states).

The purpose of these 3 text columns is for me to display player stats later, by quering the columns.

As you can imagine, mostly I perform SELECT on the words_games table - to send update to the game clients (PC and mobile).

And in more seldom cases I update these 3 text columns - when a move is performed or a game gets finished or expires.

My question please:

Should I add the 3 indices as in:

CREATE INDEX words_games_state1_index on words_games(state1);
CREATE INDEX words_games_state2_index on words_games(state2);
CREATE INDEX words_games_reason_index on words_games(reason);

I am asking, because as an unexperienced database user I fail to see any difference when I run EXPLAIN:

words=> select gid, state1, state2 from words_games where state1='won' or state2='won';
gid | state1 | state2
-----+--------+--------
146 | lost | won
144 | lost | won
145 | lost | won
150 | won | lost
..........
256 | won | lost
255 | won | lost
35 | lost | won
(100 rows)

words=> explain select gid, state1, state2 from words_games where state1='won' or state2='won';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on words_games (cost=0.00..109.85 rows=96 width=12)
Filter: ((state1 = 'won'::text) OR (state2 = 'won'::text))
(2 rows)

words=> CREATE INDEX words_games_state1_index on words_games(state1);
CREATE INDEX

words=> CREATE INDEX words_games_state2_index on words_games(state2);
CREATE INDEX

words=> explain select gid, state1, state2 from words_games where state1='won' or state2='won';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on words_games (cost=0.00..109.85 rows=96 width=12)
Filter: ((state1 = 'won'::text) OR (state2 = 'won'::text))
(2 rows)

Thank you for any insights
Alex

#3Alexander Farber
alexander.farber@gmail.com
In reply to: Martin Moore (#2)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com>
wrote:

I’m no expert but I’d think it unlikely an index would be considered for a
table with only 100 rows in. Also I’m pretty sure only one index per table
is used, so you’d want to put state1 and state2 in one index.

I hope to have more records in the words_games table later when my game is
launched (currently in soft launch/beta).....

Regards
Alex

#4Alexander Farber
alexander.farber@gmail.com
In reply to: Alexander Farber (#3)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Here is the current DESC of the table (I already use few joins....):

words=> \d words_games
Table "public.words_games"
Column | Type | Collation | Nullable
| Default
----------+--------------------------+-----------+----------+------------------------------------------
gid | integer | | not null |
nextval('words_games_gid_seq'::regclass)
created | timestamp with time zone | | not null |
finished | timestamp with time zone | | |
player1 | integer | | not null |
player2 | integer | | |
played1 | timestamp with time zone | | |
played2 | timestamp with time zone | | |
state1 | text | | |
state2 | text | | |
score1 | integer | | not null |
score2 | integer | | not null |
hand1 | character(1)[] | | not null |
hand2 | character(1)[] | | not null |
pile | character(1)[] | | not null |
letters | character(1)[] | | not null |
values | integer[] | | not null |
bid | integer | | not null |
reason | text | | |
Indexes:
"words_games_pkey" PRIMARY KEY, btree (gid)
"words_games_reason_index" btree (reason)
"words_games_state1_index" btree (state1)
"words_games_state2_index" btree (state2)
Check constraints:
"words_games_check" CHECK (player1 <> player2)
"words_games_score1_check" CHECK (score1 >= 0)
"words_games_score2_check" CHECK (score2 >= 0)
Foreign-key constraints:
"words_games_bid_fkey" FOREIGN KEY (bid) REFERENCES words_boards(bid)
ON DELETE CASCADE
"words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES
words_users(uid) ON DELETE CASCADE
"words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES
words_users(uid) ON DELETE CASCADE
Referenced by:
TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid)
REFERENCES words_games(gid) ON DELETE CASCADE
TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid)
REFERENCES words_games(gid) ON DELETE CASCADE
TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY
(gid) REFERENCES words_games(gid) ON DELETE CASCADE

#5Bill Moran
wmoran@potentialtech.com
In reply to: Alexander Farber (#3)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On Wed, 21 Feb 2018 13:33:18 +0100
Alexander Farber <alexander.farber@gmail.com> wrote:

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com>
wrote:

I’m no expert but I’d think it unlikely an index would be considered for a
table with only 100 rows in. Also I’m pretty sure only one index per table
is used, so you’d want to put state1 and state2 in one index.

I hope to have more records in the words_games table later when my game is
launched (currently in soft launch/beta).....

To elaborate+clarify Martin's comments.

The index will not be used while the table is small because it's actually slower
to process an index than it is to just read the entire table. However, as the
table gets more rows, these timings will reverse and Postgres will start using
the indexes. It's probably best to just create them even though the table is
small. The performance improvement you'll get when the table grows will be
well worth it, and it avoids the problem of trying to remember to create it later.

However, Martin's other comment about only using a single index is incorrect.
Postgres can use multiple indexes per query, so it's often good practace to
put indexes on every column that might ever be used in a WHERE clause.

That being said, if you'll always use a WHERE on both state1 and state2, it will
be even more performant to create a multicolumn index on those two columns
instead of two single-column indexes. Even though Postgres _can_ use multiple
indexes, using a single index is faster if it's available.

--
Bill Moran <wmoran@potentialtech.com>

#6Martin Moore
martin.moore@avbrief.com
In reply to: Alexander Farber (#3)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

My point was that your explain would be the same with indexes as without as they won’t be used.

Martin.

From: Alexander Farber <alexander.farber@gmail.com>
Date: Wednesday, 21 February 2018 at 12:33
Cc: pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com> wrote:

I’m no expert but I’d think it unlikely an index would be considered for a table with only 100 rows in. Also I’m pretty sure only one index per table is used, so you’d want to put state1 and state2 in one index.

I hope to have more records in the words_games table later when my game is launched (currently in soft launch/beta).....

Regards

Alex

#7Martin Moore
martin.moore@avbrief.com
In reply to: Bill Moran (#5)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

However, Martin's other comment about only using a single index is incorrect.
Postgres can use multiple indexes per query, so it's often good practace to
put indexes on every column that might ever be used in a WHERE clause.

--
Bill Moran <wmoran@potentialtech.com>

That's very useful to know!

Martin.

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alexander Farber (#3)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On 02/21/2018 01:33 PM, Alexander Farber wrote:

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com
<mailto:martin.moore@avbrief.com>> wrote:

I’m no expert but I’d think it unlikely an index would be considered
for a table with only 100 rows in. Also I’m pretty sure only one
index per table is used, so you’d want to put state1 and state2 in
one index.

I hope to have more records in the words_games table later when my game
is launched (currently in soft launch/beta).....

I do strongly recommend generating a representative amount of data as
part of testing, and making sure all the indexes are there before going
live. Otherwise you'll be playing whack-a-mole over a long period of
time, depending on how fast the individual tables grow.

Either setup a script that generates data through the application, or
generate synthetic data in some other way.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Bill Moran (#5)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On 02/21/2018 05:00 PM, Bill Moran wrote:

On Wed, 21 Feb 2018 13:33:18 +0100
Alexander Farber <alexander.farber@gmail.com> wrote:

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com>
wrote:

I’m no expert but I’d think it unlikely an index would be considered for a
table with only 100 rows in. Also I’m pretty sure only one index per table
is used, so you’d want to put state1 and state2 in one index.

I hope to have more records in the words_games table later when my game is
launched (currently in soft launch/beta).....

To elaborate+clarify Martin's comments.

The index will not be used while the table is small because it's actually slower
to process an index than it is to just read the entire table. However, as the
table gets more rows, these timings will reverse and Postgres will start using
the indexes. It's probably best to just create them even though the table is
small. The performance improvement you'll get when the table grows will be
well worth it, and it avoids the problem of trying to remember to create it later.

However, Martin's other comment about only using a single index is incorrect.
Postgres can use multiple indexes per query, so it's often good practace to
put indexes on every column that might ever be used in a WHERE clause.

I call this practice "shotgun" and generally discourage people from
using it. It seems attractive, but not every where condition can be
evaluated using an index, and every index has maintenance overhead.

There are cases when it's somewhat reasonable (e.g. when you don't know
which columns will be referenced by WHERE conditions, and data ingestion
has lower priority than queries). But that doesn't seem to be the case
here - you know the WHERE conditions, and people are likely sending a
lot of inserts (and expecting low latency responses).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Alexander Farber
alexander.farber@gmail.com
In reply to: Tomas Vondra (#9)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

Thank your for the comments

#11Bill Moran
wmoran@potentialtech.com
In reply to: Tomas Vondra (#9)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On Thu, 22 Feb 2018 03:57:34 +0100
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 02/21/2018 05:00 PM, Bill Moran wrote:

On Wed, 21 Feb 2018 13:33:18 +0100
Alexander Farber <alexander.farber@gmail.com> wrote:

Hi Martin -

On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore <martin.moore@avbrief.com>
wrote:

I’m no expert but I’d think it unlikely an index would be considered for a
table with only 100 rows in. Also I’m pretty sure only one index per table
is used, so you’d want to put state1 and state2 in one index.

I hope to have more records in the words_games table later when my game is
launched (currently in soft launch/beta).....

To elaborate+clarify Martin's comments.

The index will not be used while the table is small because it's actually slower
to process an index than it is to just read the entire table. However, as the
table gets more rows, these timings will reverse and Postgres will start using
the indexes. It's probably best to just create them even though the table is
small. The performance improvement you'll get when the table grows will be
well worth it, and it avoids the problem of trying to remember to create it later.

However, Martin's other comment about only using a single index is incorrect.
Postgres can use multiple indexes per query, so it's often good practace to
put indexes on every column that might ever be used in a WHERE clause.

I call this practice "shotgun" and generally discourage people from
using it. It seems attractive, but not every where condition can be
evaluated using an index, and every index has maintenance overhead.

There are cases when it's somewhat reasonable (e.g. when you don't know
which columns will be referenced by WHERE conditions, and data ingestion
has lower priority than queries). But that doesn't seem to be the case
here - you know the WHERE conditions, and people are likely sending a
lot of inserts (and expecting low latency responses).

Can't _really_ disagree with anything you're saying there ... but I disagree
with the overall sentament for the following reasons:

1) Not everyone has easy access to experienced people like you and I. As a
result, I often recommend the "least likely to be wrong" course of action instead
of recommending investigation that requires expertise that the original poster
might not possess ... after all, if they had the experience to do all the
research, it's unlikely that they'd be asking this question in the first
place.
2) The negative impact of an unused index is tiny compared to the negative
impact of a missing index.
3) Dropping an unused index is _far_ less headache than creating a missing
index on a busy database.
4) Without knowing _all_ the details of the app and how it works, my past
experience is that problems are about a jillion times more likely to be
the result of underindexing (although I _have_ seen issues due to
overindexing, so it _does_ happen)

I can't argue that the _best_ course of action is to set up a simulation
that can exercise the system at predicted size and load levels and use that
to tune things. But communicating all that to others has never been easy
in my experience. As recently as this month my team was blown away that I
could create a simulation that demonstrated how my code would behave under
real-world like conditions. It's apparently not something that a lot of
people understand or are good at or something.

--
Bill Moran <wmoran@potentialtech.com>

#12David Rowley
dgrowleyml@gmail.com
In reply to: Bill Moran (#11)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On 23 February 2018 at 04:00, Bill Moran <wmoran@potentialtech.com> wrote:

2) The negative impact of an unused index is tiny compared to the negative
impact of a missing index.

I'd say that would entirely depend on the workload of the table and
the entire cluster. Disk space and I/O to write WAL and index pages to
is surely a finite resource. Not to mention the possibility of
disallowing HOT-Updates in the heap.

It would seem to me that anyone using the "shotgun" indexing method
may end up having to learn more about indexing the hard way. Learning
the hard way is something I like to try to avoid, personally. Probably
it all has to come down to how important it is that your app actually
can handle the load vs devs/dba experience level divided by time, both
of the latter two are also a finite resource. So, it probably all has
to be carefully balanced and quite possibly a person's opinion
strongly relates to their experience. If you were in the air traffic
control business, perhaps your opinion might not be the same!? ...
Sorry, the aeroplane crashed because the replica instance lagged and
the plane's location wasn't updated... Analysis shows that the DBA
indexed every column in the table and the WAL volume was more than the
network's bandwidth could handle over the holiday period. (Note: I
know nothing about air traffic control, but it does seem like
something you'd want to make stable systems for, games on the
internet, probably less so...).

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Bill Moran
wmoran@potentialtech.com
In reply to: David Rowley (#12)
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

On Fri, 23 Feb 2018 17:10:56 +1300
David Rowley <david.rowley@2ndquadrant.com> wrote:

On 23 February 2018 at 04:00, Bill Moran <wmoran@potentialtech.com> wrote:

2) The negative impact of an unused index is tiny compared to the negative
impact of a missing index.

I'd say that would entirely depend on the workload of the table and
the entire cluster. Disk space and I/O to write WAL and index pages to
is surely a finite resource. Not to mention the possibility of
disallowing HOT-Updates in the heap.

I feel like you missed my point. You're absolutely right, but the
disagreement was not on whether or not an unused index could cause
problems, but on the likelihood that the OP was going to build the
simulation code to actually determine whether the index is needed
or not. Information from the original question led me to believe
that simulation was either beyond his skill level or beyond his
time allocation; so I provided a less good, but more likely to be
helpful (in my opinion) answer.

The pattern that almost ALL new ventures I've seen follow is:
1) Hack something together based on an idea for a product
2) If the product actually succeeds, experience tons of issues
related to scaling and performance
3) Run around like a crazy person fixing all the scaling and
performance issues
4) Sell the company to someone else who ultimately becomes responsible
for maturing the software

In theory, there's no reason this pattern _has_ to be so prevalent,
yet it is. Probably becuase it appears to minimize the up front cost,
which the people footing the bill just love.

Given that process, "shotgun" indexing is part of step 1 or step 3.
Whereas the building of load simulations and _real_ tuning of the
system is relegated to step 4.

Since investers tend to want to get out quick if #2 isn't going to
happen, they don't want people doing work that they consider part
of step #4.

I'm a pragmatist. I'd love to see everyone build software in a sane,
well-considered manner. I'd also love to see government without
corruption. However, in the world I _actually_ see, those things aren't
prevalent.

It would seem to me that anyone using the "shotgun" indexing method
may end up having to learn more about indexing the hard way. Learning
the hard way is something I like to try to avoid, personally. Probably
it all has to come down to how important it is that your app actually
can handle the load vs devs/dba experience level divided by time, both
of the latter two are also a finite resource. So, it probably all has
to be carefully balanced and quite possibly a person's opinion
strongly relates to their experience. If you were in the air traffic
control business, perhaps your opinion might not be the same!? ...
Sorry, the aeroplane crashed because the replica instance lagged and
the plane's location wasn't updated... Analysis shows that the DBA
indexed every column in the table and the WAL volume was more than the
network's bandwidth could handle over the holiday period. (Note: I
know nothing about air traffic control, but it does seem like
something you'd want to make stable systems for, games on the
internet, probably less so...).

I really hope that people writing ATC software have the experience to
do really good testing (including load simulation, etc) but the 3 mile
island accident happened partially because of faulty sensor design, so
there's no guarantee that's the case.

Interesting discussion.

--
Bill Moran