index question

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#1)

Re: index question

On Sun, May 1, 2016 at 5:40 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Hi all,

I've got the following index on the gorfs.inode_segments table:

CREATE INDEX ix_clientids
ON gorfs.inode_segments
USING btree
(("split_part"("full_path"::"text", '/'::"text", 4)::integer))
WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text");

And I'm running the following Query:

SELECT
* FROM ( SELECT split_part(full_path, '/', 4)::INT AS account_id,
split_part(full_path, '/', 6)::INT AS note_id,
split_part(full_path, '/', 9)::TEXT AS variation,
st_size,
segment_index,
reverse(split_part(reverse(full_path), '/', 1)) as file_name,
i.st_ino,
full_path
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
AND s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
AND i.st_size > 0) as test WHERE account_id = 12225

*- But the query does not use the index... Why?*

Explain analyze:

"Seq Scan on "inode_segments" (cost=0.00..3047212.44 rows=524846
width=63) (actual time=14212.466..51428.439 rows=31 loops=1)"
" Filter: ("split_part"(("full_path")::"text", '/'::"text", 4) =
'12225'::"text")"
" Rows Removed by Filter: 104361402"
"Total runtime: 51428.482 ms"

Cheers
Lucas

Well, a little more information would be useful like:
1. What is the PostgreSQL version?
2. What is the O/S?
3. What is the structure of gorfs.inode_segments?
4. Did you do an ANALYZE table gorfs.inode_segments after you created the
index?

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

david.g.johnston@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#1)

Re: index question

On Sunday, May 1, 2016, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:

Hi all,

I've got the following index on the gorfs.inode_segments table:

CREATE INDEX ix_clientids
ON gorfs.inode_segments
USING btree
(("split_part"("full_path"::"text", '/'::"text", 4)::integer))
WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text");

And I'm running the following Query:

SELECT
* FROM ( SELECT split_part(full_path, '/', 4)::INT AS account_id,
split_part(full_path, '/', 6)::INT AS note_id,
split_part(full_path, '/', 9)::TEXT AS variation,
st_size,
segment_index,
reverse(split_part(reverse(full_path), '/', 1)) as file_name,
i.st_ino,
full_path
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
AND s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
AND i.st_size > 0) as test WHERE account_id = 12225

*- But the query does not use the index... Why?*

The most obvious reason is that the index is partial but the query doesn't
contain an appropriate where clause.

I'm also not sure how well the planner can move around the functional
expression in the select-list so that it matches up in the where clause to
then match the index.

Explain analyze:

"Seq Scan on "inode_segments" (cost=0.00..3047212.44 rows=524846
width=63) (actual time=14212.466..51428.439 rows=31 loops=1)"
" Filter: ("split_part"(("full_path")::"text", '/'::"text", 4) =
'12225'::"text")"
" Rows Removed by Filter: 104361402"
"Total runtime: 51428.482 ms"

These stats seem wacky...and seem to be missing stuff like the inodes
table...

David J.

almost 10 years ago

In reply to: David G. Johnston (#3)

Re: index question

Well, a little more information would be useful like:

Ops.. yes sure.. sorry about that.

1. What is the PostgreSQL version?

PostgreSQL 9.2

2. What is the O/S?

Linux Centos 6.7 64 bits

3. What is the structure of gorfs.inode_segments?

Table inode_segments: (I'll leave the comments to help)

CREATE TABLE gorfs.inode_segments
(
st_ino "gorfs"."ino_t" NOT NULL, -- Inode number the segment belongs to.
alongside segment_index, it forms the table's primary key to ensure
uniqueness per relevant scope
segment_index "gorfs"."pathname_component" NOT NULL, -- See st_no's
column description for further details. The meaning of this column varies
based on the host inode type:...
st_ino_target "gorfs"."ino_t", -- Target inode number. Meaningful for
directory inode segments (objects in the directory)
full_path "gorfs"."absolute_pathname", -- Exploded absolute canonical
path for quick lookups. Meaningful only for directory inode segments
(objects in the directory)
segment_data "bytea", -- Actual data segment. Meaningful only for
S_IFLNK and S_IFREG....
CONSTRAINT pk_inode_segments PRIMARY KEY ("st_ino", "segment_index"),
CONSTRAINT fk_host_inode_must_exist FOREIGN KEY (st_ino)
REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_target_inode_must_exist FOREIGN KEY (st_ino_target)
REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT uc_no_duplicate_full_paths UNIQUE ("full_path"),
CONSTRAINT cc_only_root_can_be_its_own_parent CHECK ("st_ino_target" IS
NULL OR "st_ino"::bigint <> "st_ino_target"::bigint OR "st_ino"::bigint = 2)
)

Table gorfs.inodes:

CREATE TABLE gorfs.inodes
(
st_dev "gorfs"."dev_t" DEFAULT NULL::bigint, -- ID of device containing
file. Meaningless in this implementation....
st_ino "gorfs"."ino_t" NOT NULL DEFAULT
"nextval"('"gorfs"."inodes_st_ino_idseq"'::"regclass"), -- Inode number....
st_mode "gorfs"."mode_t" NOT NULL, -- File type/mode bits....
st_nlink "gorfs"."nlink_t" NOT NULL, -- Number of hard links (directory
segments) pointing to this inode. See stat(2) manual page for details (man
2 stat)
st_uid "gorfs"."uid_t" NOT NULL, -- User ID that owns the file. See
stat(2) manual page for details (man 2 stat)
st_gid "gorfs"."gid_t" NOT NULL, -- Group ID that owns the file.See
stat(2) manual page for details (man 2 stat)
st_rdev "gorfs"."dev_t", -- Device number (currently we don't support
device files). See stat(2) manual page for details (man 2 stat)
st_size "gorfs"."off_t", -- File size, if applicable. See stat(2) manual
page for details (man 2 stat)
st_blksize "gorfs"."blksize_t", -- Block size for I/O. Meaningless here,
hard coded to 512. See stat(2) manual page for details (man 2 stat)
st_blocks "gorfs"."blkcnt_t", -- Number of allocated blocks. Meaningless
here, but calculated from block size. See stat(2) manual page for details
(man 2 stat)
st_atime "gorfs"."time_t" NOT NULL, -- Timestamp of last access. Stored
as a timestamp as opposed to unix TS. See stat(2) manual page for details
(man 2 stat)
st_mtime "gorfs"."time_t" NOT NULL, -- Timestamp of last modification.
Stored as a timestamp as opposed to unix TS. See stat(2) manual page for
details (man 2 stat)
st_ctime "gorfs"."time_t" NOT NULL, -- Timestamp of last change. Stored
as a timestamp as opposed to unix TS. See stat(2) manual page for details
(man 2 stat)
checksum_md5 "md5_hash", -- MD5 checksum of the file. Supplied by the
application as the DB might not even see the payload
media_subtype_id integer, -- Reference to MIME type (see FK constraint).
We can't support all media types but unknow types can be stored as
application/octet-stream
external_size "gorfs"."off_t", -- For symlinks only. Meaningful for fat
links only: total size of the fat link target. Null for normal symlinks
CONSTRAINT pk_inodes PRIMARY KEY ("st_ino"),
CONSTRAINT fk_media_subtype_must_exist FOREIGN KEY (media_subtype_id)
REFERENCES public.media_subtypes (media_subtype_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT cc_mount_devices_not_supported CHECK ("st_dev" IS NULL)
)

4. Did you do an ANALYZE table gorfs.inode_segments after you created the

index?

Yes.. actually the index was already created.

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#4)

Re: index question

On Sun, May 1, 2016 at 5:58 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Well, a little more information would be useful like:

Ops.. yes sure.. sorry about that.

1. What is the PostgreSQL version?

PostgreSQL 9.2

2. What is the O/S?

Linux Centos 6.7 64 bits

3. What is the structure of gorfs.inode_segments?

Table inode_segments: (I'll leave the comments to help)

CREATE TABLE gorfs.inode_segments
(
st_ino "gorfs"."ino_t" NOT NULL, -- Inode number the segment belongs
to. alongside segment_index, it forms the table's primary key to ensure
uniqueness per relevant scope
segment_index "gorfs"."pathname_component" NOT NULL, -- See st_no's
column description for further details. The meaning of this column varies
based on the host inode type:...
st_ino_target "gorfs"."ino_t", -- Target inode number. Meaningful for
directory inode segments (objects in the directory)
full_path "gorfs"."absolute_pathname", -- Exploded absolute canonical
path for quick lookups. Meaningful only for directory inode segments
(objects in the directory)
segment_data "bytea", -- Actual data segment. Meaningful only for
S_IFLNK and S_IFREG....
CONSTRAINT pk_inode_segments PRIMARY KEY ("st_ino", "segment_index"),
CONSTRAINT fk_host_inode_must_exist FOREIGN KEY (st_ino)
REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_target_inode_must_exist FOREIGN KEY (st_ino_target)
REFERENCES gorfs.inodes (st_ino) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT uc_no_duplicate_full_paths UNIQUE ("full_path"),
CONSTRAINT cc_only_root_can_be_its_own_parent CHECK ("st_ino_target" IS
NULL OR "st_ino"::bigint <> "st_ino_target"::bigint OR "st_ino"::bigint = 2)
)

Table gorfs.inodes:

CREATE TABLE gorfs.inodes
(
st_dev "gorfs"."dev_t" DEFAULT NULL::bigint, -- ID of device containing
file. Meaningless in this implementation....
st_ino "gorfs"."ino_t" NOT NULL DEFAULT
"nextval"('"gorfs"."inodes_st_ino_idseq"'::"regclass"), -- Inode number....
st_mode "gorfs"."mode_t" NOT NULL, -- File type/mode bits....
st_nlink "gorfs"."nlink_t" NOT NULL, -- Number of hard links (directory
segments) pointing to this inode. See stat(2) manual page for details (man
2 stat)
st_uid "gorfs"."uid_t" NOT NULL, -- User ID that owns the file. See
stat(2) manual page for details (man 2 stat)
st_gid "gorfs"."gid_t" NOT NULL, -- Group ID that owns the file.See
stat(2) manual page for details (man 2 stat)
st_rdev "gorfs"."dev_t", -- Device number (currently we don't support
device files). See stat(2) manual page for details (man 2 stat)
st_size "gorfs"."off_t", -- File size, if applicable. See stat(2)
manual page for details (man 2 stat)
st_blksize "gorfs"."blksize_t", -- Block size for I/O. Meaningless
here, hard coded to 512. See stat(2) manual page for details (man 2 stat)
st_blocks "gorfs"."blkcnt_t", -- Number of allocated blocks.
Meaningless here, but calculated from block size. See stat(2) manual page
for details (man 2 stat)
st_atime "gorfs"."time_t" NOT NULL, -- Timestamp of last access. Stored
as a timestamp as opposed to unix TS. See stat(2) manual page for details
(man 2 stat)
st_mtime "gorfs"."time_t" NOT NULL, -- Timestamp of last modification.
Stored as a timestamp as opposed to unix TS. See stat(2) manual page for
details (man 2 stat)
st_ctime "gorfs"."time_t" NOT NULL, -- Timestamp of last change. Stored
as a timestamp as opposed to unix TS. See stat(2) manual page for details
(man 2 stat)
checksum_md5 "md5_hash", -- MD5 checksum of the file. Supplied by the
application as the DB might not even see the payload
media_subtype_id integer, -- Reference to MIME type (see FK
constraint). We can't support all media types but unknow types can be
stored as application/octet-stream
external_size "gorfs"."off_t", -- For symlinks only. Meaningful for fat
links only: total size of the fat link target. Null for normal symlinks
CONSTRAINT pk_inodes PRIMARY KEY ("st_ino"),
CONSTRAINT fk_media_subtype_must_exist FOREIGN KEY (media_subtype_id)
REFERENCES public.media_subtypes (media_subtype_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT cc_mount_devices_not_supported CHECK ("st_dev" IS NULL)
)

4. Did you do an ANALYZE table gorfs.inode_segments after you created

the index?

Yes.. actually the index was already created.

Well, it looks like David's explanation is correct.
Your index is based on split_part function
but the WHERE clause is specific to full_path, so the planner cannot find a
valid index

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

david.g.johnston@gmail.com

almost 10 years ago

In reply to: Melvin Davidson (#5)

Re: index question

On Sunday, May 1, 2016, Melvin Davidson <melvin6925@gmail.com> wrote:

Your index is based on split_part function
but the WHERE clause is specific to full_path, so the planner cannot find
a valid index

This sentence is even less useful than the questions that you asked...

David J.

melvin6925@gmail.com

almost 10 years ago

In reply to: David G. Johnston (#6)

Re: index question

On Sun, May 1, 2016 at 6:31 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sunday, May 1, 2016, Melvin Davidson <melvin6925@gmail.com> wrote:

Your index is based on split_part function
but the WHERE clause is specific to full_path, so the planner cannot find
a valid index

David J.

This sentence is even less useful than the questions that you asked...

Your comments are antagonistic at best, but perhaps you should take a
course in english.

To clarify, the index is based on a function called "split_part(....)
The WHERE clause is only referencing the full_part column, so the planner
cannot associate the index with the full_part column.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

almost 10 years ago

In reply to: Melvin Davidson (#7)

Re: index question

To clarify, the index is based on a function called "split_part(....)
The WHERE clause is only referencing the full_part column, so the planner
cannot associate the index with the full_part column.

Thanks for the explanation, Melvin.

It would be simple like:

CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree ("full_path");

Thanks again.
Lucas

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#8)

Re: index question

On Sun, May 1, 2016 at 9:18 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

To clarify, the index is based on a function called "split_part(....)

The WHERE clause is only referencing the full_part column, so the planner
cannot associate the index with the full_part column.

Thanks for the explanation, Melvin.

It would be simple like:

CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree
("full_path");

?

Thanks again.
Lucas

CREATE INDEX CONCURRENTLY ON gorfs.inode_segments USING btree

("full_path");

Yes, that should work.
A word of caution, only create additional indexes that will actually be
used in queries.

You can check how often indexes are used (and status) with:

SELECT n.nspname as schema,
i.relname as table,
i.indexrelname as index,
i.idx_scan,
i.idx_tup_read,
i.idx_tup_fetch,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
pg_get_indexdef(idx.indexrelid),
CASE WHEN idx.indisvalid
THEN 'valid'
ELSE 'INVALID'
END as statusi,
pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname)) as size_in_bytes,
pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname))) as size
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE n.nspname NOT LIKE 'pg_%'
ORDER BY 1, 2, 3;

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#10

http://explain.depesz.com/s/rHOU

almost 10 years ago

In reply to: Melvin Davidson (#9)

Re: index question

Sorry @Melvin, sent the previous email just to you..

That's a great one, too! Cheers!

Well.. the index creation did not help...

if possible please have a look on the explain analyze results:

What else can I do?

*The indexes I created is:*
- CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
gorfs.inode_segments USING btree ("full_path");

- CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes
USING btree ("checksum_md5","st_size");

#11

melvin6925@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#10)

Re: index question

On Sun, May 1, 2016 at 10:27 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Sorry @Melvin, sent the previous email just to you..

That's a great one, too! Cheers!

Well.. the index creation did not help...

if possible please have a look on the explain analyze results:

http://explain.depesz.com/s/rHOU

What else can I do?

*The indexes I created is:*
- CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
gorfs.inode_segments USING btree ("full_path");

- CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes
USING btree ("checksum_md5","st_size");

Two things to consider.
1. Did you remember to run ANALYZE on the table after the new indexes were
created?
2. Try doing a
SET enable_seqscan = off;
before executing the query and compare execution times.
It might just be that a seqscan would be faster.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#12

http://www.postgresql.org/docs/current/static/pgtrgm.html

david.g.johnston@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#10)

Re: index question

On Sun, May 1, 2016 at 7:27 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Repeating the query to improve the self-containment aspect of the email
would have been appreciated.

if possible please have a look on the explain analyze results:

http://explain.depesz.com/s/rHOU

What else can I do?

*The indexes I created is:*
- CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
gorfs.inode_segments USING btree ("full_path");

the only condition that could even potentially use this index is:

s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'

My knowledge is limited in this area, and the documentation covers this
specific dynamic only minimally, but for certain attempting to perform an
un-anchored regexp match using a btree index is impossible.

These leaves to avenues to explore.

1) See if a start-of-string anchor will make the btree index usable
2) Use the pg_trgm contrib module

- CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes

USING btree ("checksum_md5","st_size");

This one was used.

IMO you are leaving too much infomation encoded in the full_path. I'd
personally setup triggers to parse out the components on insert/update into
fields and then index those fields. In fact I'd probably use some form of
inheritance or other one-to-one relationship here.

David J.

#13

http://www.postgresql.org/docs/current/static/indexes-types.html

david.g.johnston@gmail.com

almost 10 years ago

In reply to: David G. Johnston (#12)

Re: index question

On Mon, May 2, 2016 at 8:16 AM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sun, May 1, 2016 at 7:27 PM, drum.lucas@gmail.com <drum.lucas@gmail.com

wrote:

Repeating the query to improve the self-containment aspect of the email
would have been appreciated.

if possible please have a look on the explain analyze results:

http://explain.depesz.com/s/rHOU

What else can I do?

*The indexes I created is:*
- CREATE INDEX CONCURRENTLY ix_inode_segments_notes_clientids2 ON
gorfs.inode_segments USING btree ("full_path");

the only condition that could even potentially use this index is:

s.full_path ~ '/userfiles/account/[0-9]+/[a-z]+/[0-9]+'

My knowledge is limited in this area, and the documentation covers this
specific dynamic only minimally, but for certain attempting to perform an
un-anchored regexp match using a btree index is impossible.

These leaves to avenues to explore.

1) See if a start-of-string anchor will make the btree index usable
2) Use the pg_trgm contrib module

http://www.postgresql.org/docs/current/static/pgtrgm.html

- CREATE INDEX CONCURRENTLY ix_inodes_checksum_st_size ON gorfs.inodes

USING btree ("checksum_md5","st_size");

This one was used.

IMO you are leaving too much infomation encoded in the full_path. I'd
personally setup triggers to parse out the components on insert/update into
fields and then index those fields. In fact I'd probably use some form of
inheritance or other one-to-one relationship here.

I guess it is documented, I just needed to look a bit more.

"""
The optimizer can also use a B-tree index for queries involving the pattern
matching operators LIKE and ~ if the pattern is a constant and is anchored
to the beginning of the string — for example, col LIKE 'foo%' or col ~
'^foo', but not col LIKE '%bar'. However, if your database does not use the
C locale you will need to create the index with a special operator class to
support indexing of pattern-matching queries; see Section 11.9 below. It is
also possible to use B-tree indexes for ILIKE and ~*, but only if the
pattern starts with non-alphabetic characters, i.e., characters that are
not affected by upper/lower case conversion.
"""

David J.

#14

almost 10 years ago

In reply to: David G. Johnston (#13)

Re: index question

The index that I've created and is working is:

Index without typecasting:

CREATE INDEX CONCURRENTLY ix_clientids2 ON gorfs.inode_segments USING
btree (full_path);

Thanks for the help, guys!

Melvin, that Query you sent is very interesting..

SELECT n.nspname as schema,

i.relname as table,
i.indexrelname as index,
i.idx_scan,
i.idx_tup_read,
i.idx_tup_fetch,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
pg_get_indexdef(idx.indexrelid),
CASE WHEN idx.indisvalid
THEN 'valid'
ELSE 'INVALID'
END as statusi,
pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname)) as size_in_bytes,
pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname))) as size
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE n.nspname NOT LIKE 'pg_%'
ORDER BY 1, 2, 3;

I've found more then 100 indexes that the columns:

"idx_scan", "idx_tup_read" and "idx_tup_fetch" are 0.
So, it's safe to say that they are not being used, is that right?

But some indexes have almost 100GB on the size column. This means they are
not being used now, but they could be used in the past?

- Is it safe to remove them?

Cheers
Lucas

#15

david.g.johnston@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#14)

Re: index question

On Mon, May 2, 2016 at 12:44 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Melvin, that Query you sent is very interesting..

SELECT n.nspname as schema,

i.relname as table,
i.indexrelname as index,
i.idx_scan,
i.idx_tup_read,
i.idx_tup_fetch,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
pg_get_indexdef(idx.indexrelid),
CASE WHEN idx.indisvalid
THEN 'valid'
ELSE 'INVALID'
END as statusi,
pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname)) as size_in_bytes,
pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' ||
quote_ident(i.relname))) as size
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE n.nspname NOT LIKE 'pg_%'
ORDER BY 1, 2, 3;

I've found more then 100 indexes that the columns:

"idx_scan", "idx_tup_read" and "idx_tup_fetch" are 0.
So, it's safe to say that they are not being used, is that right?

But some indexes have almost 100GB on the size column. This means they are
not being used now, but they could be used in the past?

Index size and index usage are unrelated. Modifications to the index to
keep it in sync with the table do not count as "usage" - only reading it
for where clause use counts.

David J.

#16

almost 10 years ago

In reply to: David G. Johnston (#15)

Re: index question

Index size and index usage are unrelated. Modifications to the index to
keep it in sync with the table do not count as "usage" - only reading it
for where clause use counts.

David J.

So only those with* 0 size*, should be deleted? Is that you're saying?
Can you be more clear please?

Lucas

#17

david.g.johnston@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#16)

Re: index question

On Mon, May 2, 2016 at 12:56 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Index size and index usage are unrelated. Modifications to the index to
keep it in sync with the table do not count as "usage" - only reading it
for where clause use counts.

So only those with* 0 size*, should be deleted? Is that you're saying?

I'm not offering advice as to when to delete or not delete any particular
index.

Can you be more clear please?

Probably not :)

You cannot make an inference about an index's usage by looking at its
size. Similarly, a seldom used but large index is not necessarily one you
want to remove if doing so causes a once-a-month process that usually take
seconds or minutes to now take hours.

David J.

#18

melvin6925@gmail.com

almost 10 years ago

In reply to: David G. Johnston (#17)

Re: index question

On Mon, May 2, 2016 at 4:08 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Mon, May 2, 2016 at 12:56 PM, drum.lucas@gmail.com <
drum.lucas@gmail.com> wrote:

Index size and index usage are unrelated. Modifications to the index
to keep it in sync with the table do not count as "usage" - only reading it
for where clause use counts.

So only those with* 0 size*, should be deleted? Is that you're saying?

I'm not offering advice as to when to delete or not delete any particular
index.

Can you be more clear please?

Probably not :)

You cannot make an inference about an index's usage by looking at its
size. Similarly, a seldom used but large index is not necessarily one you
want to remove if doing so causes a once-a-month process that usually take
seconds or minutes to now take hours.

David J.

Generically speaking, if the total of dx_scan + idx_tup_read +
idx_tup_fetch are 0, then it is an _indication_ that those indexes should
be dropped.
You should also consider how long those indexes have existed and how often
queries are executed.

A good practice would be to save the SQL to recreate the indexes before you
drop any. In that way, if you notice a degradation in performance, you can
just rebuild
You can use the following query to do that, but you might want to edit and
add the CONCURRENT option.

SELECT pg_get_indexdef(idx.indexrelid) || ';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND NOT idx.indisunique
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY n.nspname,
i.relname;

The following query generates the drop statements.

SELECT 'DROP INDEX CONCURRENTLY IF EXISTS "' || quote_ident(n.nspname) ||
'"' || '.' || '"' || quote_ident(i.indexrelname) || '"' ||';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY i.indexrelname;

I would not place any concern on the size of the index. That is just what
is needed to keep track of all associated rows.
Once you drop the indexes you determine are not needed, you will gain back
the space that they use up.

Please stay in touch and let me know how it goes.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#19

almost 10 years ago

In reply to: Melvin Davidson (#18)

Re: index question

Generically speaking, if the total of dx_scan + idx_tup_read +
idx_tup_fetch are 0, then it is an _indication_ that those indexes should
be dropped.
You should also consider how long those indexes have existed and how often
queries are executed.

A good practice would be to save the SQL to recreate the indexes before
you drop any. In that way, if you notice a degradation in performance, you
can just rebuild
You can use the following query to do that, but you might want to edit and
add the CONCURRENT option.

SELECT pg_get_indexdef(idx.indexrelid) || ';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND NOT idx.indisunique
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY n.nspname,
i.relname;

The following query generates the drop statements.

SELECT 'DROP INDEX CONCURRENTLY IF EXISTS "' || quote_ident(n.nspname) ||
'"' || '.' || '"' || quote_ident(i.indexrelname) || '"' ||';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY i.indexrelname;

I would not place any concern on the size of the index. That is just what
is needed to keep track of all associated rows.
Once you drop the indexes you determine are not needed, you will gain back
the space that they use up.

Please stay in touch and let me know how it goes.

I will. Thanks for the help/tips!

Cheers
Lucas

#20

Julien Rouhaud

rjuju123@gmail.com

almost 10 years ago

In reply to: drum.lucas@gmail.com (#19)

Re: index question

On 02/05/2016 23:02, drum.lucas@gmail.com wrote:

Generically speaking, if the total of dx_scan + idx_tup_read +
idx_tup_fetch are 0, then it is an _indication_ that those indexes
should be dropped.
You should also consider how long those indexes have existed and how
often queries are executed.

A good practice would be to save the SQL to recreate the indexes
before you drop any. In that way, if you notice a degradation in
performance, you can just rebuild
You can use the following query to do that, but you might want to
edit and add the CONCURRENT option.

SELECT pg_get_indexdef(idx.indexrelid) || ';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND NOT idx.indisunique
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY n.nspname,
i.relname;

The following query generates the drop statements.

SELECT 'DROP INDEX CONCURRENTLY IF EXISTS "' ||
quote_ident(n.nspname) || '"' || '.' || '"' ||
quote_ident(i.indexrelname) || '"' ||';'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE NOT idx.indisprimary
AND i.relname NOT LIKE 'pg_%'
AND i.idx_scan + idx_tup_read + idx_tup_fetch = 0
ORDER BY i.indexrelname;

I would not place any concern on the size of the index. That is just
what is needed to keep track of all associated rows.
Once you drop the indexes you determine are not needed, you will
gain back the space that they use up.

Please stay in touch and let me know how it goes.

I will. Thanks for the help/tips!

Be careful, this query discards indexes used for primary key, but at
least unique (indisunique) and exclusion constraint (indisexclusion)
indexes should also be excluded, and also probably indexes used to
cluster tables (indisclustered).

You should also check since when the idsx_scan and other counters are
aggregating before dropping any index. Check
pg_stat_get_db_stat_reset_time(oid), with the oid of the related
database(s).

Cheers
Lucas

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21