bloat indexes - opinion

Started by Patrick Babout 9 years ago9 messagesgeneral
Jump to latest
#1Patrick B
patrickbakerbr@gmail.com

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table. Its real
size is 2TB
Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is the real impact of a bloat index? If I reindex it, queries will be
faster?

Thanks
Patrick

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#1)
Re: bloat indexes - opinion

On 02/21/2017 01:44 PM, Patrick B wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table.
Its real size is 2TB

How do you know one number is right and the other is wrong?

Have you looked at the functions here?:
https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT

Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is the real impact of a bloat index? If I reindex it, queries will
be faster?

Thanks
Patrick

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Patrick B
patrickbakerbr@gmail.com
In reply to: Adrian Klaver (#2)
Re: bloat indexes - opinion

2017-02-22 10:59 GMT+13:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 02/21/2017 01:44 PM, Patrick B wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table.
Its real size is 2TB

How do you know one number is right and the other is wrong?

1. on that table (seg) i store binary data. It is impossible to have only
18GB of it.
2.

SELECT schema_name,

pg_size_pretty(sum(table_size)::bigint),

(sum(table_size) / pg_database_size(current_database())) * 100

FROM (

SELECT pg_catalog.pg_namespace.nspname as schema_name,

pg_relation_size(pg_catalog.pg_class.oid) as table_size

FROM pg_catalog.pg_class

JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid

) t

GROUP BY schema_name

ORDER BY schema_name

pg_toast 2706 GB 82.62112838877240860000 <-- this belongs to the seg table.

Show quoted text

Have you looked at the functions here?:
https://www.postgresql.org/docs/9.6/static/functions-admin.
html#FUNCTIONS-ADMIN-DBOBJECT

Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is the real impact of a bloat index? If I reindex it, queries will
be faster?

Thanks
Patrick

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Patrick B
patrickbakerbr@gmail.com
In reply to: Patrick B (#3)
Re: bloat indexes - opinion

2017-02-22 11:11 GMT+13:00 Patrick B <patrickbakerbr@gmail.com>:

2017-02-22 10:59 GMT+13:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 02/21/2017 01:44 PM, Patrick B wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table.
Its real size is 2TB

How do you know one number is right and the other is wrong?

1. on that table (seg) i store binary data. It is impossible to have only
18GB of it.
2.

SELECT schema_name,

pg_size_pretty(sum(table_size)::bigint),

(sum(table_size) / pg_database_size(current_database())) * 100

FROM (

SELECT pg_catalog.pg_namespace.nspname as schema_name,

pg_relation_size(pg_catalog.pg_class.oid) as table_size

FROM pg_catalog.pg_class

JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid

) t

GROUP BY schema_name

ORDER BY schema_name

pg_toast 2706 GB 82.62112838877240860000 <-- this belongs to the seg
table.

Have you looked at the functions here?:
https://www.postgresql.org/docs/9.6/static/functions-admin.h
tml#FUNCTIONS-ADMIN-DBOBJECT

Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is the real impact of a bloat index? If I reindex it, queries will
be faster?

Thanks
Patrick

I ran the query before and after the reindex, and it seems it did not help
on performance.

*The query I used:*

explain analyze select * from seg where full_path = '/userfile/123';

*Before reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..144.87 rows=215
width=8) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.059 ms
(3 rows)

*After reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..141.83 rows=220
width=8) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.036 ms
(3 rows)

Note that the '*cost*' is pretty much the same.

*My question is:*
If I have a bloat index. Why do I need to reindex it if I got none
performance improvements?

Cheers
Patrick

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Patrick B (#4)
Re: bloat indexes - opinion

On 02/21/2017 03:41 PM, Patrick B wrote:

2017-02-22 11:11 GMT+13:00 Patrick B <patrickbakerbr@gmail.com
<mailto:patrickbakerbr@gmail.com>>:

2017-02-22 10:59 GMT+13:00 Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:

On 02/21/2017 01:44 PM, Patrick B wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table.
Its real size is 2TB

How do you know one number is right and the other is wrong?

1. on that table (seg) i store binary data. It is impossible to have
only 18GB of it.
2.

SELECT schema_name,

pg_size_pretty(sum(table_size)::bigint),

(sum(table_size) /
pg_database_size(current_database())) * 100

FROM (

SELECT pg_catalog.pg_namespace.nspname as schema_name,

pg_relation_size(pg_catalog.pg_class.oid) as table_size

FROM pg_catalog.pg_class

JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid

) t

GROUP BY schema_name

ORDER BY schema_name

pg_toast2706 GB82.62112838877240860000 <-- this belongs to the seg
table.

Have you looked at the functions here?:
https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT
<https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT&gt;

Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is the real impact of a bloat index? If I reindex it, queries will
be faster?

Thanks
Patrick

I ran the query before and after the reindex, and it seems it did not
help on performance.

*The query I used:*

explain analyze select * from seg where full_path = '/userfile/123';

The table schema would be useful.

*Before reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..144.87
rows=215 width=8) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.059 ms
(3 rows)

*After reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..141.83
rows=220 width=8) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.036 ms
(3 rows)

Not showing the complete explain analyze makes the above not all that enlightening.

Note that the '*/cost/*' is pretty much the same.

*My question is:*
If I have a bloat index. Why do I need to reindex it if I got none
performance improvements?

Because it is an indication that you may not have index bloat?

Not sure a runtime of 0.036 to 0.036 ms over a 2TB table is symptomatic of a problem.

Might be worth taking a look at:

https://www.postgresql.org/docs/9.6/static/monitoring-stats.html#PG-STAT-ALL-TABLES-VIEW

Cheers
Patrick

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Jeff Janes
jeff.janes@gmail.com
In reply to: Patrick B (#1)
Re: bloat indexes - opinion

On Tue, Feb 21, 2017 at 1:44 PM, Patrick B <patrickbakerbr@gmail.com> wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table. Its
real size is 2TB
Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is this from? If you think the table size reported should include
toast, then change it to do that, or request the author of whatever-this-is
to make that change.

What indication is there that the index is bloated? If the
meat-and-potatoes of a table is held in toast, then wouldn't you expect the
size of the table and the size of the index to be about the same?

Cheers,

Jeff

#7Patrick B
patrickbakerbr@gmail.com
In reply to: Adrian Klaver (#5)
Re: bloat indexes - opinion

2017-02-22 13:10 GMT+13:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 02/21/2017 03:41 PM, Patrick B wrote:

2017-02-22 11:11 GMT+13:00 Patrick B <patrickbakerbr@gmail.com
<mailto:patrickbakerbr@gmail.com>>:

2017-02-22 10:59 GMT+13:00 Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:

On 02/21/2017 01:44 PM, Patrick B wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on

pg_toast table.

Its real size is 2TB

How do you know one number is right and the other is wrong?

1. on that table (seg) i store binary data. It is impossible to have
only 18GB of it.
2.

SELECT schema_name,

pg_size_pretty(sum(table_size)::bigint),

(sum(table_size) /
pg_database_size(current_database())) * 100

FROM (

SELECT pg_catalog.pg_namespace.nspname as schema_name,

pg_relation_size(pg_catalog.pg_class.oid) as

table_size

FROM pg_catalog.pg_class

JOIN pg_catalog.pg_namespace ON relnamespace =
pg_catalog.pg_namespace.oid

) t

GROUP BY schema_name

ORDER BY schema_name

pg_toast2706 GB82.62112838877240860000 <-- this belongs to the seg
table.

Have you looked at the functions here?:
https://www.postgresql.org/docs/9.6/static/functions-

admin.html#FUNCTIONS-ADMIN-DBOBJECT

<https://www.postgresql.org/docs/9.6/static/functions-

admin.html#FUNCTIONS-ADMIN-DBOBJECT>

Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg

USING btree

(full_path varchar_pattern_ops) WHERE (full_path IS NOT

NULL)

What is the real impact of a bloat index? If I reindex it,

queries will

be faster?

Thanks
Patrick

I ran the query before and after the reindex, and it seems it did not
help on performance.

*The query I used:*

explain analyze select * from seg where full_path = '/userfile/123';

The table schema would be useful.

Why? If i just wanna know how bloat indexes work?

*Before reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..144.87
rows=215 width=8) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.059 ms
(3 rows)

*After reindex:*

Index Scan using ix_filter_by_tree on seg (cost=0.00..141.83
rows=220 width=8) (actual time=0.021..0.021 rows=1 loops=1)
Index Cond: (full_path = '/userfile/123')
Total runtime: 0.036 ms
(3 rows)

Not showing the complete explain analyze makes the above not all that
enlightening.

I am showing the whole explain analyze mate.

Note that the '*/cost/*' is pretty much the same.

*My question is:*
If I have a bloat index. Why do I need to reindex it if I got none
performance improvements?

Because it is an indication that you may not have index bloat?

Queries for bloat indexes show that I do have bloat indexes. Also, it is
really simple to look.

\d tablename

The table is 18GB big and the index is 17GB big.. this clearly shows me
bloated index.

Not sure a runtime of 0.036 to 0.036 ms over a 2TB table is symptomatic of
a problem.

Might be worth taking a look at:

https://www.postgresql.org/docs/9.6/static/monitoring-
stats.html#PG-STAT-ALL-TABLES-VIEW

<adrian.klaver@aklaver.com>

Patrick.

#8Patrick B
patrickbakerbr@gmail.com
In reply to: Jeff Janes (#6)
Re: bloat indexes - opinion

2017-02-23 11:46 GMT+13:00 Jeff Janes <jeff.janes@gmail.com>:

On Tue, Feb 21, 2017 at 1:44 PM, Patrick B <patrickbakerbr@gmail.com>
wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table. Its
real size is 2TB
Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is this from? If you think the table size reported should include
toast, then change it to do that, or request the author of whatever-this-is
to make that change.

What indication is there that the index is bloated? If the
meat-and-potatoes of a table is held in toast, then wouldn't you expect the
size of the table and the size of the index to be about the same?

Cheers,

Jeff

I am running queries to see bloat indexes [1]https://wiki.postgresql.org/wiki/Index_Maintenance. Also i understand an index
can't have same size as table.
If you have any other table that can prove the index is indeed bloat,
please let me know and i will be happy to post results here.

[1]: https://wiki.postgresql.org/wiki/Index_Maintenance

Patrick

#9Patrick B
patrickbakerbr@gmail.com
In reply to: Patrick B (#8)
Re: bloat indexes - opinion

2017-02-25 17:53 GMT+13:00 Patrick B <patrickbakerbr@gmail.com>:

2017-02-23 11:46 GMT+13:00 Jeff Janes <jeff.janes@gmail.com>:

On Tue, Feb 21, 2017 at 1:44 PM, Patrick B <patrickbakerbr@gmail.com>
wrote:

Hi guys,

I've got a lot of bloat indexes on my 4TB database.

Let's take this example:

Table: seg
Index: ix_filter_by_tree
Times_used: 1018082183
Table_size: 18 GB -- wrong. The table is mostly on pg_toast table. Its
real size is 2TB
Index_size: 17 GB
Num_writes 16245023
Index definition: CREATE INDEX ix_filter_by_tree ON seg USING btree
(full_path varchar_pattern_ops) WHERE (full_path IS NOT NULL)

What is this from? If you think the table size reported should include
toast, then change it to do that, or request the author of whatever-this-is
to make that change.

What indication is there that the index is bloated? If the
meat-and-potatoes of a table is held in toast, then wouldn't you expect the
size of the table and the size of the index to be about the same?

Cheers,

Jeff

I am running queries to see bloat indexes [1]. Also i understand an index
can't have same size as table.
If you have any other table that can prove the index is indeed bloat,
please let me know and i will be happy to post results here.

[1] https://wiki.postgresql.org/wiki/Index_Maintenance

Patrick

FYI - using this query to see the index size:

SELECT idstat.schemaname AS schema,

idstat.relname AS table_name,

indexrelname AS index_name,

idstat.idx_scan AS times_used,

pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
'.' || quote_ident(idstat.relname))) AS table_size,

pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
'.' || quote_ident(indexrelname))) AS index_size,

n_tup_upd + n_tup_ins + n_tup_del as num_writes,

indexdef AS definition

FROM pg_stat_user_indexes AS idstat

JOIN pg_indexes ON indexrelname = indexname

JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname

WHERE indexrelname = 'index_name';